[MUSIC] Now let's talk about a slightly more advanced concept, the concept of conditional probabilities. This is what we're going to use a lot today, so it's a good thing to review and a good thing to really keep in mind and understand. Now, a sentence here might say: the probability that a review that has three awesomes in it and one awful is positive is 0.9. So what does that mean? Let's interpret it by looking at the real data. So, let's say that I have my data set, and I've highlighted here the awesomes and awfuls. In fact, I'm highlighting sentences that had three awesomes and one awful. So if I go and mark all of those, I'm just looking at the rows where the condition holds; this part up here is called the condition, and that's why it's called a conditional probability: three awesomes and one awful. So out of the rows that have three awesomes and one awful, I expect on average that 90% of those are going to be positive reviews. And that makes sense: if you have three awesomes and one awful, it's very likely to be a positive review. So this is how we interpret it: I expect 90% of the rows to have a +1 in them. You see there's one here that has a -1, but in general, you have a lot of +1s. Now, we can again interpret conditional probabilities in terms of degrees of belief of how we feel about the reviews, but this is not how we feel about reviews in general; it's how we feel about reviews specifically. So for example, if I have an input xi that says, "All the sushi was delicious," what's the probability that the review is positive given that input? So here we have the probability of the output being positive, and we use the word "given." This little bar here means given, or conditioned on, the input, which in our case was "All the sushi was delicious." And so that conditional probability has a value. I wrote on top of my own labels, but I hope it makes sense: that's the output label, and that's the input sentence.
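The estimate described above, taking the fraction of positive labels among the rows that satisfy the condition, can be sketched in a few lines. The rows below are made-up illustrative data, not the actual data set from the lecture:

```python
# Each row: (number of "awesome"s, number of "awful"s, label).
# These counts and labels are invented for illustration.
rows = [
    (3, 1, +1),
    (3, 1, +1),
    (3, 1, -1),
    (3, 1, +1),
    (3, 1, +1),
    (0, 2, -1),   # does not satisfy the condition, so it's ignored
    (1, 0, +1),   # ignored as well
]

# Keep only the rows that satisfy the condition: 3 awesomes, 1 awful.
matching = [label for (awesome, awful, label) in rows
            if awesome == 3 and awful == 1]

# Estimate P(y = +1 | 3 awesomes, 1 awful) as the fraction of
# matching rows that are positive.
p_positive = sum(1 for label in matching if label == +1) / len(matching)
print(p_positive)  # 4 of the 5 matching rows are positive -> 0.8
```

With more data that truly had 90% positives among such rows, this estimate would come out around 0.9, as in the lecture's example.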
And these are all the parts of the probability. Now, if I say that this output has probability one given this input, "All the sushi was delicious," that means I'm absolutely sure that the review "All the sushi was delicious" is a positive review. I feel pretty good about that. So we're going to denote it by the probability of y = +1 given xi; this is our shorthand notation; it is equal to one. What does that imply, by the way? It implies that the probability that y = -1 given xi is one minus the probability that y = +1 given xi, which in our case would be zero. So what does that mean? That if the probability that this is a positive review, given that the input sentence is "All the sushi was delicious," is one, then the probability that it's negative is zero. Similarly, if the probability that y = +1 is zero, then I'm absolutely sure that the review "All the sushi was delicious" is negative (and actually, I'm not sure of that). But in that case, the probability of y = +1 given the input xi being zero would imply that we believe it's a negative review given this input, with probability one. It could also be somewhere in between. So for example, let's say the estimated probability of y = +1, given the input "All the sushi was delicious," is 0.5; then I'm not sure if that review is positive or negative. So what I'm saying is that the probability that y = +1 given xi is the same as the probability that y = -1 given xi, which is 0.5, so I would believe that it's 50/50. In reality, for this input, I believe it would be somewhere around here: the probability would be very, very high that it's a positive review. So let's talk about a few important properties of conditional probabilities. First, if we have two classes, conditional probabilities are always between zero and one. So the probability that y = +1 given xi is somewhere between zero and one. Similarly, the probability that y = -1 given xi is somewhere between zero and one.
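The complement rule discussed above, P(y = -1 | xi) = 1 - P(y = +1 | xi), can be checked with a couple of lines; the specific probability values here are just illustrative:

```python
# Totally unsure whether the review is positive: 50/50 between the labels.
p_plus = 0.5
p_minus = 1 - p_plus   # complement rule for two classes
print(p_minus)  # 0.5

# Absolutely sure the review is positive: no belief left for the negative label.
p_plus = 1.0
p_minus = 1 - p_plus
print(p_minus)  # 0.0
```

This is why, with two classes, specifying P(y = +1 | xi) alone is enough: the other probability is determined.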
Now, interestingly and importantly, conditional probabilities add up to one if you sum over y, the left side of the conditional. So the probability that y = +1 given xi, if I add it to the probability that y = -1 given xi, adds up to one. However, that is not true if you sum over the right side, over the sentences xi or anything like that. So for example, if you sum, over all possible sentences x, the probability of y = +1 given x, that is not required to be one. Or, for example, let's say that you sum over all the data points, and you might have n of them, the probability of y = +1 given xi; again, that is not required to be one. So they only add up to one when you sum over the left side, over the ys. Now, we talked about the binary case; let's talk about multiple classes. So in this case, same example as before: y can be dog, but now we condition it on the image xi. Or y can be cat with a particular image xi as input, or y can be bird with a particular input xi, and we have that these are between zero and one. And that's the input, and the output is here. Now, similarly, with multiple classes, the probabilities add up to one if you sum over the possible values of y. So if you take the probability that y is equal to dog for this particular image xi, and you add the probability that y is equal to cat for this particular image xi, and you add the probability that y is equal to bird for this particular xi, and you add all those together, you get one. Now, here we go. We've covered, very quickly, a summary of the key properties of probabilities and conditional probabilities, and we're going to use them at a high level throughout this particular module and throughout this specialization. [MUSIC]
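The multiclass properties above, each probability between zero and one and the sum over labels equal to one, can be sketched as follows; the class probabilities for this hypothetical image xi are invented for illustration:

```python
# Hypothetical conditional probabilities P(y | xi) for one image xi.
# The values are made up; only the properties they satisfy matter here.
probs = {"dog": 0.5, "cat": 0.25, "bird": 0.25}

# Property 1: each conditional probability is between zero and one.
assert all(0.0 <= p <= 1.0 for p in probs.values())

# Property 2: summing over the left side of the conditional
# (the labels y) gives exactly one.
total = sum(probs.values())
print(total)  # 1.0
```

Note that summing the same quantity over different inputs xi (the right side of the conditional) has no such constraint, as discussed above.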