[MUSIC] Now let's talk about a slightly more advanced concept, the concept of conditional probabilities. This is what we're going to use a lot today, so it's a good thing to review and a good thing to really keep in mind and understand. Now, a sentence here might say: the probability that a review that has three awesomes in it and one awful is positive is 0.9. So what does that mean? Let's interpret it by looking at the real data. So, let's say that I have my data set, and I've highlighted here the awesomes and awfuls. In fact, I'm highlighting sentences that had three awesomes and one awful. So if I go and mark all of those, I'm just looking at the rows where the condition holds; this part up here is called the condition, and that's why it's called a conditional probability: three awesomes and one awful. So out of the rows that have three awesomes and one awful, I expect on average that 90% of those are going to be positive reviews. And that makes sense: if you have three awesomes and one awful, it's very likely to be a positive review. So this is how we interpret it: I expect 90% of the rows to have a +1 in them. You see there's one here that has a -1, but in general, you have a lot of +1s. Now, we can again interpret conditional probabilities in terms of degrees of belief of how we feel about the reviews, but this is not how we feel about reviews in general; it's how we feel about reviews specifically. So for example, if I have an input xi that says, "All the sushi was delicious," what's the probability that the review is positive given that input? So here we have the probability of the output being positive, and we use the word "given." This little bar here means given, or conditioned on, the input, which in our case was "All the sushi was delicious." And so that conditional probability has a value. I wrote on top of my own labels, but I hope it makes sense: that's the output label, and that's the input sentence.
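The estimate described above, taking the fraction of positive labels among the rows that satisfy the condition, can be sketched in a few lines. The rows below are made-up illustrative data, not the actual data set from the lecture:

```python
# Each row: (number of "awesome"s, number of "awful"s, label).
# These counts and labels are invented for illustration.
rows = [
    (3, 1, +1),
    (3, 1, +1),
    (3, 1, -1),
    (3, 1, +1),
    (3, 1, +1),
    (0, 2, -1),   # does not satisfy the condition, so it's ignored
    (1, 0, +1),   # ignored as well
]

# Keep only the rows that satisfy the condition: 3 awesomes, 1 awful.
matching = [label for (awesome, awful, label) in rows
            if awesome == 3 and awful == 1]

# Estimate P(y = +1 | 3 awesomes, 1 awful) as the fraction of
# matching rows that are positive.
p_positive = sum(1 for label in matching if label == +1) / len(matching)
print(p_positive)  # 4 of the 5 matching rows are positive -> 0.8
```

With more data that truly had 90% positives among such rows, this estimate would come out around 0.9, as in the lecture's example.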
And these are all the parts of the probability. Now, if I say that this output has probability one given this input, "All the sushi was delicious," that means I'm absolutely sure that the review "All the sushi was delicious" is a positive review. I feel pretty good about that. So we're going to denote it by the probability of y = +1 given xi; this is our shorthand notation; it is equal to one. What does that imply, by the way? It implies that the probability that y = -1 given xi is one minus the probability that y = +1 given xi, which in our case would be zero. So what does that mean? That if the probability that this is a positive review, given that the input sentence is "All the sushi was delicious," is one, then the probability that it's negative is zero. Similarly, if the probability that y = +1 is zero, then I'm absolutely sure that the review "All the sushi was delicious" is negative (and actually, I'm not sure of that). But in that case, the probability of y = +1 given the input xi being zero would imply that we believe it's a negative review given this input, with probability one. It could also be somewhere in between. So for example, let's say the estimated probability of y = +1, given the input "All the sushi was delicious," is 0.5; then I'm not sure if that review is positive or negative. So what I'm saying is that the probability that y = +1 given xi is the same as the probability that y = -1 given xi, which is 0.5, so I would believe that it's 50/50. In reality, for this input, I believe it would be somewhere around here: the probability would be very, very high that it's a positive review. So let's talk about a few important properties of conditional probabilities. First, if we have two classes, conditional probabilities are always between zero and one. So the probability that y = +1 given xi is somewhere between zero and one. Similarly, the probability that y = -1 given xi is somewhere between zero and one.
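The complement rule discussed above, P(y = -1 | xi) = 1 - P(y = +1 | xi), can be checked with a couple of lines; the specific probability values here are just illustrative:

```python
# Totally unsure whether the review is positive: 50/50 between the labels.
p_plus = 0.5
p_minus = 1 - p_plus   # complement rule for two classes
print(p_minus)  # 0.5

# Absolutely sure the review is positive: no belief left for the negative label.
p_plus = 1.0
p_minus = 1 - p_plus
print(p_minus)  # 0.0
```

This is why, with two classes, specifying P(y = +1 | xi) alone is enough: the other probability is determined.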
Now, interestingly and importantly, conditional probabilities add up to one if you sum over y, the left side of the conditional. So the probability that y = +1 given xi, if I add it to the probability that y = -1 given xi, adds up to one. However, that is not true if you sum over the right side, over the sentences xi or anything like that. So for example, if you sum, over all possible sentences x, the probability of y = +1 given x, that is not required to be one. Or, for example, let's say that you sum over all the data points, and you might have n of them, the probability of y = +1 given xi; again, that is not required to be one. So they only add up to one when you sum over the left side, over the ys. Now, we talked about the binary case; let's talk about multiple classes. So in this case, same example as before: y can be dog, but now we condition it on the image xi. Or y can be cat with a particular image xi as input, or y can be bird with a particular input xi, and we have that these are between zero and one. And that's the input, and the output is here. Now, similarly, with multiple classes, the probabilities add up to one if you sum over the possible values of y. So if you take the probability that y is equal to dog for this particular image xi, and you add the probability that y is equal to cat for this particular image xi, and you add the probability that y is equal to bird for this particular xi, and you add all those together, you get one. Now, here we go. We've covered, very quickly, a summary of the key properties of probabilities and conditional probabilities, and we're going to use them at a high level throughout this particular module and throughout this specialization. [MUSIC]
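The multiclass properties above, each probability between zero and one and the sum over labels equal to one, can be sketched as follows; the class probabilities for this hypothetical image xi are invented for illustration:

```python
# Hypothetical conditional probabilities P(y | xi) for one image xi.
# The values are made up; only the properties they satisfy matter here.
probs = {"dog": 0.5, "cat": 0.25, "bird": 0.25}

# Property 1: each conditional probability is between zero and one.
assert all(0.0 <= p <= 1.0 for p in probs.values())

# Property 2: summing over the left side of the conditional
# (the labels y) gives exactly one.
total = sum(probs.values())
print(total)  # 1.0
```

Note that summing the same quantity over different inputs xi (the right side of the conditional) has no such constraint, as discussed above.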