1 00:00:00,025 --> 00:00:04,268 [MUSIC] 2 00:00:04,268 --> 00:00:07,590 Now let's talk about a slightly more advanced concept, 3 00:00:07,590 --> 00:00:10,560 the concept of conditional probabilities. 4 00:00:10,560 --> 00:00:15,850 And this is what we're going to use a lot today, so it's a good thing to review and 5 00:00:15,850 --> 00:00:18,029 a good thing to really keep in mind and understand. 6 00:00:19,070 --> 00:00:24,345 Now a sentence here might say the probability that a review that has 7 00:00:24,345 --> 00:00:29,344 three awesomes in it and one awful is positive is 0.9. 8 00:00:29,344 --> 00:00:33,730 So, what does that mean? 9 00:00:33,730 --> 00:00:36,540 Let's interpret it by looking at the real data. 10 00:00:36,540 --> 00:00:39,470 So, let's say that I have my data set. 11 00:00:39,470 --> 00:00:42,620 And I've just highlighted here the awesomes and awfuls. 12 00:00:42,620 --> 00:00:48,190 And in fact, I'm highlighting sentences that had three awesomes and one awful. 13 00:00:49,560 --> 00:00:54,940 So if I go and mark all of those, and I just look at those rows where 14 00:00:54,940 --> 00:01:00,230 the condition holds. This part up here is called the condition, 15 00:01:00,230 --> 00:01:06,950 and that's why it's called conditional probability: three awesomes and one awful. 16 00:01:06,950 --> 00:01:10,203 So out of the rows that have three awesomes, one awful, 17 00:01:10,203 --> 00:01:14,370 I expect on average that 90% of those are going to be positive reviews. 18 00:01:14,370 --> 00:01:18,243 And that makes sense, if you have three awesomes and one awful, 19 00:01:18,243 --> 00:01:20,476 it's very likely to be a positive review. 20 00:01:20,476 --> 00:01:23,088 So this is how we interpret it, 21 00:01:23,088 --> 00:01:27,813 I expect 90% of those rows are going to have a +1 in them. 22 00:01:27,813 --> 00:01:34,180 See, there's one here that has a -1, but in general, you have a lot of +1s.
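To make that interpretation concrete, here is a minimal sketch of how the 0.9 could be read off a data set: keep only the rows where the condition holds, then take the fraction of those rows labeled +1. The toy data and the name `estimate_p_positive` are made up for illustration; they are not from the lecture.

```python
# Hypothetical toy data set: each row is (awesome_count, awful_count, label),
# where label is +1 for a positive review and -1 for a negative one.
reviews = (
    [(3, 1, +1)] * 9      # nine positive reviews that match the condition
    + [(3, 1, -1)] * 1    # one negative review that matches the condition
    + [(0, 2, -1)] * 4    # rows that do not match the condition
    + [(1, 0, +1)] * 6
)


def estimate_p_positive(rows, awesome_count, awful_count):
    """Estimate P(y = +1 | awesome_count, awful_count) as a fraction of rows."""
    # Keep only the labels of rows where the condition holds.
    matching = [
        label
        for a, w, label in rows
        if a == awesome_count and w == awful_count
    ]
    if not matching:
        return None  # the condition never occurs, so there is nothing to estimate
    positives = sum(1 for label in matching if label == +1)
    return positives / len(matching)


p = estimate_p_positive(reviews, 3, 1)
print(p)  # 0.9
```

Note that the rows that do not match the condition play no role in the result, which is the whole point of conditioning.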
23 00:01:34,180 --> 00:01:38,740 Now we can again interpret the conditional probabilities 24 00:01:38,740 --> 00:01:42,790 in terms of degrees of belief, of how we feel about the reviews, but 25 00:01:42,790 --> 00:01:45,740 this is not how we feel about reviews in general. 26 00:01:45,740 --> 00:01:48,109 It's how we feel about a specific review. 27 00:01:48,109 --> 00:01:51,351 So for example, if I have a condition xi, 28 00:01:51,351 --> 00:01:55,439 an input xi that says, all the sushi was delicious. 29 00:01:56,770 --> 00:02:00,880 What's the probability that the review's positive in that condition? 30 00:02:00,880 --> 00:02:09,190 So here we have the probability of the output 31 00:02:11,710 --> 00:02:17,030 being positive and we use the word given. 32 00:02:17,030 --> 00:02:21,340 This little bar here means given, or conditioned on, 33 00:02:22,940 --> 00:02:28,860 the input, which in our case was all the sushi was delicious. 34 00:02:28,860 --> 00:02:31,620 It was delicious. 35 00:02:31,620 --> 00:02:35,392 And so if that conditional probability has a 36 00:02:35,392 --> 00:02:40,770 value... I wrote on top of my own labels. 37 00:02:40,770 --> 00:02:42,270 But I hope it makes sense. 38 00:02:42,270 --> 00:02:44,700 That's the output label, and that's the input sentence. 39 00:02:44,700 --> 00:02:46,580 And these are all the parts of the probability. 40 00:02:47,720 --> 00:02:51,050 Now if I say that that output 41 00:02:53,050 --> 00:02:58,390 has probability one given this input, all the sushi was delicious, then that means that 42 00:02:58,390 --> 00:03:04,750 I'm absolutely sure that the review, all the sushi was delicious, is a positive review. 43 00:03:04,750 --> 00:03:06,830 I feel pretty good about that. 44 00:03:06,830 --> 00:03:14,080 So we're going to denote it by the probability of y equals plus one given xi. 45 00:03:14,080 --> 00:03:20,280 This is our shorthand notation, is equal to one. 46 00:03:22,000 --> 00:03:24,770 What does that imply by the way?
47 00:03:24,770 --> 00:03:29,549 It implies that the probability that y = -1 given xi is one 48 00:03:29,549 --> 00:03:34,084 minus the probability of y = +1 given xi, 49 00:03:34,084 --> 00:03:37,410 which in our case would be zero. 50 00:03:37,410 --> 00:03:39,400 So what does that mean? 51 00:03:39,400 --> 00:03:43,778 That if the probability that this is a positive review, given that 52 00:03:43,778 --> 00:03:47,425 the input sentence is all the sushi was delicious, is one, 53 00:03:47,425 --> 00:03:51,800 then the probability that it's negative is zero. 54 00:03:51,800 --> 00:03:55,740 Similarly, if we output that the probability of y = +1 is 0, then 55 00:03:55,740 --> 00:03:59,200 I'm absolutely sure that the review, 56 00:03:59,200 --> 00:04:02,720 all the sushi was delicious, is negative, and actually I'm not sure of that. 57 00:04:02,720 --> 00:04:07,005 But in that case it would mean that, and 58 00:04:07,005 --> 00:04:14,607 that means the probability of y = +1 given the input xi is zero, 59 00:04:14,607 --> 00:04:19,998 and that would imply that we believe that it's 60 00:04:19,998 --> 00:04:26,523 a negative review, given this input, with probability one. 61 00:04:28,380 --> 00:04:29,930 It could be somewhere in between. 62 00:04:29,930 --> 00:04:34,380 So for example, let's say the estimated probability of y = +1, 63 00:04:34,380 --> 00:04:37,700 given the input all the sushi was delicious, is 0.5. 64 00:04:37,700 --> 00:04:42,240 And so I'm not sure if that review is positive or negative. 65 00:04:42,240 --> 00:04:48,018 So what I'm saying is that the probability of y = +1 given xi is 66 00:04:48,018 --> 00:04:54,017 the same as the probability of y = -1 given xi, which is 0.5. 67 00:04:54,017 --> 00:04:56,320 So I would believe that it's 50/50. 68 00:04:56,320 --> 00:05:01,410 In reality, for this input I believe it will be somewhere around here. 69 00:05:01,410 --> 00:05:05,670 So the probability would be very, very high that it's a positive review.
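The three cases just discussed (probability one, zero, and 0.5) all follow from the same complement rule, P(y = -1 | xi) = 1 - P(y = +1 | xi). A small sketch of that rule, with the hypothetical helper name `prob_negative` chosen only for this illustration:

```python
def prob_negative(p_positive):
    """Return P(y = -1 | x) given P(y = +1 | x) for a two-class problem."""
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    return 1.0 - p_positive


# The three cases from the discussion: absolutely sure it's positive,
# absolutely sure it's negative, and completely unsure (50/50).
for p_positive in (1.0, 0.0, 0.5):
    print(p_positive, prob_negative(p_positive))
```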
70 00:05:07,000 --> 00:05:11,303 So let's talk about a few important properties of conditional probabilities. 71 00:05:11,303 --> 00:05:13,053 First, if we have two classes, 72 00:05:13,053 --> 00:05:16,700 conditional probabilities are always between zero and one. 73 00:05:16,700 --> 00:05:21,811 So the probability that y = +1 given 74 00:05:21,811 --> 00:05:26,933 xi is somewhere between zero and one. 75 00:05:26,933 --> 00:05:35,230 Similarly, the probability that y = -1 given xi is somewhere between zero and one. 76 00:05:36,720 --> 00:05:43,500 Now interestingly and importantly, conditional 77 00:05:43,500 --> 00:05:47,870 probabilities add up to one over y, over the left side of the conditional. 78 00:05:47,870 --> 00:05:53,185 So the probability that y = +1 given xi, 79 00:05:53,185 --> 00:05:57,908 if I add it to the probability that y = 80 00:05:57,908 --> 00:06:02,350 -1 given xi, that adds up to one. 81 00:06:04,150 --> 00:06:11,800 However, it's not true if you sum over the right side. 82 00:06:11,800 --> 00:06:16,070 If you sum over the sentences, or over xi or anything like that. 83 00:06:16,070 --> 00:06:21,617 So for example, if you said, sum over all possible 84 00:06:21,617 --> 00:06:26,519 sentences, of the probability of y = +1, 85 00:06:26,519 --> 00:06:30,784 given x, that is not required to be 1. 86 00:06:31,830 --> 00:06:37,120 Or for example, let's say that you sum over all the data points. 87 00:06:37,120 --> 00:06:45,440 And you might have N of them, of the probability of y = +1 given xi. 88 00:06:45,440 --> 00:06:46,760 Again, that is not one. 89 00:06:47,810 --> 00:06:48,920 Not required to be one. 90 00:06:50,530 --> 00:06:55,240 So they add up to one on the left side, over y's. 91 00:06:55,240 --> 00:06:59,060 Now, we talked about the binary case, let's talk about multiple classes.
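Here is a rough sketch of the difference between summing over the left side (the labels y) and summing over the right side (the inputs x). The sentences other than the sushi one, and all the probability values, are made up for illustration and do not come from any real model.

```python
# Hypothetical values of P(y = +1 | x) for three input sentences.
p_positive = {
    "all the sushi was delicious": 0.99,
    "the sushi was awful": 0.05,
    "the sushi was fine": 0.70,
}

# Summing over the labels (left side of the bar), separately for each input:
# P(y = +1 | x) + P(y = -1 | x), where P(y = -1 | x) = 1 - P(y = +1 | x).
sum_over_labels = {x: p + (1.0 - p) for x, p in p_positive.items()}

# Summing over the inputs (right side of the bar) for the fixed label y = +1.
sum_over_inputs = sum(p_positive.values())

for x, total in sum_over_labels.items():
    print(f"{x!r}: sum over y = {total:.2f}")
print(f"sum over x of P(y = +1 | x) = {sum_over_inputs:.2f}")
```

With these made-up numbers, every per-input sum over y comes out as one, while the sum over inputs is 1.74. Nothing forces that second sum to be one, because each input carries its own separate distribution over y.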
92 00:06:59,060 --> 00:07:03,331 So in this case, same example as before, so 93 00:07:03,331 --> 00:07:08,710 y can be dog, but now we condition it on the image xi. 94 00:07:08,710 --> 00:07:16,299 Or y can be cat with a particular image xi as input, 95 00:07:16,299 --> 00:07:22,804 or y can be bird with a particular input xi, 96 00:07:22,804 --> 00:07:29,883 and we have that these are between zero and one. 97 00:07:33,067 --> 00:07:35,510 And that's input, output here. 98 00:07:36,510 --> 00:07:40,910 Now, similarly with multiple classes, we have that the probabilities add up to one 99 00:07:40,910 --> 00:07:44,050 if you sum over possible values of y. 100 00:07:44,050 --> 00:07:49,205 So, if you have the probability that y is equal to dog for 101 00:07:49,205 --> 00:07:54,363 this particular image xi, and you add the probability 102 00:07:54,363 --> 00:07:59,409 that y is equal to cat for this particular image xi, and 103 00:07:59,409 --> 00:08:05,661 you add the probability that y is equal to bird for this particular xi, 104 00:08:05,661 --> 00:08:10,195 and you add all those together, you get one. 105 00:08:10,195 --> 00:08:11,356 Now here we go. 106 00:08:11,356 --> 00:08:17,115 Now we've covered, very quickly, a summary of the key properties of probabilities and 107 00:08:17,115 --> 00:08:21,646 conditional probabilities, and we're going to use them at a high level 108 00:08:21,646 --> 00:08:26,515 throughout this particular module and throughout this specialization. 109 00:08:26,515 --> 00:08:27,015 [MUSIC]
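The two multi-class properties from the dog, cat, bird example (each conditional probability lies between zero and one, and for a fixed image xi they add up to one over the classes) can be checked with a short sketch. The function name `is_valid_conditional_distribution`, the tolerance, and the probability values are assumptions made for this illustration.

```python
def is_valid_conditional_distribution(class_probs, tol=1e-9):
    """Check the two properties of P(y | x) for one fixed input x.

    class_probs maps each class name to P(y = class | x).
    """
    # Property 1: every probability lies between 0 and 1.
    in_range = all(0.0 <= p <= 1.0 for p in class_probs.values())
    # Property 2: summing over all possible values of y gives 1
    # (up to a small tolerance for floating-point rounding).
    sums_to_one = abs(sum(class_probs.values()) - 1.0) <= tol
    return in_range and sums_to_one


# Hypothetical probabilities for one particular image xi.
probs_for_image = {"dog": 0.7, "cat": 0.2, "bird": 0.1}
print(is_valid_conditional_distribution(probs_for_image))  # True

# These do not sum to one over y, so they cannot be P(y | xi).
print(is_valid_conditional_distribution({"dog": 0.7, "cat": 0.7, "bird": 0.1}))  # False
```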