1 00:00:00,000 --> 00:00:04,713 [MUSIC] 2 00:00:04,713 --> 00:00:10,033 At the very core of measuring how sure we are about a prediction 3 00:00:10,033 --> 00:00:12,650 is a notion of probability. 4 00:00:12,650 --> 00:00:17,780 So let me do a very, very, very, very, very quick review of probabilities here, 5 00:00:17,780 --> 00:00:22,877 and how they're useful throughout this module and throughout the course. 6 00:00:22,877 --> 00:00:23,561 Okay. 7 00:00:23,561 --> 00:00:26,621 Now we'll see a very quick review of probability, 8 00:00:26,621 --> 00:00:31,960 which is just showing a few examples and interpreting what probability means. 9 00:00:31,960 --> 00:00:36,832 So, if I say the probability that a review is positive is 0.7, 10 00:00:36,832 --> 00:00:38,957 that means that, in general, 11 00:00:38,957 --> 00:00:45,030 0.7 is the probability associated with people writing positive reviews. 12 00:00:45,030 --> 00:00:45,962 What does that mean? 13 00:00:45,962 --> 00:00:50,976 Well, if I take my data set of reviews, I expect that, on 14 00:00:50,976 --> 00:00:58,050 average, 70% of the rows here will have positive reviews, 15 00:00:58,050 --> 00:01:01,280 and the other 30% will be negative. 16 00:01:01,280 --> 00:01:05,430 Of course, a data set is kind of a finite sample or 17 00:01:05,430 --> 00:01:09,889 observation of the underlying space of reviews, so 18 00:01:09,889 --> 00:01:14,673 it's not going to be exactly 70%, but it's about 70%. 19 00:01:14,673 --> 00:01:17,978 That's how we interpret probabilities. 20 00:01:17,978 --> 00:01:23,018 Now we can associate probabilities with what's called degrees of belief or 21 00:01:23,018 --> 00:01:24,500 degrees of sureness. 22 00:01:24,500 --> 00:01:30,620 So for example, let's look at the probability that y is plus 1. 23 00:01:30,620 --> 00:01:35,888 So this is the notation that we'll use: the probability
24 00:01:38,627 --> 00:01:45,463 That y, which is the thing that we're trying to predict, 25 00:01:45,463 --> 00:01:51,333 the output, is positive. Let's interpret that. 26 00:01:51,333 --> 00:01:56,206 That probability can range from 0 to 1. 27 00:01:56,206 --> 00:02:00,887 So if the probability is 1, that means that I'm absolutely sure 28 00:02:00,887 --> 00:02:05,990 that every single review in the world is positive. 29 00:02:05,990 --> 00:02:07,510 That's what it means. 30 00:02:07,510 --> 00:02:12,920 So what this is saying is that the probability that 31 00:02:12,920 --> 00:02:20,760 y = +1 is equal to 1. What does that imply? 32 00:02:20,760 --> 00:02:28,980 Well, we have that the probability that y is -1, that reviews are negative, 33 00:02:28,980 --> 00:02:34,190 is 1 minus the probability that reviews are positive, 34 00:02:34,190 --> 00:02:37,070 that y = +1, which means that it is 0. 35 00:02:37,070 --> 00:02:43,690 It means that there's 0 chance that there are negative reviews, which is not true. 36 00:02:43,690 --> 00:02:49,740 And so, on the other hand, if I say that the probability of y equals plus 1 is 0, 37 00:02:49,740 --> 00:02:55,850 that means I'm absolutely sure that every review in the world is not positive. 38 00:02:55,850 --> 00:03:02,650 Which in our case would say that the probability that y = +1 is 0, 39 00:03:02,650 --> 00:03:09,133 and that implies that the probability that y = -1 is 1. 40 00:03:09,133 --> 00:03:13,568 That means that every review out there is negative. 41 00:03:13,568 --> 00:03:16,380 And the truth is not that; it's somewhere in between. 42 00:03:16,380 --> 00:03:18,977 So for example, if I say the probability is 0.5, 43 00:03:18,977 --> 00:03:22,840 that means that I'm not sure if reviews are positive or negative. 44 00:03:22,840 --> 00:03:27,590 In general, they can be 50/50; say, on average, half are positive, 45 00:03:27,590 --> 00:03:28,880 half are negative.
46 00:03:28,880 --> 00:03:34,080 And so this would say that the probability that y is equal to plus 1 47 00:03:34,080 --> 00:03:39,749 is equal to the probability that y is equal to minus 1, which is 0.5. 48 00:03:41,680 --> 00:03:46,237 In other words, the world is fair and balanced. 49 00:03:46,237 --> 00:03:49,219 50-50. 50 00:03:49,219 --> 00:03:53,356 So let's discuss some fundamental properties of probabilities which I've 51 00:03:53,356 --> 00:03:55,209 hinted at in the examples before. 52 00:03:55,209 --> 00:04:00,355 So first of all, probabilities are always between zero and one. 53 00:04:00,355 --> 00:04:02,615 So there are no negative probabilities. 54 00:04:02,615 --> 00:04:05,895 In fact, I've taught classes where I've had 55 00:04:05,895 --> 00:04:09,600 students submit assignments that had negative probabilities in them. 56 00:04:09,600 --> 00:04:11,860 Not true, cannot happen. 57 00:04:11,860 --> 00:04:16,897 So the probability that y is +1 58 00:04:16,897 --> 00:04:21,943 is somewhere between 0 and 1. 59 00:04:21,943 --> 00:04:26,845 Similarly, the probability that y is -1 is always greater than or 60 00:04:26,845 --> 00:04:30,580 equal to 0, and is always less than or equal to 1. 61 00:04:30,580 --> 00:04:31,910 Fundamental property. 62 00:04:31,910 --> 00:04:34,053 The other fundamental property of probability 63 00:04:34,053 --> 00:04:35,570 is that probabilities add up to one. 64 00:04:36,810 --> 00:04:41,832 So take the probability that y = +1 plus 65 00:04:41,832 --> 00:04:46,208 the probability that y = -1. 66 00:04:46,208 --> 00:04:50,866 These are the two possibilities, that y is either positive or negative; 67 00:04:50,866 --> 00:04:54,144 nothing else can happen, so that adds up to 1. 68 00:04:54,144 --> 00:04:57,110 So two things to remind ourselves. 69 00:04:57,110 --> 00:05:00,830 Now we just talked about the binary classification case, but 70 00:05:00,830 --> 00:05:02,650 you can have multiple classes.
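As a small illustration of the binary case just described, here is a minimal Python sketch. The list `labels` is made-up data (not from the lecture), with +1 for a positive review and -1 for a negative one. It estimates the probability of a positive review as the fraction of positive rows, then derives the probability of a negative review as one minus that value.

```python
# Hypothetical data set of review labels: +1 = positive, -1 = negative.
labels = [+1, +1, -1, +1, +1, -1, +1, +1, -1, +1]

# Estimate P(y = +1) as the fraction of rows with a positive label.
p_positive = sum(1 for y in labels if y == +1) / len(labels)

# The two outcomes are the only possibilities, so P(y = -1) = 1 - P(y = +1).
p_negative = 1 - p_positive

# Both values lie between 0 and 1, and together they add up to 1
# (up to floating-point rounding).
print(p_positive, p_negative)
```

With a real data set the estimate would only be approximately equal to the underlying probability, since the data set is a finite sample.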
71 00:05:02,650 --> 00:05:08,220 So for example, if it's an image, maybe it can be either a dog, a cat, or a bird. 72 00:05:08,220 --> 00:05:11,505 These are the only three things that can happen in the world, let's say. 73 00:05:11,505 --> 00:05:15,671 And in that case we have three probabilities: 74 00:05:15,671 --> 00:05:21,146 the probability that y is a dog, 75 00:05:21,146 --> 00:05:26,740 so the image is of a dog, the probability that y is a cat, 76 00:05:26,740 --> 00:05:30,442 and the probability that y is a bird. 77 00:05:30,442 --> 00:05:36,170 And we have that all of these are between 0 and 1. 78 00:05:36,170 --> 00:05:40,300 So they're all greater than or equal to 0 and less than or equal to 1. 79 00:05:40,300 --> 00:05:44,740 But that's not enough to capture what happens. 80 00:05:44,740 --> 00:05:47,810 The last important part is that these probabilities add up to 1. 81 00:05:47,810 --> 00:05:52,147 That is, if those are the only things that images can be: dogs, cats, and birds. 82 00:05:52,147 --> 00:05:54,410 If there could be other things, then you have to add all of those in. 83 00:05:54,410 --> 00:05:58,020 But in our case there are only three possibilities, so 84 00:05:58,020 --> 00:06:03,500 we have that the probability that y equals dog, plus the probability that 85 00:06:03,500 --> 00:06:09,050 y is equal to cat, plus the probability that y is equal 86 00:06:09,050 --> 00:06:14,110 to bird, is equal to 1. 87 00:06:14,110 --> 00:06:15,810 And there you go. 88 00:06:15,810 --> 00:06:19,560 Now you know everything you ever need to know about probabilities. 89 00:06:19,560 --> 00:06:20,895 That's it. 90 00:06:20,895 --> 00:06:25,009 [MUSIC]
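The two properties from this review (every probability lies between 0 and 1, and the probabilities of all possible classes add up to 1) can be checked mechanically. The sketch below is only an illustration: the function name `is_valid_distribution`, the tolerance, and the numbers for dog, cat, and bird are assumptions, not values from the lecture.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Return True if every value is in [0, 1] and the values sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probs.values())
    # Compare with a tolerance because floating-point sums are rarely exact.
    sums_to_one = math.isclose(sum(probs.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# Hypothetical class probabilities for one image.
image_probs = {"dog": 0.6, "cat": 0.3, "bird": 0.1}
print(is_valid_distribution(image_probs))
```

A dictionary containing a negative entry, or one whose values do not add up to 1, fails the check, which is the kind of mistake mentioned in the lecture.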