This is the same dataset, and the same learned model, from a few slides ago. What I'm plotting here on the right is not just the decision boundary but the probability that y hat is equal to plus one. So it's a probability plot. For the points over here, the probability is approximately zero. So there's approximately zero chance that the points there, around minus five and four, are positive. While the points over here have a probability of approximately one. So the probability that y equals plus one is approximately one in the bottom right corner. All that makes sense, and what makes the most sense to me is the region in between, right here. This is the region where the probability is approximately 0.5, where we're kind of uncertain as to whether we have a positive or a negative review, and it's a pretty wide region of uncertainty.

So although the linear classifier, the straight line here, the degree-one polynomial, was not a great fit to the data, the uncertainty measures make quite a lot of sense. The points over here that are getting misclassified are ones I'm actually uncertain about, whether they're positive or negative, and so I feel like this classifier is doing something very reasonable.
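The probability plot described above comes from passing the linear score through a logistic (sigmoid) link. Here is a minimal sketch; the weights `w` and intercept `b` are made up for illustration and are not the model actually learned in the lecture:

```python
import math

def sigmoid(score):
    """Logistic link: maps a real-valued score to P(y_hat = +1)."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical linear model w . x + b (illustrative weights only).
w = (1.0, 1.0)
b = 0.0

def prob_positive(x):
    """P(y_hat = +1 | x) under the linear model above."""
    score = b + w[0] * x[0] + w[1] * x[1]
    return sigmoid(score)

print(prob_positive((-5.0, -4.0)))  # far on the negative side: close to 0
print(prob_positive((5.0, 4.0)))    # far on the positive side: close to 1
print(prob_positive((0.0, 0.0)))    # exactly on the boundary: 0.5
```

Points far from the decision boundary get probabilities near 0 or 1, while points near it sit close to 0.5, which is exactly the wide white band of uncertainty in the plot.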
Now let's look at a degree-two polynomial fit. So what happens is we take degree-two polynomial features, or quadratic features, and learn the same classifier as we learned a few slides ago, but again plot the probability that y hat equals plus one. As we saw a few slides ago, we believe that this quadratic fit was actually a better fit to the data. And if you look at it, the uncertainty region is narrower. To me, this makes a lot of sense: I have a better fit to the data, so there are fewer points that I'm uncertain about. And in fact, the places where I have uncertainty are exactly the ones in the boundary region, where I should have some uncertainty, the ones where I'm not sure if they're plus one or minus one because they're close to the boundary. It makes a lot of sense. So this is a really great fit, not just in terms of the decision boundary but also in terms of the probabilities. The places where the probability is closer to 0.5 are really the ones where I'm unsure about what's going on, and then the probability mostly decreases or mostly increases depending on whether I go to the left side or the right side of the parabola.

Now let's see what happens when I use higher-order features, for example polynomial degree-6 features or polynomial degree-20 features.
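The quadratic fit works the same way, just with a degree-two feature expansion before the logistic link. A minimal sketch, with made-up coefficients chosen so the decision boundary is the parabola x2 = x1² (not the lecture's actual learned model):

```python
import math

def sigmoid(score):
    """Logistic link: maps a real-valued score to P(y_hat = +1)."""
    return 1.0 / (1.0 + math.exp(-score))

def quadratic_features(x1, x2):
    """Degree-2 polynomial expansion of a 2-D point."""
    return (1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2)

# Hypothetical coefficients: score = x2 - x1**2, so the decision
# boundary P = 0.5 is exactly the parabola x2 = x1**2.
w = (0.0, 0.0, 1.0, -1.0, 0.0, 0.0)

def prob_positive(x1, x2):
    score = sum(wi * fi for wi, fi in zip(w, quadratic_features(x1, x2)))
    return sigmoid(score)

print(prob_positive(0.0, 3.0))   # well above the parabola: near 1
print(prob_positive(0.0, -3.0))  # well below the parabola: near 0
print(prob_positive(2.0, 4.0))   # on the parabola: exactly 0.5
```

The band where the probability is near 0.5 hugs the parabola, so the uncertainty region tracks the curved boundary, just as in the plot.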
We saw that those decision boundaries became really wiggly and crazy, but now if you look at the uncertainty regions, you'll see they become really, really narrow. You've got to squint to see them, because they're really, really thin. But you can see them over here, kind of in the little white bands. So according to this model, not only is the decision boundary this really crazy line, but the only places where I'm unsure about my predictions are these thin little bands in between. So there are tiny uncertainty regions, and I'm overfitting and overconfident about it. The way I think about it, and the way I say it, is: we're sure we're right, and we're surely wrong about that. So we're absolutely wrong, but we're sure we're right, and that's really bad. So uncertainty is something that's very important in classifiers, and by looking at these probability plots we have another interpretation of overfitting, another way that overfitting gets expressed in classification: by creating these really narrow uncertainty bands. And so we want to avoid that; we'll do everything we can to avoid it.
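The shrinking uncertainty bands come from the overfit model's large coefficient magnitudes: scaling the score up makes the sigmoid sharper, pushing probabilities toward 0 or 1 even for points barely past the boundary. A small sketch of that effect (the scales are illustrative, not the lecture's fitted coefficients):

```python
import math

def sigmoid(score):
    """Logistic link: maps a real-valued score to P(y_hat = +1)."""
    return 1.0 / (1.0 + math.exp(-score))

# A point only slightly on the positive side of the boundary.
score = 0.2

# Overfit models tend to have huge coefficients, which is equivalent
# to multiplying every score by a large factor.
for scale in (1, 10, 100):
    p = sigmoid(scale * score)
    print(f"coefficient scale {scale:>3}: P(y_hat = +1) = {p:.4f}")
```

At scale 1 the model is appropriately unsure (probability near 0.55), but at scale 100 it reports near-certainty for the same marginal point: we're sure we're right, even where we're likely wrong.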