Now that we've talked about classification error, we can explore the notion of overfitting in classification. Before we start, let's review what we covered when we discussed overfitting in the regression course.

In the regression course, we had a running example where we were trying to predict the price of a house, on the y-axis, given some features of the house. In this case, the x-axis is the square feet, or the size, of the house. On the left we see a really nice, smooth curve that helps us predict the price of a house given its square feet. However, if instead of using a second-degree polynomial like we did on the left we use a much higher-degree polynomial, we might get a really crazy, wiggly curve that fits the training data extremely well but doesn't generalize beyond the training data, to the test set or to the truth in the real world. In this case, we say that the model f here has overfit the training data.

Now, in classification we're going to have the same kind of problem, and we're going to have to address it. When you learn a model that looks too good on the training data, you get a training error that is very low, but the model does poorly in terms of true error.

Let's review the overfitting plots we discussed quite a lot in the regression course. Here, the x-axis shows model complexity: the model on the left is a very simple constant model, while the model on the right is a crazy, very-high-degree polynomial. If you look at the training error, a constant model doesn't do very well on the training data, but if you fit this crazy high-degree polynomial, you might get zero training error. In between, the training error keeps decreasing and eventually goes to zero. So this is my training error.

The problem when training error goes to zero is that, if you look at the underlying model, it doesn't do well on other houses. The true error is very high when your model is very simple, but it's also very high when your model is very complex. So the true-error curve has a characteristic shape: it goes down and then comes back up. The short sketch below illustrates these two curves on a toy example.
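To make these curves concrete, here is a minimal sketch, assuming a synthetic one-dimensional regression problem standing in for the house-price example; the data, the feature rescaling, and the degree sweep are illustrative assumptions, not from the lecture. It fits polynomials of increasing degree and reports training error alongside held-out error, which stands in for the true error we can't measure directly:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic houses (illustrative, not course data): price is a smooth
# function of size plus noise.
sqft = rng.uniform(500, 4000, size=60)
price = 50 + 0.1 * sqft + 20 * np.sin(sqft / 500) + rng.normal(0, 15, size=60)

# Rescale the feature so high-degree polynomial fits stay numerically stable.
x = sqft / 1000.0

# Held-out data approximates the "true" error.
train_x, test_x = x[:40], x[40:]
train_y, test_y = price[:40], price[40:]

def rmse(y, pred):
    return float(np.sqrt(np.mean((y - pred) ** 2)))

# Sweep model complexity from a constant model (degree 0) to a very
# high-degree polynomial.
for degree in [0, 1, 2, 4, 8, 16]:
    coeffs = np.polyfit(train_x, train_y, degree)  # fit on training data only
    train_err = rmse(train_y, np.polyval(coeffs, train_x))
    test_err = rmse(test_y, np.polyval(coeffs, test_x))
    print(f"degree {degree:2d}: train RMSE {train_err:7.2f}, held-out RMSE {test_err:7.2f}")

Training error typically keeps falling as the degree grows, while the held-out error traces the characteristic U shape: down, then back up.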
So this is the true error. Overfitting is going to happen when we pick a model that is too good on the training data, somewhere over here on the right, but that doesn't do well with respect to the true error. In other words, you get some w-hat over here that has very low training error but high true error, while there exist some other parameters, let's call them w*, that might not have done as well on the training data but do much better in terms of true error; a precise version of this comparison appears in the note after this section.

So we want to avoid picking these models that perform extremely well on the training data but don't do well in the real world. In other words, we want to shift from the right of this plot toward the left to avoid overfitting.

Now, this was for the regression setting. Next, we'll talk about it in the classification setting.
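A minimal formalization of the w-hat versus w* comparison above, assuming the definitions of training error and true error used earlier in this module:

A model $\hat{w}$ overfits the training data if there exist parameters $w^{*}$ such that
\[
  \text{error}_{\text{train}}(\hat{w}) < \text{error}_{\text{train}}(w^{*})
  \quad \text{and} \quad
  \text{error}_{\text{true}}(\hat{w}) > \text{error}_{\text{true}}(w^{*}).
\]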