1 00:00:00,008 --> 00:00:04,669 [MUSIC] 2 00:00:04,669 --> 00:00:09,130 And I wanna talk about this notion of overfitting because this is something that 3 00:00:09,130 --> 00:00:11,540 we've talked about before in the course. 4 00:00:11,540 --> 00:00:12,790 I wanna formalize it, and 5 00:00:12,790 --> 00:00:15,800 we're gonna discuss it a lot more in the remainder of this course. 6 00:00:16,980 --> 00:00:22,450 Okay, so the notion of overfitting is if you have some model, 7 00:00:22,450 --> 00:00:27,245 let's say a model here with parameters w hat, 8 00:00:27,245 --> 00:00:32,360 so this model has some complexity and some associated estimated parameters, w hat. 9 00:00:33,930 --> 00:00:39,460 Well, this model is overfit 10 00:00:39,460 --> 00:00:44,353 if there exists another model with 11 00:00:44,353 --> 00:00:48,821 estimated parameters, 12 00:00:48,821 --> 00:00:54,150 I'll just call them w prime. 13 00:00:54,150 --> 00:00:57,590 So let's just say some other point here. 14 00:00:57,590 --> 00:01:01,999 Let's say these have parameters w 15 00:01:01,999 --> 00:01:06,870 prime such that two conditions hold. 16 00:01:08,410 --> 00:01:14,790 The training error, so one is training, can't spell right now. 17 00:01:17,510 --> 00:01:24,070 Training error of w hat is less than 18 00:01:24,070 --> 00:01:29,830 the training error of w prime. 19 00:01:29,830 --> 00:01:34,965 But on the other hand, the true 20 00:01:34,965 --> 00:01:40,050 error of w hat is 21 00:01:40,050 --> 00:01:45,110 greater than the true error of w prime. 22 00:01:46,260 --> 00:01:48,350 Okay, so this might not seem that intuitive, 23 00:01:48,350 --> 00:01:50,340 but let me go through it in terms of this picture here, 24 00:01:50,340 --> 00:01:54,040 which is exactly what these points, one and two, are saying. 25 00:01:54,040 --> 00:01:59,530 Which is, there is a wide range of models that have 26 00:02:01,290 --> 00:02:06,690 true error larger than, for example, this w prime here.
27 00:02:06,690 --> 00:02:10,800 But the ones that are overfit are the ones that have smaller training error. 28 00:02:10,800 --> 00:02:15,190 These are the ones that are really, really highly fit to the training data set but 29 00:02:15,190 --> 00:02:16,730 don't generalize well. 30 00:02:16,730 --> 00:02:19,995 Whereas the other points on the other half of this space are the ones that 31 00:02:19,995 --> 00:02:25,110 are not really well fit to the training data and also don't generalize well. 32 00:02:25,110 --> 00:02:29,011 Okay, so this is formally our notion of what an overfitted model is. 33 00:02:29,011 --> 00:02:33,719 [MUSIC]
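The two conditions in this definition can be written down as a small check. This is only a minimal sketch: the `ModelErrors` type, the `is_overfit` function, and all of the error values are hypothetical and chosen for illustration, not taken from the lecture. It also assumes the true error is available as a number, whereas in practice it can only be estimated, for example on held-out data.

```python
from typing import NamedTuple, Sequence


class ModelErrors(NamedTuple):
    """Errors for one fitted model (hypothetical container for illustration)."""
    name: str
    training_error: float
    true_error: float  # assumption: in practice this can only be estimated


def is_overfit(candidate: ModelErrors, others: Sequence[ModelErrors]) -> bool:
    """Return True if some other model w' satisfies both conditions:
    1. training_error(w_hat) < training_error(w')
    2. true_error(w_hat)     > true_error(w')
    """
    return any(
        candidate.training_error < other.training_error
        and candidate.true_error > other.true_error
        for other in others
    )


# Made-up numbers, purely illustrative.
w_prime = ModelErrors("w_prime", training_error=0.30, true_error=0.35)
w_hat = ModelErrors("w_hat", training_error=0.05, true_error=0.60)
underfit = ModelErrors("underfit", training_error=0.70, true_error=0.75)

print(is_overfit(w_hat, [w_prime, underfit]))
print(is_overfit(underfit, [w_prime, w_hat]))
```

Here `w_hat` fits the training data better than `w_prime` but has larger true error, so it meets the definition. The `underfit` model also has larger true error than `w_prime`, but its training error is larger too, so it fails the first condition and is not overfit in this sense.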