1 00:00:00,000 --> 00:00:04,106 [MUSIC] 2 00:00:04,106 --> 00:00:09,158 So for each model order that we might consider, 3 00:00:09,158 --> 00:00:12,441 for example, a linear model or 4 00:00:12,441 --> 00:00:17,367 we also talked about using a quadratic model, 5 00:00:17,367 --> 00:00:23,700 all the way up to our very crazy, 13th order polynomial. 6 00:00:25,140 --> 00:00:30,322 And of course, we could consider even higher order models. 7 00:00:30,322 --> 00:00:32,790 Well, what happens to test error? 8 00:00:32,790 --> 00:00:34,200 Sorry, I don't mean test error. 9 00:00:34,200 --> 00:00:35,559 Let's start with training error. 10 00:00:35,559 --> 00:00:37,090 A lot easier to think about. 11 00:00:37,090 --> 00:00:37,920 So training error. 12 00:00:39,510 --> 00:00:45,520 As we increase the model order, well, that model's able 13 00:00:45,520 --> 00:00:50,470 to better and better fit the observations in that training dataset. 14 00:00:50,470 --> 00:00:53,260 So what we're gonna have is we're gonna have that our training error 15 00:00:54,390 --> 00:00:59,320 decreases with increasing model order. 16 00:00:59,320 --> 00:01:01,000 So remember the curves that we had. 17 00:01:01,000 --> 00:01:04,335 We had the residual sum of squares associated with that linear fit, 18 00:01:04,335 --> 00:01:09,450 quadratic fit, all the way up to the 13th order polynomial that basically hit 19 00:01:09,450 --> 00:01:10,860 each one of those observations. 20 00:01:10,860 --> 00:01:14,770 So, we saw that the residual sum of squares was going down and down and down. 21 00:01:14,770 --> 00:01:18,281 So that's true even if we hold out some observations and 22 00:01:18,281 --> 00:01:20,528 just look at our training dataset. 23 00:01:20,528 --> 00:01:23,645 So we're gonna have our training error decreasing and 24 00:01:23,645 --> 00:01:26,907 decreasing as we increase the flexibility of the model. 
25 00:01:26,907 --> 00:01:32,741 But, so let's just annotate this as being our training error, 26 00:01:32,741 --> 00:01:37,936 particularly for our estimated model parameters w hat. 27 00:01:37,936 --> 00:01:42,080 So let's be clear about what we mean by w hat. 28 00:01:42,080 --> 00:01:49,533 So for every model complexity such as linear model, quadratic model, and so on, 29 00:01:49,533 --> 00:01:52,791 what we're gonna do is we're gonna optimize and 30 00:01:52,791 --> 00:01:55,980 find the parameters w hat. For the linear model, 31 00:01:55,980 --> 00:01:59,640 we're searching over all possible lines minimizing the training error. 32 00:01:59,640 --> 00:02:01,290 Remember that's what we said, 33 00:02:01,290 --> 00:02:06,890 couple slides ago we said that the way we estimate our model is we're gonna 34 00:02:06,890 --> 00:02:12,220 minimize the error on the observations in our training dataset. 35 00:02:12,220 --> 00:02:14,320 So that's how we get w hat for the linear model, 36 00:02:14,320 --> 00:02:17,780 and we compute the training error associated with that w hat. 37 00:02:17,780 --> 00:02:20,620 Then we look at all possible quadratic fits. 38 00:02:20,620 --> 00:02:24,240 Minimize the training error over all the quadratic fits, 39 00:02:24,240 --> 00:02:27,102 that's how we get w hat for the quadratic model. 40 00:02:27,102 --> 00:02:31,929 And then we plot the training error associated with the w hat for 41 00:02:31,929 --> 00:02:34,308 the quadratic model and so on. 42 00:02:34,308 --> 00:02:37,348 Well, we can also talk about test error, but 43 00:02:37,348 --> 00:02:42,228 here it's a little bit more complicated because what do we think is gonna 44 00:02:42,228 --> 00:02:45,755 happen as we increase and increase our model order? 45 00:02:45,755 --> 00:02:49,969 Well, what we saw, if you remember that 13th order polynomial fit, 46 00:02:49,969 --> 00:02:54,340 that really crazy wiggly line, we had really, really bad predictions. 
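The description of w hat above can be written out as a formula. This is only a sketch under assumptions the lecture does not spell out here: a single input x, a polynomial model of order p, squared error as the loss (matching the residual sum of squares mentioned earlier), and the symbols \mathcal{T}, p, and \hat{w}^{(p)}, which are notation chosen for this illustration.

```latex
% Assumed notation (not from the lecture): \mathcal{T} is the set of training
% observations, p is the model order, w_0, ..., w_p are the polynomial weights.
\[
\hat{w}^{(p)} = \arg\min_{w \in \mathbb{R}^{p+1}}
  \sum_{i \in \mathcal{T}} \Bigl( y_i - \sum_{j=0}^{p} w_j \, x_i^{\,j} \Bigr)^{2},
\qquad
\text{training error}(p) =
  \sum_{i \in \mathcal{T}} \Bigl( y_i - \sum_{j=0}^{p} \hat{w}^{(p)}_j \, x_i^{\,j} \Bigr)^{2}.
\]
```

Every polynomial of order p is also a polynomial of order p + 1 with its highest weight set to zero, so minimizing over the larger family can never give a larger minimum. That is why the training error can only go down or stay flat as the model order increases.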
47 00:02:54,340 --> 00:02:59,210 So when we think about holding out our test data, fitting a 13th order 48 00:02:59,210 --> 00:03:04,480 polynomial just on the training data, we're gonna get some wiggly, crazy fit. 49 00:03:04,480 --> 00:03:09,520 And then when we look at those test observations, those houses that we 50 00:03:09,520 --> 00:03:14,900 held out, we're probably gonna have very poor predictions of those actual values. 51 00:03:16,050 --> 00:03:20,376 So what we're gonna expect is that at some point, 52 00:03:20,376 --> 00:03:23,678 our test error is likely to increase. 53 00:03:23,678 --> 00:03:28,769 So the curves for test error tend to look something like the following 54 00:03:28,769 --> 00:03:34,228 where maybe the error is going down for some period of time but after a point, 55 00:03:36,028 --> 00:03:39,743 the error starts to increase again. 56 00:03:39,743 --> 00:03:46,349 So here this curve is our test error for 57 00:03:46,349 --> 00:03:50,272 these fitted models, 58 00:03:50,272 --> 00:03:54,607 where the models were fit 59 00:03:54,607 --> 00:03:59,160 using the training data. 60 00:04:00,720 --> 00:04:04,174 So these are what curves tend to look like for training error and 61 00:04:04,174 --> 00:04:06,912 test error as a function of model complexity. And 62 00:04:06,912 --> 00:04:10,561 how we think about using these ideas to actually select the model or 63 00:04:10,561 --> 00:04:14,697 the complexity of the model that we should use for making our predictions, 64 00:04:14,697 --> 00:04:17,851 we're gonna discuss in a lot more detail in the regression and 65 00:04:17,851 --> 00:04:19,256 classification courses. 66 00:04:19,256 --> 00:04:22,425 [MUSIC]
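The two curves described above can be traced numerically with a small experiment. What follows is a rough sketch, not the lecture's own code or data: the dataset is synthetic (a noisy sine curve standing in for the house data), the held-out split (every fourth point) is arbitrary, the error is taken as mean squared error so the training and test sets are comparable, and the names `fit_polynomial`, `mean_squared_error`, and `error_curves` are made up for this illustration.

```python
import math
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares weights (lowest power first), found via the normal equations."""
    n = degree + 1
    # Augmented system [A^T A | A^T y], where A[i][j] = xs[i] ** j.
    rows = [
        [sum(x ** (r + c) for x in xs) for c in range(n)]
        + [sum(y * x ** r for x, y in zip(xs, ys))]
        for r in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    w = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(rows[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (rows[r][n] - tail) / rows[r][r]
    return w


def mean_squared_error(xs, ys, w):
    """Average squared gap between observed y and the polynomial's prediction."""
    def predict(x):
        return sum(wj * x ** j for j, wj in enumerate(w))
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def error_curves(max_degree=5, seed=0):
    """Return (degree, training error, test error) for each model order."""
    rng = random.Random(seed)
    # Synthetic stand-in data: a sine curve plus Gaussian noise (an assumption).
    xs = [i / 19 for i in range(20)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.2) for x in xs]
    # Hold out every fourth observation as the test set.
    train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 4 != 3]
    test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 4 == 3]
    train_x, train_y = zip(*train)
    test_x, test_y = zip(*test)
    curves = []
    for degree in range(1, max_degree + 1):
        # w hat is fit on the training data only, then scored on both sets.
        w_hat = fit_polynomial(train_x, train_y, degree)
        curves.append((
            degree,
            mean_squared_error(train_x, train_y, w_hat),
            mean_squared_error(test_x, test_y, w_hat),
        ))
    return curves


if __name__ == "__main__":
    for degree, train_err, test_err in error_curves():
        print(f"order {degree}: training error {train_err:.4f}, test error {test_err:.4f}")
```

The training error column should never go up as the order increases, since each model contains the previous one. Whether the test error turns upward within this small range of orders depends on the data and the split, so the sketch makes no promise about where that turn happens. The normal equations are used here only to keep the example dependency-free; they become numerically unreliable at high orders such as 13, where a library least-squares routine would be a safer choice.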