In the last module, we talked about the potential for high-complexity models to become overfit to the data. We also discussed the idea of a bias-variance tradeoff, where high-complexity models can have very low bias but high variance, whereas low-complexity models have high bias but low variance. And we said that we wanted to trade off between bias and variance to get to that sweet spot of good predictive performance. In this module, what we're going to do is talk about a way to automatically balance between bias and variance, using something called ridge regression.

So let's recall this issue of overfitting in the context of polynomial regression. Remember, this is our polynomial regression model. If we assume we're fitting some low-order polynomial to our data, we might get a fit that looks like the following: just a quadratic fit to the data. But once we get to a much higher-order polynomial, we can get these really wild fits to our training observations. Again, this is an instance of a high-variance model. But we refer to this fit as being overfit, because it is very, very well tuned to our training observations, but it doesn't generalize well to other observations we might see.
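As a rough sketch of the fits described above, the following hypothetical demo (not from the lecture itself) fits a low-order and a high-order polynomial to the same small noisy data set and compares their training errors; the data and degrees are illustrative assumptions.

```python
import numpy as np

# Hypothetical data: a quadratic trend plus noise, with only a few observations.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 15))
y = x**2 + rng.normal(0.0, 0.1, size=x.shape)

# Fit a quadratic and a much higher-order polynomial, and record the
# training residual sum of squares (RSS) for each.
train_rss = {}
for degree in (2, 12):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    train_rss[degree] = float(np.sum(residuals**2))
    print(f"degree {degree:2d}: training RSS = {train_rss[degree]:.4f}")
```

The high-order fit chases the noise in the training points, so its training error is lower even though it generalizes worse, which is exactly the overfitting behavior described above.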
Previously, we had discussed a very formal notion of what it means for a model to be overfit: a model is overfit if its training error is smaller than that of another model whose true error is actually smaller. Hopefully you remember that from the last module. But a question we have now is: is there some quantitative measure that's indicative of when a model is overfit? To see this, let's look at the following demo, where what we're going to show is that when models become overfit, the estimated coefficients of those models tend to become really, really, really large in magnitude.
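The demo described above can be sketched as follows; this is a minimal hypothetical version (the data, degrees, and noise level are assumptions, not the lecture's actual demo) showing that the largest estimated coefficient tends to blow up in magnitude as the polynomial degree grows.

```python
import numpy as np

# Hypothetical data: a smooth sinusoidal trend plus noise.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1.0, 1.0, 20))
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, size=x.shape)

# Track the largest coefficient magnitude as the degree increases.
max_coef = {}
for degree in (2, 6, 14):
    coeffs = np.polyfit(x, y, degree)
    max_coef[degree] = float(np.max(np.abs(coeffs)))
    print(f"degree {degree:2d}: max |coefficient| = {max_coef[degree]:.2f}")
```

Large coefficient magnitudes are the quantitative symptom of overfitting that ridge regression, introduced in this module, is designed to penalize.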