In the last module, we talked about the potential for high-complexity models to become overfit to the data. We also discussed the idea of a bias-variance tradeoff, where high-complexity models can have very low bias but high variance, whereas low-complexity models have high bias but low variance. And we said that we wanted to trade off between bias and variance to get to that sweet spot of good predictive performance. In this module, what we're going to do is talk about a way to automatically balance between bias and variance, using something called ridge regression.

So let's recall this issue of overfitting in the context of polynomial regression. Remember, this is our polynomial regression model. If we assume we're fitting some low-order polynomial to our data, we might get a fit that looks like the following: just a quadratic fit to the data. But once we get to a much higher-order polynomial, we can get these really wild fits to our training observations. Again, this is an instance of a high-variance model. But we refer to this fit as being overfit, because it is very, very well tuned to our training observations, but it doesn't generalize well to other observations we might see.
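As a rough sketch of the fits described above, the following hypothetical demo (not from the lecture itself) fits a low-order and a high-order polynomial to the same small noisy data set and compares their training errors; the data and degrees are illustrative assumptions.

```python
import numpy as np

# Hypothetical data: a quadratic trend plus noise, with only a few observations.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 15))
y = x**2 + rng.normal(0.0, 0.1, size=x.shape)

# Fit a quadratic and a much higher-order polynomial, and record the
# training residual sum of squares (RSS) for each.
train_rss = {}
for degree in (2, 12):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    train_rss[degree] = float(np.sum(residuals**2))
    print(f"degree {degree:2d}: training RSS = {train_rss[degree]:.4f}")
```

The high-order fit chases the noise in the training points, so its training error is lower even though it generalizes worse, which is exactly the overfitting behavior described above.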
Previously, we had discussed a very formal notion of what it means for a model to be overfit: a model is overfit if its training error is smaller than that of another model whose true error is actually smaller. Hopefully you remember that from the last module. But a question we have now is: is there some quantitative measure that's indicative of when a model is overfit? To see this, let's look at the following demo, where what we're going to show is that when models become overfit, the estimated coefficients of those models tend to become really, really, really large in magnitude.
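The demo described above can be sketched as follows; this is a minimal hypothetical version (the data, degrees, and noise level are assumptions, not the lecture's actual demo) showing that the largest estimated coefficient tends to blow up in magnitude as the polynomial degree grows.

```python
import numpy as np

# Hypothetical data: a smooth sinusoidal trend plus noise.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1.0, 1.0, 20))
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, size=x.shape)

# Track the largest coefficient magnitude as the degree increases.
max_coef = {}
for degree in (2, 6, 14):
    coeffs = np.polyfit(x, y, degree)
    max_coef[degree] = float(np.max(np.abs(coeffs)))
    print(f"degree {degree:2d}: max |coefficient| = {max_coef[degree]:.2f}")
```

Large coefficient magnitudes are the quantitative symptom of overfitting that ridge regression, introduced in this module, is designed to penalize.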