1
00:00:00,000 --> 00:00:04,106
[MUSIC]

2
00:00:04,106 --> 00:00:09,158
So for
each model order that we might consider,

3
00:00:09,158 --> 00:00:12,441
for example, a linear model or

4
00:00:12,441 --> 00:00:17,367
we also talked about
using a quadratic model,

5
00:00:17,367 --> 00:00:23,700
all the way up to our very crazy
13th order polynomial.

6
00:00:25,140 --> 00:00:30,322
And of course, we could consider
even higher order models.

7
00:00:30,322 --> 00:00:32,790
Well, what happens to test error?

8
00:00:32,790 --> 00:00:34,200
Sorry, I don't mean test error.

9
00:00:34,200 --> 00:00:35,559
Let's start with training error.

10
00:00:35,559 --> 00:00:37,090
A lot easier to think about.

11
00:00:37,090 --> 00:00:37,920
So training error.

12
00:00:39,510 --> 00:00:45,520
As we increase the model order,
well, that model's able

13
00:00:45,520 --> 00:00:50,470
to better and better fit the observations
in that training dataset.

14
00:00:50,470 --> 00:00:53,260
So what we're gonna have is we're
gonna have that our training error

15
00:00:54,390 --> 00:00:59,320
decreases with increasing model order.

16
00:00:59,320 --> 00:01:01,000
So remember the curves that we had.

17
00:01:01,000 --> 00:01:04,335
We had the residual sum of squares
associated with that linear fit,

18
00:01:04,335 --> 00:01:09,450
quadratic fit, all the way up to the 13th
order polynomial that basically hit

19
00:01:09,450 --> 00:01:10,860
each one of those observations.

20
00:01:10,860 --> 00:01:14,770
So, we saw that the residual sum of squares
was going down and down and down.

21
00:01:14,770 --> 00:01:18,281
So that's true even if we hold
out some observations and

22
00:01:18,281 --> 00:01:20,528
just look at our training dataset.

23
00:01:20,528 --> 00:01:23,645
So we're gonna have our
training error decreasing and

24
00:01:23,645 --> 00:01:26,907
decreasing as we increase
the flexibility of the model.

25
00:01:26,907 --> 00:01:32,741
But, so let's just annotate this
as being our training error,

26
00:01:32,741 --> 00:01:37,936
particularly for
our estimated model parameters w hat.

27
00:01:37,936 --> 00:01:42,080
So let's be clear about
what we mean by w hat.

28
00:01:42,080 --> 00:01:49,533
So for every model complexity such as
linear model, quadratic model, and so on,

29
00:01:49,533 --> 00:01:52,791
what we're gonna do is
we're gonna optimize and

30
00:01:52,791 --> 00:01:55,980
find the parameters w hat for
the linear model.

31
00:01:55,980 --> 00:01:59,640
We're searching over all possible
lines minimizing the training error.

32
00:01:59,640 --> 00:02:01,290
Remember that's what we said,

33
00:02:01,290 --> 00:02:06,890
a couple slides ago, we said that the way
we estimate our model is we're gonna

34
00:02:06,890 --> 00:02:12,220
minimize the error on the observations
in our training dataset.

35
00:02:12,220 --> 00:02:14,320
So that's how we get w hat for
the linear model,

36
00:02:14,320 --> 00:02:17,780
and we compute the training error
associated with that w hat.

37
00:02:17,780 --> 00:02:20,620
Then we look at all
possible quadratic fits.

38
00:02:20,620 --> 00:02:24,240
Minimize the training error
over all the quadratic fits,

39
00:02:24,240 --> 00:02:27,102
that's how we get w hat for
the quadratic model.

40
00:02:27,102 --> 00:02:31,929
And then we plot the training error
associated with the w hat for

41
00:02:31,929 --> 00:02:34,308
the quadratic model and so on.

42
00:02:34,308 --> 00:02:37,348
Well, we can also talk about test error,
but

43
00:02:37,348 --> 00:02:42,228
here it's a little bit more complicated
because what do we think is gonna

44
00:02:42,228 --> 00:02:45,755
happen as we increase and
increase our model order?

45
00:02:45,755 --> 00:02:49,969
Well, what we saw, if you remember
that 13th order polynomial fit,

46
00:02:49,969 --> 00:02:54,340
that really crazy wiggly line,
we had really, really bad predictions.

47
00:02:54,340 --> 00:02:59,210
So when we think about holding out
our test data, fitting a 13th order

48
00:02:59,210 --> 00:03:04,480
polynomial just on the training data,
we're gonna get some wiggly, crazy fit.

49
00:03:04,480 --> 00:03:09,520
And then when we look at those test
observations, those houses that we

50
00:03:09,520 --> 00:03:14,900
held out, we're probably gonna have very
poor predictions of those actual values.

51
00:03:16,050 --> 00:03:20,376
So what we're gonna expect
is that at some point,

52
00:03:20,376 --> 00:03:23,678
our test error is likely to increase.

53
00:03:23,678 --> 00:03:28,769
So the curves for test error tend to
look something like the following

54
00:03:28,769 --> 00:03:34,228
where maybe the error is going down for
some period of time, but after a point,

55
00:03:36,028 --> 00:03:39,743
the error starts to increase again.

56
00:03:39,743 --> 00:03:46,349
So here this curve is our test error for

57
00:03:46,349 --> 00:03:50,272
these fitted models,

58
00:03:50,272 --> 00:03:54,607
where the models were fit

59
00:03:54,607 --> 00:03:59,160
using the training data.

60
00:04:00,720 --> 00:04:04,174
So these are what curves tend to
look like for training error and

61
00:04:04,174 --> 00:04:06,912
test error as a function
of model complexity. And

62
00:04:06,912 --> 00:04:10,561
how we think about using these ideas
to actually select the model or

63
00:04:10,561 --> 00:04:14,697
the complexity of the model that we
should use for making our predictions,

64
00:04:14,697 --> 00:04:17,851
we're gonna discuss in a lot more
detail in the regression and

65
00:04:17,851 --> 00:04:19,256
classification courses.

66
00:04:19,256 --> 00:04:22,425
[MUSIC]