Now that we've talked about classification error, we can explore the notion of overfitting in classification. Before we start, let's review what we covered when we discussed overfitting in the regression course.

In the regression course, we had a running example where we were trying to predict the price of a house, on the y-axis, given some features of the house. In this case, the x-axis is the square feet, or the size, of the house. On the left we see a really nice, smooth curve that helps us predict the price of a house given its square feet. However, if instead of using a second-degree polynomial like we did on the left we use a much higher-degree polynomial, we might get a really crazy, wiggly curve that fits the training data extremely well but doesn't generalize beyond the training data, to the test set or to the truth in the real world. In this case, we say that the model f here has overfit the training data.

Now, in classification we're going to have the same kind of problem, and we're going to have to address it. When you learn a model that looks too good on the training data, you get a training error that is very low, but the model does poorly in terms of true error.

Let's review the overfitting plots we discussed quite a lot in the regression course. Here, the x-axis shows model complexity: the model on the left is a very simple constant model, while the model on the right is a crazy, very-high-degree polynomial. If you look at the training error, a constant model doesn't do very well on the training data, but if you fit this crazy high-degree polynomial, you might get zero training error. In between, the training error keeps decreasing and eventually goes to zero. So this is my training error.

The problem when training error goes to zero is that, if you look at the underlying model, it doesn't do well on other houses. The true error is very high when your model is very simple, but it's also very high when your model is very complex. So the true-error curve has a characteristic shape: it goes down and then comes back up. The short sketch below illustrates these two curves on a toy example.
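To make these curves concrete, here is a minimal sketch, assuming a synthetic one-dimensional regression problem standing in for the house-price example; the data, the feature rescaling, and the degree sweep are illustrative assumptions, not from the lecture. It fits polynomials of increasing degree and reports training error alongside held-out error, which stands in for the true error we can't measure directly:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic houses (illustrative, not course data): price is a smooth
# function of size plus noise.
sqft = rng.uniform(500, 4000, size=60)
price = 50 + 0.1 * sqft + 20 * np.sin(sqft / 500) + rng.normal(0, 15, size=60)

# Rescale the feature so high-degree polynomial fits stay numerically stable.
x = sqft / 1000.0

# Held-out data approximates the "true" error.
train_x, test_x = x[:40], x[40:]
train_y, test_y = price[:40], price[40:]

def rmse(y, pred):
    return float(np.sqrt(np.mean((y - pred) ** 2)))

# Sweep model complexity from a constant model (degree 0) to a very
# high-degree polynomial.
for degree in [0, 1, 2, 4, 8, 16]:
    coeffs = np.polyfit(train_x, train_y, degree)  # fit on training data only
    train_err = rmse(train_y, np.polyval(coeffs, train_x))
    test_err = rmse(test_y, np.polyval(coeffs, test_x))
    print(f"degree {degree:2d}: train RMSE {train_err:7.2f}, held-out RMSE {test_err:7.2f}")

Training error typically keeps falling as the degree grows, while the held-out error traces the characteristic U shape: down, then back up.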
So this is the true error. Overfitting is going to happen when we pick a model that is too good on the training data, somewhere over here on the right, but that doesn't do well with respect to the true error. In other words, you get some w-hat over here that has very low training error but high true error, while there exist some other parameters, let's call them w*, that might not have done as well on the training data but do much better in terms of true error; a precise version of this comparison appears in the note after this section.

So we want to avoid picking these models that perform extremely well on the training data but don't do well in the real world. In other words, we want to shift from the right of this plot toward the left to avoid overfitting.

Now, this was for the regression setting. Next, we'll talk about it in the classification setting.
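A minimal formalization of the w-hat versus w* comparison above, assuming the definitions of training error and true error used earlier in this module:

A model $\hat{w}$ overfits the training data if there exist parameters $w^{*}$ such that
\[
  \text{error}_{\text{train}}(\hat{w}) < \text{error}_{\text{train}}(w^{*})
  \quad \text{and} \quad
  \text{error}_{\text{true}}(\hat{w}) > \text{error}_{\text{true}}(w^{*}).
\]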