1
00:00:00,000 --> 00:00:04,599
[MUSIC]

2
00:00:04,599 --> 00:00:08,385
We've now seen the basics of decision trees, which are an amazing type of

3
00:00:08,385 --> 00:00:12,186
classifier that can be used for a wide range of different types of data.

4
00:00:12,186 --> 00:00:15,274
However, decision trees are highly prone to overfitting.

5
00:00:15,274 --> 00:00:19,946
And so, let's dig in a little bit in this module on how we can avoid overfitting in

6
00:00:19,946 --> 00:00:21,819
the context of decision trees.

7
00:00:23,360 --> 00:00:28,020
And as a reminder, we're going to continue to use our loan application evaluation

8
00:00:28,020 --> 00:00:32,590
system as a running example, where loan data will come in and

9
00:00:32,590 --> 00:00:38,120
we'll be able to predict whether that's a safe loan or a risky loan application.

10
00:00:38,120 --> 00:00:39,940
And so, that's the decision we're trying to make.

11
00:00:41,340 --> 00:00:44,600
And from that loan application, we're going to learn the decision tree

12
00:00:44,600 --> 00:00:46,520
that allows us to traverse down the tree and

13
00:00:46,520 --> 00:00:52,220
make a prediction as to whether a particular loan is risky or safe.

14
00:00:52,220 --> 00:00:54,630
And so, the input is going to be x_i and

15
00:00:54,630 --> 00:00:58,790
the output is going to be this y hat i that we're going to predict from data.

16
00:01:00,580 --> 00:01:04,610
Let's first spend a quick minute reviewing overfitting, and then dig in as

17
00:01:04,610 --> 00:01:08,150
to how it happens in decision trees, which, hint hint, is going to be really bad.

18
00:01:10,360 --> 00:01:15,608
As we all recall, overfitting is what happens when the training error

19
00:01:15,608 --> 00:01:21,102
keeps going down towards zero as we make our models more and more complex, while the true

20
00:01:21,102 --> 00:01:26,854
error goes down with the complexity of the model at first, but then spikes back up.

21
00:01:26,854 --> 00:01:33,121
And so, more specifically, overfitting happens when we end up with a model w hat

22
00:01:33,121 --> 00:01:37,434
which has low training error, but high true error,

23
00:01:37,434 --> 00:01:41,238
while there was some other model, or model parameters, w*,

24
00:01:41,238 --> 00:01:46,457
which had maybe higher training error, but definitely lower true error.

25
00:01:46,457 --> 00:01:49,290
And so, that's the overfitting problem.

26
00:01:49,290 --> 00:01:55,580
And we want to somehow pick a model that's less complex, to avoid that kind of overfitting.

27
00:01:55,580 --> 00:02:00,659
We saw this effect quite pronouncedly in logistic regression, where as we

28
00:02:00,659 --> 00:02:06,141
increased the degree of the polynomial, we got these crazier and crazier decision

29
00:02:06,141 --> 00:02:11,398
boundaries. We saw overfitting, which was bad overfitting over here,

30
00:02:11,398 --> 00:02:16,781
but the overfitting for polynomials of degree 6, and then polynomials

31
00:02:16,781 --> 00:02:22,362
of degree 20 here for these features, well, this is a technical term that I use.

32
00:02:22,362 --> 00:02:28,836
I think I used "crazy decision boundary", but let's call it crazy overfitting.

33
00:02:28,836 --> 00:02:31,402
So, really bad stuff.

34
00:02:31,402 --> 00:02:34,500
And so, we're trying to avoid overly complex models.

35
00:02:34,500 --> 00:02:36,803
And as we'll see with decision trees,

36
00:02:36,803 --> 00:02:39,640
models can get overly complex very quickly.

37
00:02:39,640 --> 00:02:44,819
[MUSIC]
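One way to write down the overfitting condition described around the middle of this video (a sketch using the training error / true error notation from the slides; the exact formula is not read out in the transcript above) is:

% Overfitting: the learned parameters \hat{w} fit the training data at least
% as well as some other parameters w^*, yet generalize worse than w^* does.
\[
\text{overfitting at } \hat{w} \;\iff\;
\exists\, w^{*} \text{ such that }
\text{training\_error}(\hat{w}) \le \text{training\_error}(w^{*})
\;\;\text{and}\;\;
\text{true\_error}(\hat{w}) > \text{true\_error}(w^{*}).
\]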
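To see numerically why decision trees can get overly complex very quickly, here is a minimal sketch (not from the course materials, which use a different toolkit) that grows scikit-learn trees of increasing depth on synthetic stand-in loan data, comparing training error with held-out error as a proxy for true error:

# Illustration only: deeper trees drive training error toward zero while
# held-out error stops improving or gets worse -- the overfitting gap above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for loan applications x_i with safe/risky labels y_i.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

for depth in [1, 2, 4, 8, 16, None]:   # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_err = 1 - tree.score(X_train, y_train)   # training error
    test_err = 1 - tree.score(X_test, y_test)      # proxy for true error
    print(f"max_depth={depth}: train error {train_err:.3f}, "
          f"test error {test_err:.3f}")

Running this, you should see the training error fall essentially to zero for the deepest trees while the test error plateaus or climbs back up, which is exactly the gap between training error and true error that this module sets out to control.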