Next we'll describe the pruning procedure, which takes the cost function we just defined and uses it to throw away decisions in the tree that were not that important.

The idea is pretty simple. Consider a split in the decision tree; usually we start from the bottom, so we'll start with this candidate split to prune. Should we split on term after we've split first on credit and then on income, or not? Well, we can use our cost function to figure that out. Let's say the full tree shown here has an error of 0.25 on the training data. Now let's count the leaves: one, two, three, four, five, six. If we use a lambda of 0.03, then the total, the error plus lambda times the number of leaves, is 0.25 plus 0.18, which is 0.43. That's the grand total for the full tree.

Now let's see what happens if we were to do the pruning. The smaller tree, where this last split has been pruned, has a slightly higher training error: it went up to 0.26. Should we really do it? Well, let's look at the number of leaves: one, two, three, four, five. So we have five leaves instead of six, one less leaf. The training error went up a little bit, but five times 0.03 is 0.15, and adding that to 0.26 gives a grand total of 0.41. So this new tree, T_smaller, is looking promising: it has slightly worse training error but lower overall cost. And since it has lower overall cost, we're going to go ahead and prune, which is a good idea.

That's the kind of simplification that's easy to see in a small example; when you have a massive tree built on tons of data, it's harder to picture how these splits get pruned. But it must happen, otherwise you're going to have tremendous amounts of overfitting in decision trees.
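To see the arithmetic in one place, here is a minimal sketch of that cost comparison in Python. The numbers (errors of 0.25 and 0.26, six versus five leaves, lambda of 0.03) come straight from the example; the function and variable names are just illustrative, not part of any particular library.

```python
def total_cost(training_error, num_leaves, lam):
    """Cost used to compare trees: training error plus a penalty per leaf."""
    return training_error + lam * num_leaves

lam = 0.03

cost_full    = total_cost(training_error=0.25, num_leaves=6, lam=lam)  # 0.25 + 0.18 = 0.43
cost_smaller = total_cost(training_error=0.26, num_leaves=5, lam=lam)  # 0.26 + 0.15 = 0.41

# Prune whenever the smaller tree has lower (or equal) total cost,
# even though its raw training error is slightly worse.
print(round(cost_full, 2), round(cost_smaller, 2), cost_smaller <= cost_full)  # 0.43 0.41 True
```

The smaller tree wins whenever the saving of lambda per removed leaf outweighs the increase in training error, which is exactly what happened here.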
We're not going to apply this pruning step only at the very bottom of the tree; we're going to keep going up, revisit every decision we've made in the tree, and ask whether it's worth pruning. Is it worth pruning income after credit? Is it worth pruning term after credit? Is it worth pruning credit itself and keeping just the root node? We're going to check all of those, find the best tree after this pruning procedure, and output that as our solution.

For completeness, I've included here the full algorithm for building a decision tree with pruning. It's a little more involved, so I'll leave you this reference for those who want to implement it themselves, but it's relatively simple to implement and not that different from what you might have done in the regular case. This kind of idea is fundamental, though, and it's used in every decision tree learning approach out there, with small caveats and changes in the parameters, but overall it's the same idea.
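For those curious what that bottom-up sweep might look like in code, here is a rough sketch in Python over a toy binary-tree representation. The Node class, the way data is routed down each split, and the error measure are hypothetical stand-ins rather than the exact algorithm from the reference; the point is just the pattern of pruning the children first and then asking whether each split pays for its extra leaves.

```python
class Node:
    def __init__(self, prediction, feature=None, left=None, right=None):
        self.prediction = prediction  # majority label of training points reaching this node
        self.feature = feature        # binary feature this node splits on (None for a leaf)
        self.left, self.right = left, right

    def is_leaf(self):
        return self.left is None and self.right is None

def predict(node, x):
    if node.is_leaf():
        return node.prediction
    return predict(node.left if x[node.feature] == 0 else node.right, x)

def num_leaves(node):
    return 1 if node.is_leaf() else num_leaves(node.left) + num_leaves(node.right)

def total_cost(node, data, lam):
    # Classification error on `data` plus lambda times the number of leaves.
    err = sum(predict(node, x) != y for x, y in data) / len(data)
    return err + lam * num_leaves(node)

def prune(node, data, lam):
    """Bottom-up pruning: prune the children first, then revisit this split."""
    if node.is_leaf() or not data:
        return node
    node.left  = prune(node.left,  [(x, y) for x, y in data if x[node.feature] == 0], lam)
    node.right = prune(node.right, [(x, y) for x, y in data if x[node.feature] == 1], lam)
    leaf = Node(node.prediction)  # what this subtree becomes if we prune here
    # Keep the split only if it reduces error enough to pay for its extra leaves.
    return leaf if total_cost(leaf, data, lam) <= total_cost(node, data, lam) else node
```

Calling prune on the root with the full training set and the chosen lambda returns the simplified tree, which is what gets output as the final solution.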