Next we'll describe the pruning procedure, which takes the cost function we just defined and uses it to throw away decisions in the tree that were not that important.

The idea is pretty simple. Consider a split in the decision tree; usually we start from the bottom, so we'll start with this candidate split to prune. Should we split on term after we've split first on credit and then on income, or not? Well, we can use our cost function to figure that out. Let's say the full tree shown here has an error of 0.25 on the training data. Now let's count the leaves: one, two, three, four, five, six. If we use a lambda of 0.03, then the total, the error plus lambda times the number of leaves, is 0.25 plus 0.18, which is 0.43. That's the grand total for the full tree.

Now let's see what happens if we were to do the pruning. The smaller tree, where this last split has been pruned, has a slightly higher training error: it went up to 0.26. Should we really do it? Well, let's look at the number of leaves: one, two, three, four, five. So we have five leaves instead of six, one less leaf. The training error went up a little bit, but five times 0.03 is 0.15, and adding that to 0.26 gives a grand total of 0.41. So this new tree, T_smaller, is looking promising: it has slightly worse training error but lower overall cost. And since it has lower overall cost, we're going to go ahead and prune, which is a good idea.

That's the kind of simplification that's easy to see in a small example; when you have a massive tree built on tons of data, it's harder to picture how these splits get pruned. But it must happen, otherwise you're going to have tremendous amounts of overfitting in decision trees.
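To see the arithmetic in one place, here is a minimal sketch of that cost comparison in Python. The numbers (errors of 0.25 and 0.26, six versus five leaves, lambda of 0.03) come straight from the example; the function and variable names are just illustrative, not part of any particular library.

```python
def total_cost(training_error, num_leaves, lam):
    """Cost used to compare trees: training error plus a penalty per leaf."""
    return training_error + lam * num_leaves

lam = 0.03

cost_full    = total_cost(training_error=0.25, num_leaves=6, lam=lam)  # 0.25 + 0.18 = 0.43
cost_smaller = total_cost(training_error=0.26, num_leaves=5, lam=lam)  # 0.26 + 0.15 = 0.41

# Prune whenever the smaller tree has lower (or equal) total cost,
# even though its raw training error is slightly worse.
print(round(cost_full, 2), round(cost_smaller, 2), cost_smaller <= cost_full)  # 0.43 0.41 True
```

The smaller tree wins whenever the saving of lambda per removed leaf outweighs the increase in training error, which is exactly what happened here.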
We're not going to apply this pruning step only at the very bottom of the tree; we're going to keep going up, revisit every decision we've made in the tree, and ask whether it's worth pruning. Is it worth pruning income after credit? Is it worth pruning term after credit? Is it worth pruning credit itself and keeping just the root node? We're going to check all of those, find the best tree after this pruning procedure, and output that as our solution.

For completeness, I've included here the full algorithm for building a decision tree with pruning. It's a little more involved, so I'll leave you this reference for those who want to implement it themselves, but it's relatively simple to implement and not that different from what you might have done in the regular case. This kind of idea is fundamental, though, and it's used in every decision tree learning approach out there, with small caveats and changes in the parameters, but overall it's the same idea.
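For those curious what that bottom-up sweep might look like in code, here is a rough sketch in Python over a toy binary-tree representation. The Node class, the way data is routed down each split, and the error measure are hypothetical stand-ins rather than the exact algorithm from the reference; the point is just the pattern of pruning the children first and then asking whether each split pays for its extra leaves.

```python
class Node:
    def __init__(self, prediction, feature=None, left=None, right=None):
        self.prediction = prediction  # majority label of training points reaching this node
        self.feature = feature        # binary feature this node splits on (None for a leaf)
        self.left, self.right = left, right

    def is_leaf(self):
        return self.left is None and self.right is None

def predict(node, x):
    if node.is_leaf():
        return node.prediction
    return predict(node.left if x[node.feature] == 0 else node.right, x)

def num_leaves(node):
    return 1 if node.is_leaf() else num_leaves(node.left) + num_leaves(node.right)

def total_cost(node, data, lam):
    # Classification error on `data` plus lambda times the number of leaves.
    err = sum(predict(node, x) != y for x, y in data) / len(data)
    return err + lam * num_leaves(node)

def prune(node, data, lam):
    """Bottom-up pruning: prune the children first, then revisit this split."""
    if node.is_leaf() or not data:
        return node
    node.left  = prune(node.left,  [(x, y) for x, y in data if x[node.feature] == 0], lam)
    node.right = prune(node.right, [(x, y) for x, y in data if x[node.feature] == 1], lam)
    leaf = Node(node.prediction)  # what this subtree becomes if we prune here
    # Keep the split only if it reduces error enough to pay for its extra leaves.
    return leaf if total_cost(leaf, data, lam) <= total_cost(node, data, lam) else node
```

Calling prune on the root with the full training set and the chosen lambda returns the simplified tree, which is what gets output as the final solution.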