[MUSIC] Next we'll describe the pruning procedure, which takes the cost function we described and uses it to throw away decisions in the tree that were not that important. The idea is pretty simple: consider a split in the decision tree. We usually start from the bottom, so we'll start here with this candidate split to prune. Should we split on term after we've split first on credit and then on income? Should we do that split or not? Well, we can use our cost function to figure that out.

Let's say the full tree shown here has an error of 0.25 on the training data, and we're trying to decide whether to keep that split or not. Let's see how many leaves we have. We have one, two, three, four, five, six leaves, and if we use a lambda of 0.03, then the total here, the error plus lambda times the number of leaves, will be 0.25 plus 0.18, which is 0.43. So that's the grand total here.

Now let's see what happens if we were to do the pruning. The smaller tree, where this last split has been pruned, has a slightly higher training error: it went up to 0.26, and you're thinking, should I really do it? Well, let's look at the number of leaves. It's one, two, three, four, five leaves, so we have five leaves instead of six, one fewer leaf. The training error went up a little bit, but if you do five times 0.03, you get 0.15. You add that to 0.26, you get a grand total of 0.41, and suddenly this new tree T_smaller is looking promising. It has slightly worse training error but a lower overall cost, and since it has lower overall cost, we're going to end up pruning. Good idea.

That's the kind of simplification that looks small in a simple example; when you have a massive tree built from tons of data, it's hard to picture all the places where this kind of pruning can kick in. But it must happen, otherwise you're going to have tremendous amounts of overfitting in decision trees. We're not just going to run the pruning procedure at the very bottom of the tree; we're going to keep going up the tree and revisit every decision we've made and ask: is this split worth pruning? Is it worth pruning income after credit? Is it worth pruning term after credit? Is it worth pruning credit itself and just keeping the root node? So we're going to check all of those out, find the best tree after this pruning procedure, and output that as our solution.

For completeness, I've included here the full algorithm for building a decision tree with pruning. It's a little bit more involved, so I'll leave you this reference for those who want to implement it themselves, but it's relatively simple to implement and not that different from what you would do in the regular case. This kind of idea is fundamental, though, and it's used in every decision tree learning approach out there, with small caveats and changes in the parameters, but overall, it's the same idea. [MUSIC]
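To make the arithmetic above concrete, here is a minimal sketch in Python of that cost comparison. The total_cost helper and the hard-coded numbers simply restate the worked example from the lecture (lambda = 0.03, training errors 0.25 and 0.26, six versus five leaves); nothing here comes from a particular library.

    def total_cost(training_error, num_leaves, lam):
        # Total cost of a tree: training error plus lambda times the number of leaves.
        return training_error + lam * num_leaves

    lam = 0.03
    cost_full = total_cost(0.25, 6, lam)     # 0.25 + 0.18 = 0.43
    cost_pruned = total_cost(0.26, 5, lam)   # 0.26 + 0.15 = 0.41

    # The pruned tree has slightly worse training error but lower overall cost,
    # so we keep the smaller tree T_smaller.
    if cost_pruned <= cost_full:
        print("prune: keep T_smaller")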
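And here is a hedged sketch of the bottom-up pass described above, where every split is revisited and collapsed whenever doing so lowers the total cost. The Node class and its fields are assumptions made purely for illustration; the reference mentioned in the lecture gives the full algorithm.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        # error_as_leaf: fraction of all training points misclassified if this
        # node were collapsed into a single majority-class leaf (illustrative field).
        error_as_leaf: float
        left: Optional["Node"] = None
        right: Optional["Node"] = None

        def is_leaf(self):
            return self.left is None and self.right is None

        def num_leaves(self):
            if self.is_leaf():
                return 1
            return self.left.num_leaves() + self.right.num_leaves()

        def subtree_error(self):
            # Training error contributed by the subtree rooted at this node.
            if self.is_leaf():
                return self.error_as_leaf
            return self.left.subtree_error() + self.right.subtree_error()

    def prune(node, lam):
        # Work bottom-up: prune the children first, then decide on this split.
        if node.is_leaf():
            return node
        node.left = prune(node.left, lam)
        node.right = prune(node.right, lam)
        cost_keep = node.subtree_error() + lam * node.num_leaves()
        cost_collapse = node.error_as_leaf + lam * 1
        # Collapse the split whenever the smaller tree has lower (or equal) total cost.
        if cost_collapse <= cost_keep:
            return Node(error_as_leaf=node.error_as_leaf)
        return node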