[MUSIC] Well, let's start with the idea of early stopping, where we keep going down but we stop before the tree gets too complex. So, how does that work? Simply recall that as we add more depth to the tree, we're increasing the complexity of the model. And so if we're avoiding these super complex models, we can just avoid very deep trees. So the first early stopping condition is absolutely simple: just limit the depth of the tree. Now, we know that the training error goes down as you increase the depth of the tree, but the true error goes down and then goes up. So maybe there's some magical max depth where, before that, the trees are simple and they get to good true error, but after that the trees get too complex and the true error keeps going up. And so somehow this max depth is going to be a magic parameter we're going to provide, where we stop going down the recursion once we hit that max_depth. And just like any other magic parameter in machine learning, you have to figure out how to pick it. And the answer is a validation set or cross-validation. Never use a validation set, sorry, never use your test set to pick your magic parameters. So that's what you should do to pick max_depth.

Limiting the depth is just one approach for limiting the complexity of the decision tree. However, if you limit the depth, you're cutting every branch the same way, but some branches might want to go further than others. Let's take a look at an approach where some branches might go down further than others. So, for example, we can stop recursing if the classification error doesn't get better. So we split on credit, and now we're looking at what happens if we recurse on poor. Of the five safe loans that we have with credit poor, and the 16 risky loans with credit poor, if we just stopped there and made no splits, we'd be making five mistakes. The error would be 5 out of 21, which is 0.24. Now, if I were to try to split on the other features, term and income, suppose that neither of them helped. Suppose that neither of them decreased the training error. So the training error would still be 0.24 for both of them. What should I do? Now, I could try to split on one and then keep going, but that would make the tree more and more complex. So a second early stopping condition we might use is to say: stop there. If the training error is not decreasing, don't make the tree more complex. Use Occam's razor and stop the process right there, and just say everything after credit equals poor is a risky loan. And then just focus on what to do with the node where credit is fair.

So the second type of early stopping criterion we're exploring is: keep recursing, but stop if the training error does not decrease; don't make the tree more complex. Now, there are a few practical caveats about this. First, typically you don't just stop if the error doesn't decrease at all. You put in a magic parameter: you say stop if the error doesn't decrease by more than epsilon. Let's say, if it doesn't decrease by more than 1%, we just stop. There are a few pitfalls in practice for this rule, and we'll talk about those a little bit in the context of pruning. So this is a rule that you should be super careful about, but it's very useful in practice. So, something to definitely keep in mind. The third type of early stopping rule is extremely important, and you should always include it. And it just says: if there are only a few data points left in a node, don't split any further, because then you're really risking overfitting.
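As a quick illustration of picking the max_depth magic parameter with a validation set, here is a minimal sketch using scikit-learn's DecisionTreeClassifier. The arrays X and y are synthetic placeholders and the depth range 1 through 10 is an arbitrary choice; substitute your own loan features, labels, and search range.

```python
# A minimal sketch of choosing max_depth with a validation set.
# X and y below are synthetic placeholders, not the course's loan data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data; replace with your own features and labels.
rng = np.random.RandomState(0)
X = rng.rand(1000, 5)
y = (X[:, 0] + 0.3 * rng.randn(1000) > 0.5).astype(int)

# Hold out a validation set -- never the test set -- to pick the magic parameter.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0)

best_depth, best_error = None, float('inf')
for depth in range(1, 11):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    valid_error = 1.0 - model.score(X_valid, y_valid)  # classification error
    if valid_error < best_error:
        best_depth, best_error = depth, valid_error

print("chosen max_depth:", best_depth, "validation error:", best_error)
```

The same loop structure works with cross-validation instead of a single held-out split, for example by replacing the validation-error line with sklearn.model_selection.cross_val_score on the training data.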
So an example would be looking at the case where credit was equal to fair. Well, there was only one safe loan there and two risky loans, so a total of only three data points fall inside of that node. In that case, if we try to split further, we're going to find lots of patterns in the data that don't make much sense, because there's very little data to support those patterns. With three data points, I don't know if I really trust a decision tree algorithm to make tons of decisions, so I'll just stop recursing right there. In general, we should always set a parameter, call it Nmin, which says that if a node has Nmin data points or fewer, just stop recursing right there, and call that early stopping condition three. This one you should always implement, no matter what. For example, here I'm suggesting setting Nmin to 10. 10 is quite a nice number for Nmin if you have a relatively small dataset. If you have a huge dataset, you might set it to 100. But somewhere between 10 and 100 is a good value for Nmin.

So let's summarize the early stopping conditions that we described in this approach to try to find decision trees that are not too complex. There are three. The first one is limiting the depth of the tree. That's a relatively good condition to set, and you should pick it pretty carefully using cross-validation. The second one is to stop recursing if the classification error does not decrease by enough, whether it decreases by zero or doesn't decrease by more than epsilon. It's a little risky but can be very useful. The third one, extremely useful, and we should always do it: stop recursing if there are not that many data points left inside a node. So these are some of the key early stopping conditions a decision tree algorithm should be thinking about.

If we go back to the decision tree learning algorithm that we discussed in the previous module: we start with the empty tree, we select a feature to split on, then we stop if there is nothing left to do and just make the prediction, which is the majority class. Otherwise, we recurse and go back to step two. Now, we considered two stopping conditions in the previous module, and to those we're adding the three early stopping conditions that we just described. And so small modifications to the code we've already built give you this extra protection to help prevent some overfitting, as in the sketch below. [MUSIC]
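Below is a minimal sketch, in plain Python, of how the greedy recursion from the previous module might look with all three early stopping conditions wired in. The data layout (a dict of categorical feature columns plus a list of +1/-1 labels), the helper names, and the default values of max_depth, epsilon, and n_min are all illustrative assumptions, not the course's reference implementation.

```python
# A sketch of greedy decision tree learning with the three early stopping
# conditions from this lecture. Data layout and helper names are assumptions.

def majority_class(labels):
    # Predict the most common label (+1 or -1) at this node.
    return +1 if labels.count(+1) >= labels.count(-1) else -1

def node_error(labels):
    # Classification error if we predict the majority class at this node.
    return min(labels.count(+1), labels.count(-1)) / float(len(labels))

def split_error(data, labels, feature):
    # Weighted classification error after splitting on a categorical feature.
    total = len(labels)
    err = 0.0
    for value in set(data[feature]):
        subset = [y for x, y in zip(data[feature], labels) if x == value]
        err += node_error(subset) * len(subset) / total
    return err

def build_tree(data, labels, features, depth=0,
               max_depth=6, epsilon=0.0, n_min=10):
    prediction = majority_class(labels)

    # Stopping conditions from the previous module:
    # all labels agree, or no features left to split on.
    if node_error(labels) == 0.0 or not features:
        return {'leaf': True, 'prediction': prediction}
    # Early stopping condition 1: limit the depth of the tree.
    if depth >= max_depth:
        return {'leaf': True, 'prediction': prediction}
    # Early stopping condition 3: too few data points left in this node.
    if len(labels) <= n_min:
        return {'leaf': True, 'prediction': prediction}

    # Greedy step: pick the feature whose split lowers the error the most.
    best_feature = min(features, key=lambda f: split_error(data, labels, f))
    # Early stopping condition 2: error does not improve by more than epsilon.
    if node_error(labels) - split_error(data, labels, best_feature) <= epsilon:
        return {'leaf': True, 'prediction': prediction}

    # Recurse on each branch of the chosen feature.
    children = {}
    remaining = [f for f in features if f != best_feature]
    for value in set(data[best_feature]):
        mask = [x == value for x in data[best_feature]]
        sub_data = {f: [v for v, keep in zip(data[f], mask) if keep]
                    for f in remaining}
        sub_labels = [y for y, keep in zip(labels, mask) if keep]
        children[value] = build_tree(sub_data, sub_labels, remaining,
                                     depth + 1, max_depth, epsilon, n_min)
    return {'leaf': False, 'feature': best_feature, 'children': children}
```

For example, calling build_tree({'credit': [...], 'term': [...], 'income': [...]}, labels, ['credit', 'term', 'income']) would return a nested dict you can walk to classify a new loan, with each of the three early stopping checks turning what would otherwise be a deep branch into a majority-class leaf.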