[MUSIC] With all that in mind, let's dig in and observe how overfitting comes about in the context of decision trees. When we start learning a decision tree, we learn a decision stump, which gives a very simple boundary between the data, for example a single vertical or horizontal line. In this case, the decision is with respect to x(1): is it less than -0.07 or greater than -0.07? If it's less than -0.07, we predict -1, which corresponds to this side of the decision boundary, but if it's greater than -0.07, we predict +1 and fall on the right side of the decision boundary. So that's a very simple decision boundary from just a depth-1 decision tree, or decision stump. But as we increase the depth, the decision boundaries become more and more complex, until we end up with a super crazy one here where the depth is 10. And that depth-10 decision boundary has 0.00 training error. So the training error goes down from 0.22, which is relatively high for the decision stump, through 0.13 and 0.10 for the depth-2 and depth-3 decision trees, which looked okay, all the way to this crazy decision tree of depth 10, which has zero training error. That should be a big warning sign for you, and now we should be making a sad face. We've seen that with decision trees, as we increase the depth, the training error goes down until it can reach zero, and in this case, with the decision tree of depth 10, it actually does. So we could say the decision tree of depth 10 is great: it has zero training error, so it's a perfect decision tree. But in reality, it's not a perfect decision tree. As we know, even though the training error is zero, the true error can shoot up, so this can be a highly overfit decision tree.
It's good to take a step back and understand a little better why the training error of decision trees tends to go down so quickly with depth. Let's take the simple example where we're just learning a decision stump. We have 40 data points, just like we've been using: 22 of them were safe loans, 18 were risky. And we chose first to split on credit. Now, why did we choose to split on credit first? Why was that the first feature we chose? The reason is that it improved the training error the most, taking it from 0.45 down to 0.20, so that was a good first split to make. If we go back and review the algorithm for choosing the best feature to split on, we try every single possible feature and pick the one that decreases the training error the most. So at every step of the way, we're adding the feature that decreases the training error, and eventually we'll drive the training error to zero. Unless, of course, we get to a point where we can't drive the training error down any further because we've run out of features to split on and we have positive points sitting on top of negative points, but that's a side note. The most important thing to remember is that the training error keeps going down, down, down, down.
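To make that greedy rule concrete, here is a minimal Python sketch of choosing the first split: for each candidate feature, build a one-level stump that predicts the majority class in each branch, measure its classification error, and keep the feature with the lowest error. The tiny loan-style records and the feature names below are hypothetical, made up for illustration; only the idea of splitting on credit comes from the lecture, and the printed error values are not the 0.45 and 0.20 quoted above.

```python
from collections import Counter

def stump_error(data, feature, label='safe'):
    """Classification error of a one-level split (decision stump) on `feature`."""
    mistakes = 0
    for value in set(row[feature] for row in data):
        branch = [row[label] for row in data if row[feature] == value]
        _, majority_count = Counter(branch).most_common(1)[0]
        mistakes += len(branch) - majority_count  # points the majority vote gets wrong
    return mistakes / len(data)

# Hypothetical loan records (not the lecture's 40-point dataset): each has
# three categorical features and a label of +1 (safe) or -1 (risky).
loans = [
    {'credit': 'excellent', 'term': '3yr', 'income': 'high', 'safe': +1},
    {'credit': 'excellent', 'term': '5yr', 'income': 'low',  'safe': +1},
    {'credit': 'fair',      'term': '3yr', 'income': 'high', 'safe': +1},
    {'credit': 'fair',      'term': '5yr', 'income': 'low',  'safe': -1},
    {'credit': 'poor',      'term': '3yr', 'income': 'high', 'safe': -1},
    {'credit': 'poor',      'term': '5yr', 'income': 'low',  'safe': -1},
]

# Greedy rule: evaluate every feature and keep the one with the lowest error.
errors = {f: stump_error(loans, f) for f in ('credit', 'term', 'income')}
print(errors)                                         # credit ~0.17, term ~0.33, income ~0.33
print('first split:', min(errors, key=errors.get))    # -> 'credit'
```

Repeating this rule inside each branch is what keeps pushing the training error down as the tree gets deeper.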
And that steady decrease as we increase the depth is what leads to very low training error, which often leads to very complex trees that are very prone to overfitting. Here's a real-world loan dataset where we actually observe that big, bad overfitting problem. If we take the depth of the tree and push it all the way to depth 18, we see that the training error, the blue line, has gone down a lot, to about 8%, which is extremely low. However, if you look at the validation set error, that was not so good: it's around 39%, so there's a big gap between the two, which we characterize as a form of overfitting. If somehow you were able to pick the best depth for the decision tree, which in this case is depth 7, you'd notice that the validation error is just under 35%. In other words, if I could pick the right depth, I would get just under 35% validation error, but if I let the tree grow until the training error is very low, I get a validation error of 39%. So going all the way is a bad idea; gotta stop a little earlier. [MUSIC]
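As a rough sketch of that depth-versus-error comparison, the following Python sweeps the maximum depth, tracks training and validation error, and picks the depth with the lowest validation error. It uses scikit-learn and a synthetic dataset as assumptions; the lecture's actual loan data and tree implementation are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data, split into training and validation sets.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3,
                                                      random_state=0)

results = []
for depth in range(1, 19):                    # depths 1 through 18, as in the lecture's plot
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_err = 1 - tree.score(X_train, y_train)
    valid_err = 1 - tree.score(X_valid, y_valid)
    results.append((depth, train_err, valid_err))
    print(f"depth={depth:2d}  train_err={train_err:.3f}  valid_err={valid_err:.3f}")

# Training error keeps falling with depth; validation error bottoms out at some
# intermediate depth and then climbs back up -- that intermediate depth is the one to keep.
best_depth = min(results, key=lambda r: r[2])[0]
print("best depth by validation error:", best_depth)
```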