[MUSIC] So what does it mean to avoid overfitting in decision trees? What does it mean to stop a little earlier, or to simplify the types of trees that we learn, so that we don't have these crazy overfitting problems? What we'll use is a type of regularization that biases us towards simpler trees, and that philosophically relates to a concept called Occam's razor. So let's talk about that principle. It's interesting that this idea of finding simpler trees, simpler explanations for data, is a concept that goes back a long, long time. Many people call it Occam's razor because of William of Occam, who was talking about this concept around the 13th century, although it goes back to Pythagoras and Aristotle, some 2,000 years or so before that. But basically, the concept has been: of the possible hypotheses, of the possible explanations, as well as of the possible trees, you should always pick the one that explains the data with as simple an explanation as possible.

So for example, let's say that you go to the doctor and say, I have these two symptoms, S1 and S2. Let's say I have a terrible headache and a terrible sore throat. There are two options the doctor might see. A brain tumor explains the headache, and a throat infection explains the sore throat. Or there's another disease, say the common cold, that explains both the headache and the sore throat. In that case, it's probably more likely that it's just one disease that explains both symptoms than two separate diseases, one explaining symptom one and the other explaining symptom two. Now, that's the general idea. The way that Occam put it was: among competing hypotheses, the one with the fewest assumptions should always be selected. And fewest assumptions means simplest in our case.

In the context of decision trees, the Occam's razor principle says: if you have two trees with similar classification error, pick the simpler one. So let's go through the example below. We have the training error for four different trees, and then the validation error for the same four trees. We observe a couple of things. Two trees have the same validation error, and one tree has very low training error but higher validation error, so that tree is clearly overfit. The super complex tree is definitely bad. But interestingly, between the moderately complex tree and the more complex one, which get the same validation error but where the more complex one has lower training error, you can say, you know what, the simpler tree is probably better; it's less likely to overfit. So I'm probably going to pick the moderately complex tree that gives me reasonable training error and low validation error. So that's the conceptual idea: try to pick simpler trees that explain the data relatively well and do well on the validation set. But to get this right, we have to dig in and define what a simpler tree is. So the question here is: what makes a tree simpler? We have this really complex, high-depth, complicated tree on the left, and a shallower, two-level decision tree on the right. Clearly, the one on the right is simpler. So if they both have similar performance, let's pick the simpler tree. And in general, let's try to bias our algorithms to find simpler trees rather than more complex ones.
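To make that selection rule concrete, here is a minimal sketch of it, assuming scikit-learn's DecisionTreeClassifier and a synthetic dataset; the particular depths, the number of leaves as the simplicity measure, and the 1% tolerance are illustrative choices, not something specified in the lecture:

```python
# Sketch: Occam's razor model selection over decision trees.
# Assumes scikit-learn; dataset and tolerance are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Train trees of increasing complexity (depth) and record their errors.
candidates = []
for depth in [1, 2, 4, 8, 16, None]:  # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    candidates.append({
        "depth": depth,
        "train_error": 1 - tree.score(X_train, y_train),
        "valid_error": 1 - tree.score(X_valid, y_valid),
        "n_leaves": tree.get_n_leaves(),
    })

# Occam's razor: among trees whose validation error is within a small
# tolerance of the best, pick the one with the fewest leaves.
best_valid = min(c["valid_error"] for c in candidates)
similar = [c for c in candidates if c["valid_error"] <= best_valid + 0.01]
simplest = min(similar, key=lambda c: c["n_leaves"])
print(simplest)
```

Note that the deepest tree will typically have the lowest training error but not the lowest validation error, which is exactly the overfitting pattern the lecture describes.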
Next, we're going to see some modifications of the decision tree algorithm that bias us towards simpler trees; in other words, let's try to avoid complex trees and focus on simpler ones. And the question is, how do we do that? There are two main approaches out there. One is called early stopping: instead of making your tree deeper and deeper and deeper, somehow stop earlier to avoid overfitting, and settle for a simpler tree. The more advanced approach, which tends to work much better, is called pruning: you build a deeper tree, and then you chop off some leaves at the bottom, some decisions that were not that important, and come up with a simpler tree that way. We're going to talk about both of these approaches in today's module.
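As a preview, here is a minimal sketch of what the two approaches look like in code, again using scikit-learn as an illustrative library (the lecture itself is library-agnostic, and the specific parameter values below are placeholders):

```python
# Sketch: early stopping vs. pruning for decision trees (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, random_state=0)

# Early stopping: limit depth / minimum node size up front, so the
# tree simply never grows complex enough to overfit.
early = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20).fit(X, y)

# Pruning: grow a deep tree first, then chop off weak branches.
# scikit-learn exposes this as cost-complexity post-pruning, where
# larger ccp_alpha values prune more aggressively.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
path = full.cost_complexity_pruning_path(X, y)  # candidate alphas
pruned = DecisionTreeClassifier(
    ccp_alpha=path.ccp_alphas[-2], random_state=0).fit(X, y)

# The pruned tree ends up much smaller than the fully grown one.
print(full.get_n_leaves(), early.get_n_leaves(), pruned.get_n_leaves())
```

[MUSIC]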