[MUSIC] Well, let's start with the idea of early stopping, where we keep going down but we stop before the tree gets too complex. So, how does that work? Simply recall that as we add more depth to the tree, we're increasing the complexity of the model. And so if we're avoiding these super complex models, we can just avoid very deep trees. So the first early stopping condition is absolutely simple: just limit the depth of the tree. Now, we know that the training error goes down as you increase the depth of the tree, but the true error goes down and then goes up. So maybe there's some magical max depth where, before that, the trees are simple and they get to good true error, but after that the trees get too complex and the true error keeps going up. And so somehow this max depth is going to be a magic parameter we're going to provide, where we stop going down the recursion once we hit that max_depth. And just like any other magic parameter in machine learning, you have to figure out how to pick it. And the answer is a validation set or cross-validation. Never use a validation set, sorry, never use your test set to pick your magic parameters. So that's what you should do to pick max_depth.

Limiting the depth is just one approach for limiting the complexity of the decision tree. However, if you limit the depth, you're cutting every branch the same way, but some branches might want to go further than others. Let's take a look at an approach where some branches might go down further than others. So, for example, we can stop recursing if the classification error doesn't get better. So we split on credit, and now we're looking at what happens if we recurse on poor. Of the five safe loans that we have with credit poor, and the 16 risky loans with credit poor, if we just stopped there and made no splits, we'd be making five mistakes. The error would be 5 out of 21, which is 0.24. Now, if I were to try to split on the other features, term and income, suppose that neither of them helped. Suppose that neither of them decreased the training error. So the training error would still be 0.24 for both of them. What should I do? Now, I could try to split on one and then keep going, but that would make the tree more and more complex. So a second early stopping condition we might use is to say: stop there. If the training error is not decreasing, don't make the tree more complex. Use Occam's razor and stop the process right there, and just say everything after credit equals poor is a risky loan. And then just focus on what to do with the node where credit is fair.

So the second type of early stopping criterion we're exploring is: keep recursing, but stop if the training error does not decrease; don't make the tree more complex. Now, there are a few practical caveats about this. First, typically you don't just stop if the error doesn't decrease at all. You put in a magic parameter: you say stop if the error doesn't decrease by more than epsilon. Let's say, if it doesn't decrease by more than 1%, we just stop. There are a few pitfalls in practice for this rule, and we'll talk about those a little bit in the context of pruning. So this is a rule that you should be super careful about, but it's very useful in practice. So, something to definitely keep in mind. The third type of early stopping rule is extremely important, and you should always include it. And it just says: if there are only a few data points left in a node, don't split any further, because then you're really risking overfitting.
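As a quick illustration of picking the max_depth magic parameter with a validation set, here is a minimal sketch using scikit-learn's DecisionTreeClassifier. The arrays X and y are synthetic placeholders and the depth range 1 through 10 is an arbitrary choice; substitute your own loan features, labels, and search range.

```python
# A minimal sketch of choosing max_depth with a validation set.
# X and y below are synthetic placeholders, not the course's loan data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data; replace with your own features and labels.
rng = np.random.RandomState(0)
X = rng.rand(1000, 5)
y = (X[:, 0] + 0.3 * rng.randn(1000) > 0.5).astype(int)

# Hold out a validation set -- never the test set -- to pick the magic parameter.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0)

best_depth, best_error = None, float('inf')
for depth in range(1, 11):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    valid_error = 1.0 - model.score(X_valid, y_valid)  # classification error
    if valid_error < best_error:
        best_depth, best_error = depth, valid_error

print("chosen max_depth:", best_depth, "validation error:", best_error)
```

The same loop structure works with cross-validation instead of a single held-out split, for example by replacing the validation-error line with sklearn.model_selection.cross_val_score on the training data.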
So an example would be looking at the case where credit was equal to fair. Well, there was only one safe loan there and two risky loans, so a total of only three data points fall inside of that node. In that case, if we try to split further, we're going to find lots of patterns in the data that don't make much sense, because there's very little data to support those patterns. With three data points, I don't know if I really trust a decision tree algorithm to make tons of decisions, so I'll just stop recursing right there. In general, we should always set a parameter, call it Nmin, which says that if a node has Nmin data points or fewer, just stop recursing right there, and call that early stopping condition three. This one you should always implement, no matter what. For example, here I'm suggesting setting Nmin to 10. 10 is quite a nice number for Nmin if you have a relatively small dataset. If you have a huge dataset, you might set it to 100. But somewhere between 10 and 100 is a good value for Nmin.

So let's summarize the early stopping conditions that we described in this approach to try to find decision trees that are not too complex. There are three. The first one is limiting the depth of the tree. That's a relatively good condition to set, and you should pick it pretty carefully using cross-validation. The second one is to stop recursing if the classification error does not decrease by enough, whether it decreases by zero or doesn't decrease by more than epsilon. It's a little risky but can be very useful. The third one, extremely useful, and we should always do it: stop recursing if there are not that many data points left inside a node. So these are some of the key early stopping conditions a decision tree algorithm should be thinking about.

If we go back to the decision tree learning algorithm that we discussed in the previous module: we start with the empty tree, we select a feature to split on, then we stop if there is nothing left to do and just make the prediction, which is the majority class. Otherwise, we recurse and go back to step two. Now, we considered two stopping conditions in the previous module, and to those we're adding the three early stopping conditions that we just described. And so small modifications to the code we've already built give you this extra protection to help prevent some overfitting, as in the sketch below. [MUSIC]
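Below is a minimal sketch, in plain Python, of how the greedy recursion from the previous module might look with all three early stopping conditions wired in. The data layout (a dict of categorical feature columns plus a list of +1/-1 labels), the helper names, and the default values of max_depth, epsilon, and n_min are all illustrative assumptions, not the course's reference implementation.

```python
# A sketch of greedy decision tree learning with the three early stopping
# conditions from this lecture. Data layout and helper names are assumptions.

def majority_class(labels):
    # Predict the most common label (+1 or -1) at this node.
    return +1 if labels.count(+1) >= labels.count(-1) else -1

def node_error(labels):
    # Classification error if we predict the majority class at this node.
    return min(labels.count(+1), labels.count(-1)) / float(len(labels))

def split_error(data, labels, feature):
    # Weighted classification error after splitting on a categorical feature.
    total = len(labels)
    err = 0.0
    for value in set(data[feature]):
        subset = [y for x, y in zip(data[feature], labels) if x == value]
        err += node_error(subset) * len(subset) / total
    return err

def build_tree(data, labels, features, depth=0,
               max_depth=6, epsilon=0.0, n_min=10):
    prediction = majority_class(labels)

    # Stopping conditions from the previous module:
    # all labels agree, or no features left to split on.
    if node_error(labels) == 0.0 or not features:
        return {'leaf': True, 'prediction': prediction}
    # Early stopping condition 1: limit the depth of the tree.
    if depth >= max_depth:
        return {'leaf': True, 'prediction': prediction}
    # Early stopping condition 3: too few data points left in this node.
    if len(labels) <= n_min:
        return {'leaf': True, 'prediction': prediction}

    # Greedy step: pick the feature whose split lowers the error the most.
    best_feature = min(features, key=lambda f: split_error(data, labels, f))
    # Early stopping condition 2: error does not improve by more than epsilon.
    if node_error(labels) - split_error(data, labels, best_feature) <= epsilon:
        return {'leaf': True, 'prediction': prediction}

    # Recurse on each branch of the chosen feature.
    children = {}
    remaining = [f for f in features if f != best_feature]
    for value in set(data[best_feature]):
        mask = [x == value for x in data[best_feature]]
        sub_data = {f: [v for v, keep in zip(data[f], mask) if keep]
                    for f in remaining}
        sub_labels = [y for y, keep in zip(labels, mask) if keep]
        children[value] = build_tree(sub_data, sub_labels, remaining,
                                     depth + 1, max_depth, epsilon, n_min)
    return {'leaf': False, 'feature': best_feature, 'children': children}
```

For example, calling build_tree({'credit': [...], 'term': [...], 'income': [...]}, labels, ['credit', 'term', 'income']) would return a nested dict you can walk to classify a new loan, with each of the three early stopping checks turning what would otherwise be a deep branch into a majority-class leaf.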