[MUSIC] We'll close off this module by exploring in a little bit more detail our example of decision trees, comparing it to our example of logistic regression from before. The example we'll consider here is one we considered in the logistic regression module. We take this dataset and fit a logistic regression; the decision boundary is on the right here, and we can see the parameters that we learned from data. This is what we get if we just use simple degree-1 polynomial features, that is, straight-up linear features. Let's see what happens when we build a decision stump on the same data. If you learn a single-level decision tree and apply it to this dataset, it will split on x1, and if you try out the threshold values you'll see that on the left side we'll have x1 less than -0.07 and on the right side we'll have x1 greater than -0.07. For the points on the right, we have more positive examples, so we're going to predict positive, while for the points on the left, we have more negative examples, so we're going to predict negative. And you'll see that they correspond: here's our split at -0.07, the first leaf on the tree corresponds to the points on the left there, and the second one corresponds to the points on the right. Interestingly, unlike logistic regression, where we were able to get a diagonal decision boundary, here we only get a straight-up or straight-across decision boundary, at least in the first split.

Now, if we keep growing the tree and take another split for each one of these intermediate nodes, we'll see that for the data where x1 is less than -0.07, we might split on x1 again, in this case on x1 less than -1.66, and in both cases we predict -1, a negative data point, so -1, -1. But for x1 greater than -0.07, we now split on x2, on whether it's greater than 1.55, which in our data is over here at 1.55; that's where x2 would be. And we see that now, for the data where x2 is smaller than 1.55, we have 11 positive examples, so we predict +1, but for the data where x2 is greater than 1.55, we have three negative examples, so we predict -1. So now we have a much more interesting region: of these two nodes here on the right, one corresponds to this big green area and the other one corresponds to this little box on the top. We can imagine continuing this process, splitting again and again and again, as we'll see next. But one important thing to know, and this is a really important point, is that when you have continuous variables, unlike discrete or categorical ones, we can split on the same variable multiple times. So we can split on x1 and then split on x1 again, or we can split on x1, then on x2, then on x1 again, and we'll see that.

So in this example, we can keep the decision tree learning process growing. At depth 1 we just get a decision stump, which corresponds to the vertical line at -0.07 that we drew in the beginning. If we go to depth 2, we get this little box that contains most of the positive examples, but there are still some misclassifications over here. And if you keep going, splitting, splitting, splitting, all the way to depth 10, we get this really crazy decision boundary.
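To make this concrete, here is a minimal sketch, assuming scikit-learn and a small synthetic 2D dataset rather than the lecture's actual data, of fitting trees of depth 1, 2, and 10 and reading off the stump's single axis-aligned split. The feature index and threshold it prints come from the synthetic data, not the -0.07 and 1.55 splits on the slides.

```python
# Minimal sketch (not the course's own code): scikit-learn on synthetic 2D data.
# A depth-1 stump makes one axis-aligned split; deeper trees carve out more regions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # synthetic features x1, x2
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # a roughly diagonal true boundary

for depth in [1, 2, 10]:
    tree = DecisionTreeClassifier(max_depth=depth).fit(X, y)
    print(f"depth={depth:2d}  training accuracy={tree.score(X, y):.3f}")

# The depth-1 stump exposes its single split: which feature and what threshold.
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
print("stump splits on feature index", stump.tree_.feature[0],
      "at threshold", round(float(stump.tree_.threshold[0]), 3))
```

Running it, you'd see training accuracy climb as depth grows, which is exactly the behavior the lecture walks through on the slides.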
And if you look at it carefully, and I'm going to draw over the decision boundary here, you'll see that it basically makes no mistakes, so it has zero training error. We can compare what we saw with logistic regression to what we're seeing with decision trees, and understand, as a preview of what we will see next module, the notion of overfitting. With logistic regression, we started with degree-1 polynomial features, and we saw that degree 2 gave a really nice fit of the data, a nice parabola. It didn't get everything right, but it did pretty well. The degree-6 polynomial had a really crazy decision boundary: it got zero training error, but we didn't really trust those predictions. With the decision tree, what you control is the depth of the tree. Depth 1 was just a decision stump, and it didn't do so well. If you go to depth 3, the boundary looks like a bit of a jagged line, but it's a pretty nice decision boundary; it makes a few mistakes, but it looks pretty good. If you go to depth 10, you get this crazy decision boundary that has zero training error but is likely to be overfitting. [MUSIC]
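As a rough sketch of the comparison above, again assuming scikit-learn and synthetic, slightly noisy data rather than the course's dataset, the snippet below fits logistic regression with degree-1, 2, and 6 polynomial features and decision trees of depth 1, 3, and 10, and prints the training error of each. The high-degree polynomial and the deep tree drive training error toward zero, which is the overfitting symptom described here.

```python
# Rough sketch of the logistic-regression-vs-decision-tree comparison on synthetic,
# noisily labeled 2D data; more capacity (degree or depth) lowers training error.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
# Curved true boundary, with 10% of labels flipped to simulate noise.
y = ((X[:, 0] ** 2 + X[:, 1] > 0.5) ^ (rng.random(200) < 0.1)).astype(int)

for degree in [1, 2, 6]:
    model = make_pipeline(PolynomialFeatures(degree),
                          LogisticRegression(max_iter=5000)).fit(X, y)
    print(f"logistic regression, degree {degree}: "
          f"training error = {1 - model.score(X, y):.3f}")

for depth in [1, 3, 10]:
    tree = DecisionTreeClassifier(max_depth=depth).fit(X, y)
    print(f"decision tree, depth {depth}: "
          f"training error = {1 - tree.score(X, y):.3f}")
```

Low training error for the deepest tree or the highest-degree polynomial does not mean good predictions on new data; that gap is the overfitting topic picked up in the next module.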