[MUSIC] So now we've discussed the idea of building a decision stump from data, and a little bit of what that data set looks like. Let's discuss now how to pick the right feature to split on when we're building a decision stump. So we're trying to learn a decision stump from data, and so far we split on credit. But the question is, what is the best feature to split on, and how do we measure that? In our example we split on credit, but we could have split on something else. For instance, we could have split on the term of the loan: is this a three year loan or a five year loan? And the question is, which of the two is better? What is a better split? That's what we're going to ask next. Intuitively, a better split is one that gives you the lowest classification error, and that's exactly what we'll use in the algorithm.

We'd like to figure out whether it's better to split on credit or to split on term. The way we're going to do that is by measuring the number of mistakes each one of the decision stumps makes, and picking the one that makes the fewest mistakes, so the one with the lowest error. And remember, the error is just the number of mistakes a classifier makes divided by the total number of data points.

Let's start with the root node. This is the case where we make no splits at all and just measure the error we get. As a reminder, we predict y hat to be the majority class associated with a particular node. In our case, the class with the most data points at the root is the safe class, and we're going to compute the classification error of making that prediction. To compute the classification error here, we're just predicting that every data point is safe. So there were 22 correct predictions, since 22 loans were safe, and 18 mistakes. The classification error is 18 / (22 + 18), which is 18 out of 40, or 0.45. For a binary classification problem, an error of 0.45 is really bad. So not splitting on anything gives you a pretty bad result.

The question now is, how good is the decision stump that splits on credit, and how does it compare to not splitting at all, which had a classification error of 0.45? Is this one better? Let's look at the decision stump. For data points that have excellent credit, we predict that they're safe. For those that have fair credit, we also predict that they're safe. And the ones that have poor credit, we predict to be risky. This is our prediction, and again it's the majority value of the data in each one of these nodes. If you look at how many mistakes we make, you'll see that for the data with excellent credit, we make zero mistakes, because everything there was safe. For the data with fair credit, we make four mistakes, because there were four risky loans with fair credit. And for the data with poor credit, we make four mistakes again, because there were four safe loans with poor credit. So let's compute our overall error. We make 4 + 4 mistakes, so that's 8 out of 40 data points, which is an error of 0.2. That's smaller than the 0.45 we had before. So we've gone down from 0.45 to 0.2, and splitting on credit seems like a pretty good idea.
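Just to make that error arithmetic concrete, here's a minimal sketch of the computation, assuming we already know the mistake counts at each leaf of the stump (the function name is only for illustration):

```python
def classification_error(mistakes_per_leaf, total_points):
    """Error of a decision stump: total mistakes across all leaves
    divided by the total number of data points."""
    return sum(mistakes_per_leaf) / total_points

# Root node: predict the majority class (safe) for all 40 loans -> 18 mistakes
print(classification_error([18], 40))        # 0.45

# Split on credit: 0 mistakes (excellent) + 4 (fair) + 4 (poor)
print(classification_error([0, 4, 4], 40))   # 0.2
```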
Now let's see what happens when we split on the term of the loan. If the term is three years, there are 16 safe loans and 4 risky ones, so in that branch we're making four mistakes. For a term of five years we predict risky, but there were six safe loans, so now we're making six mistakes. If you look at the overall error, it's (4 + 6) / 40, which is 10 divided by 40, or 0.25.

So overall, if we look at our data: not splitting on anything, the root node, has 0.45 error; splitting on credit has 0.2 error; and splitting on term has 0.25 error. We can go back and ask, what is the best choice? Should we split on credit, or should we split on term? The answer now becomes obvious. Splitting on credit gives you lower classification error, so that's what our greedy algorithm will do first. Credit is the first feature to split on, and the winner of our selection process.

So in general, the decision tree splitting process says: given the subset of data at node M, which so far is just the root node, try out every feature x_i, which in our case was credit, term, and income. Split the data according to the possible values of each one of these features, and compute the classification error of the resulting split, just like we did manually here. Then pick the feature with the lowest classification error, which in our case was credit. A small code sketch of this selection loop follows below.

So if we go back to our decision tree learning algorithm, that first challenge we had, figuring out what feature to split on, we can now handle using this feature split selection algorithm that minimizes the classification error. Next, we'll explore the other parts of the decision tree learning algorithm. [MUSIC]
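To sketch what that feature split selection step might look like in code, here's a minimal version that loops over candidate features, partitions the data by feature value, lets each partition predict its majority class, and keeps the feature with the lowest classification error. The data layout (a list of dicts with a 'safe' label column) and the function names are assumptions for illustration, not the course's actual implementation:

```python
from collections import Counter

def node_mistakes(rows, target):
    """Mistakes made by predicting the majority class for these rows."""
    if not rows:
        return 0
    counts = Counter(row[target] for row in rows)
    return len(rows) - max(counts.values())

def best_splitting_feature(rows, features, target):
    """Greedy split selection: return the feature whose split gives the
    lowest classification error on this subset of the data."""
    best_feature, best_error = None, float('inf')
    for feature in features:
        # Partition the rows by the value they take on this feature.
        groups = {}
        for row in rows:
            groups.setdefault(row[feature], []).append(row)
        # Each leaf predicts its majority class; total the mistakes.
        mistakes = sum(node_mistakes(group, target) for group in groups.values())
        error = mistakes / len(rows)
        if error < best_error:
            best_feature, best_error = feature, error
    return best_feature, best_error

# Hypothetical usage on the loan data from the example:
# best_splitting_feature(loans, ['credit', 'term', 'income'], target='safe')
```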