[MUSIC] So now we've discussed the idea of building a decision stump from data, and a little bit of what that data set looks like. Let's discuss now how to pick the right feature to split on when we're building a decision stump. So we're trying to learn a decision stump from data, and so far we split on credit. But the question is, what is the best feature to split on, and how do we measure that? In our example we split on credit, but we could have split on something else. For instance, we could have split on the term of the loan: is this a three year loan or a five year loan? And the question is, which of the two is better? What is a better split? That's what we're going to ask next. Intuitively, a better split is one that gives you the lowest classification error, and that's exactly what we'll use in the algorithm.

We'd like to figure out whether it's better to split on credit or to split on term. The way we're going to do that is by measuring the number of mistakes each one of the decision stumps makes, and picking the one that makes the fewest mistakes, so the one with the lowest error. And remember, the error is just the number of mistakes a classifier makes divided by the total number of data points.

Let's start with the root node. This is the case where we make no splits at all and just measure the error we get. As a reminder, we predict y hat to be the majority class associated with a particular node. In our case, the class with the most data points at the root is the safe class, and we're going to compute the classification error of making that prediction. To compute the classification error here, we're just predicting that every data point is safe. So there were 22 correct predictions, since 22 loans were safe, and 18 mistakes. The classification error is 18 / (22 + 18), which is 18 out of 40, or 0.45. For a binary classification problem, an error of 0.45 is really bad. So not splitting on anything gives you a pretty bad result.

The question now is, how good is the decision stump that splits on credit, and how does it compare to not splitting at all, which had a classification error of 0.45? Is this one better? Let's look at the decision stump. For data points that have excellent credit, we predict that they're safe. For those that have fair credit, we also predict that they're safe. And the ones that have poor credit, we predict to be risky. This is our prediction, and again it's the majority value of the data in each one of these nodes. If you look at how many mistakes we make, you'll see that for the data with excellent credit, we make zero mistakes, because everything there was safe. For the data with fair credit, we make four mistakes, because there were four risky loans with fair credit. And for the data with poor credit, we make four mistakes again, because there were four safe loans with poor credit. So let's compute our overall error. We make 4 + 4 mistakes, so that's 8 out of 40 data points, which is an error of 0.2. That's smaller than the 0.45 we had before. So we've gone down from 0.45 to 0.2, and splitting on credit seems like a pretty good idea.
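Just to make that error arithmetic concrete, here's a minimal sketch of the computation, assuming we already know the mistake counts at each leaf of the stump (the function name is only for illustration):

```python
def classification_error(mistakes_per_leaf, total_points):
    """Error of a decision stump: total mistakes across all leaves
    divided by the total number of data points."""
    return sum(mistakes_per_leaf) / total_points

# Root node: predict the majority class (safe) for all 40 loans -> 18 mistakes
print(classification_error([18], 40))        # 0.45

# Split on credit: 0 mistakes (excellent) + 4 (fair) + 4 (poor)
print(classification_error([0, 4, 4], 40))   # 0.2
```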
Now let's see what happens when we split on the term of the loan. If the term is three years, there are 16 safe loans and 4 risky ones, so in that branch we're making four mistakes. For a term of five years we predict risky, but there were six safe loans, so now we're making six mistakes. If you look at the overall error, it's (4 + 6) / 40, which is 10 divided by 40, or 0.25.

So overall, if we look at our data: not splitting on anything, the root node, has 0.45 error; splitting on credit has 0.2 error; and splitting on term has 0.25 error. We can go back and ask, what is the best choice? Should we split on credit, or should we split on term? The answer now becomes obvious. Splitting on credit gives you lower classification error, so that's what our greedy algorithm will do first. Credit is the first feature to split on, and the winner of our selection process.

So in general, the decision tree splitting process says: given the subset of data at node M, which so far is just the root node, try out every feature x_i, which in our case was credit, term, and income. Split the data according to the possible values of each one of these features, and compute the classification error of the resulting split, just like we did manually here. Then pick the feature with the lowest classification error, which in our case was credit. A small code sketch of this selection loop follows below.

So if we go back to our decision tree learning algorithm, that first challenge we had, figuring out what feature to split on, we can now handle using this feature split selection algorithm that minimizes the classification error. Next, we'll explore the other parts of the decision tree learning algorithm. [MUSIC]
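To sketch what that feature split selection step might look like in code, here's a minimal version that loops over candidate features, partitions the data by feature value, lets each partition predict its majority class, and keeps the feature with the lowest classification error. The data layout (a list of dicts with a 'safe' label column) and the function names are assumptions for illustration, not the course's actual implementation:

```python
from collections import Counter

def node_mistakes(rows, target):
    """Mistakes made by predicting the majority class for these rows."""
    if not rows:
        return 0
    counts = Counter(row[target] for row in rows)
    return len(rows) - max(counts.values())

def best_splitting_feature(rows, features, target):
    """Greedy split selection: return the feature whose split gives the
    lowest classification error on this subset of the data."""
    best_feature, best_error = None, float('inf')
    for feature in features:
        # Partition the rows by the value they take on this feature.
        groups = {}
        for row in rows:
            groups.setdefault(row[feature], []).append(row)
        # Each leaf predicts its majority class; total the mistakes.
        mistakes = sum(node_mistakes(group, target) for group in groups.values())
        error = mistakes / len(rows)
        if error < best_error:
            best_feature, best_error = feature, error
    return best_feature, best_error

# Hypothetical usage on the loan data from the example:
# best_splitting_feature(loans, ['credit', 'term', 'income'], target='safe')
```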