[MUSIC] We've now outlined the greedy algorithm for learning a decision tree. The first thing we're going to explore is this idea of picking what feature to split on next. In our example we split on credit first, but we could have split on some different feature, so how do we decide what to do? And it turns out that this feature selection problem, this feature split learning problem, can be viewed as the problem of learning what's called a decision stump, which is just one level of the decision tree. For those not familiar with it, a tree is kind of this really big thing, but if you cut it, you're only left with the little bit at the bottom, and that thing is called the stump. So it's a really, really, really short piece of a tree.

So how do you learn a decision stump, or a 1-level decision tree, from data? We are given a data set like this, just like we had before, and our goal here is to learn a 1-level decision tree. We're given the top node, or root node, of the data, where some of the data points are safe and some of the data points are risky. There are 40 examples in our case, and it turns out that 22 of those are safe loans and 18 are risky loans. That's what our data set looks like. Now, I have a histogram, but as we're building these figures they can get really big and complicated, so I'm going to compress it a little bit. Instead of showing the histogram along with the numbers 22 and 18, I'm just going to show the numbers 22 and 18 to simplify the visualization. So now when you see that root node, you should interpret it as: we have 40 data points, 22 are green, which are safe loans, and 18 are orange, which are risky loans.

Starting from there, how do we go and build that decision stump? In our case, we had all the data, we split on credit, and we decided that some subset of the data had excellent credit, some had fair, and some had poor. So we assign each one of those subsets to a subsequent node. In our new visualization notation, we have the original root node with all the data, 22 safe and 18 risky. For excellent credit we have a subset of the data where 9 of the loans were safe and 0 were risky, so 9 in green, 0 in orange. For fair credit we have 9 safe, 4 risky. And for poor credit we have 4 safe and 14 risky. That's what the data looks like at the next level, after we've done the split. These nodes here in the middle we call intermediate nodes.

Now, for each intermediate node we can try to make a prediction in the decision stump. So, for example, for poor credit we see that the majority of the data in there is risky, so we predict that to be a risky loan. For fair credit, we see that the majority, 9 versus 4, are safe loans, so we predict that to be a safe loan. And for excellent credit, we predict that to be a safe loan, because it's 9 versus 0, so all nine loans in there are safe. So for each node, we look at the majority value to make a prediction. And you've now learned your first decision stump. It's a pretty simple one, but to get better predictions and more accuracy, we're going to explore that more and split further. But before we split further, we're going to discuss why we picked credit to do the first split as opposed to, say, the term of the loan or income. [MUSIC]
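To make the idea concrete, here is a minimal sketch in Python of learning a decision stump by splitting on one feature and predicting the majority class in each branch. The function name learn_decision_stump, the dictionary record format, and the tiny loans list are illustrative assumptions for this sketch, not the course's actual data set or code.

```python
# A minimal sketch of learning a one-level decision stump, assuming the data is a
# list of dicts with feature values plus a "label" key ("safe" or "risky").
# The feature name "credit" and the example records below are illustrative only.
from collections import Counter

def learn_decision_stump(data, feature):
    """Split the data on one feature and predict the majority label in each branch."""
    # Group the data points by their value of the chosen feature
    # (e.g. credit = excellent / fair / poor).
    branches = {}
    for point in data:
        branches.setdefault(point[feature], []).append(point)

    # For each intermediate node, count the labels and predict the majority class.
    stump = {}
    for value, subset in branches.items():
        counts = Counter(point["label"] for point in subset)
        stump[value] = {
            "counts": dict(counts),                     # e.g. {"safe": 9, "risky": 4}
            "prediction": counts.most_common(1)[0][0],  # majority vote
        }
    return stump

# Tiny illustrative example (not the 40-loan data set from the lecture):
loans = [
    {"credit": "excellent", "label": "safe"},
    {"credit": "fair", "label": "safe"},
    {"credit": "fair", "label": "risky"},
    {"credit": "poor", "label": "risky"},
    {"credit": "poor", "label": "risky"},
]
print(learn_decision_stump(loans, "credit"))
```

Each branch just stores its label counts and a majority-vote prediction, which mirrors the picture above: the root's 22/18 counts fan out into per-credit-value counts, and the prediction at each intermediate node is whichever class is in the majority there.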