[MUSIC] So far in this module, we've discussed learning decision trees, but we've only used what are called categorical inputs or features. So we looked at credit, which could be poor, fair, or excellent. However, if you look at income, that's what's called a real-valued feature: it has continuous possible values, like $105,000, $73,000, $69,000, and so on. So the question is, how do you build a decision tree with this kind of input?

One natural approach is to just treat income, or any continuous-valued feature, as if it were categorical data. So let's take that root node with 40 data points, just split on income, and see what happens. Well, there's one data point with income of $30,000, one data point with income of $31,400, one data point with income of $39,500, and so on. And it turns out that the nodes we get basically only have one data point each. And this can be really, really bad. When you have very few data points in an intermediate node of the decision tree, you're very prone to overfitting, very prone to making predictions you cannot trust. So, for example, if you look here, you'd predict that if your income is $30,000, this is definitely a risky loan, but if your income is $31,400, it's definitely a safe loan; however, if your income is $39,500, you're back to risky. So [LAUGH] it's risky, safe, risky, which doesn't make any sense. Do you trust it? I wouldn't. And so the question is, how do we deal with these real-valued features?

A very natural alternative is to use threshold splits. These simply pick a threshold on the value of the continuous-valued feature, let's say $60,000. On the left side of that split, we'll put all the data points with income lower than $60,000, and on the right, all the data points with income higher than or equal to $60,000. And as we can see, the subset of the data with income higher than or equal to $60,000 contains many data points, so there's a lot less risk of overfitting. We see that 14 of them are safe loans, so we'd probably predict safe there, while of those with income below $60,000, 13 are risky, so maybe we'd predict those as risky. So this is a very natural kind of split that we might want to make with continuous-valued data.

Let's now take a moment to visualize what happens when we do this kind of threshold split. For example, I've laid out my income data on a line here that ranges from $10,000 to $120,000. If we pick a threshold split of $60,000, everything to the left of the split has income less than $60,000, and we're going to predict those to be risky loans; everything to the right has income higher than $60,000, and we're going to predict those as safe loans.

Now let's suppose that we have two continuous-valued features: income on the y-axis and age on the x-axis. Let's see what happens here. You'll see there are some positive and negative examples laid out in 2D. Another interesting thing you'll see is that older people with higher incomes tend to be safe loans, but younger people, who may have lower incomes, might also be safe loans, because those people may make more money over time, let's say. So we might look at this data and decide to split on age first. And if we split on age, let's say at age equals 38, we'll see that for the folks who are younger than 38, on average more of them have risky loans, so you might predict risky. But for the folks with age greater than 38, we have more safe loans than risky, so we might predict safe.
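To make this concrete, here's a minimal sketch in Python of how a threshold could be chosen (the function name, data layout, and toy numbers below are my own for illustration, not from the lecture): sort the feature values, take the midpoint between each pair of consecutive distinct values as a candidate threshold, and keep the candidate whose two sides make the fewest majority-class mistakes, i.e., the lowest classification error.

```python
import numpy as np

def best_threshold_split(values, labels):
    # Hypothetical helper: scan midpoints between consecutive sorted
    # values and return the threshold with the lowest classification error.
    order = np.argsort(values)
    v, y = values[order], labels[order]
    best_t, best_err = None, np.inf
    for i in range(1, len(v)):
        if v[i] == v[i - 1]:
            continue  # identical values leave no room for a threshold
        t = (v[i] + v[i - 1]) / 2.0  # candidate threshold: the midpoint
        left, right = y[v < t], y[v >= t]
        # Each side predicts its majority class, so its error is the
        # size of the minority class on that side.
        err = (min(np.sum(left == +1), np.sum(left == -1)) +
               min(np.sum(right == +1), np.sum(right == -1)))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Toy data: incomes with labels +1 = safe, -1 = risky.
incomes = np.array([30000, 31400, 39500, 61000, 73000, 105000])
labels = np.array([-1, -1, -1, +1, +1, +1])
print(best_threshold_split(incomes, labels))  # threshold 50250.0, error 0
```

Note that only midpoints between consecutive sorted values need to be checked: any threshold falling between the same two data points produces exactly the same split, so these candidates cover every distinct split the feature allows.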
Now, on to the next split in our decision tree. For the folks with age greater than 38, we might choose to split on income, asking whether the income is greater than $60,000 or not. If we put a split there, we'll see that the points with income below $60,000, even at the higher ages, are mostly negative, so they might be predicted negative.

So let's take a moment to visualize the decision tree we've learned so far. We start from the root node over here, and we made our first split. For our first split, we decided to split on age, and the two possibilities we looked at were: is the age smaller than 38, or is the age greater than or equal to 38? That was our first threshold split. For those with age smaller than 38, let's say that we stop right there: we'd see that there are five risky and three safe, so we'd predict risky. That might be our leaf here. For age greater than or equal to 38, we took another split, which was on income, and we just asked ourselves: is the income less than $60,000, or is it greater than or equal to $60,000? For the ones with age greater than or equal to 38 and income greater than or equal to $60,000, we predicted safe loans, while for the ones with age greater than or equal to 38 and income less than $60,000, we predicted risky loans. And this is an example of a tree where we're making binary splits on the data for the continuous variables.
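Written out as code, this learned tree is nothing more than nested threshold comparisons. Here's an illustrative sketch in Python using the two thresholds from the example (the predict function itself is mine, not something defined in the lecture):

```python
# Illustrative only: the two-level tree from this example, written as
# nested threshold splits (age at 38, income at $60,000).
def predict(age, income):
    if age < 38:
        return "risky"   # leaf: 5 risky vs. 3 safe, so majority is risky
    elif income < 60000:
        return "risky"   # age >= 38 but income < $60,000
    else:
        return "safe"    # age >= 38 and income >= $60,000

print(predict(45, 72000))  # safe
print(predict(45, 40000))  # risky
print(predict(25, 90000))  # risky
```

[MUSIC]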