We'll close off this module by exploring in a little more detail our example of decision trees and comparing it to our example of logistic regression from before.

Here is the example that we will consider; it is the one we considered in the logistic regression module. We're taking this dataset here and fitting a logistic regression; the decision boundary is on the right. And we can see the parameters that we learned from the data. This is what we get if we just use degree-1 polynomial features, that is, straight-up linear features.

Let's see what happens when we build a decision stump on the same data. If you fit a single-level decision tree and apply it to this dataset, it will split on x1, and if you try out the threshold values you'll see that on the left side we'll have x1 less than -0.07 and on the right side we'll have x1 greater than -0.07. For the ones on the right, we have more positive examples, so we're going to predict positive, while for the ones on the left, we have more negative examples, so we're going to predict negative. And you'll see that they correspond. So here's our split: -0.07. The first leaf of the tree corresponds to the points on the left there, and the second one corresponds to the points on the right. So unlike logistic regression, where we were able to get a diagonal decision boundary, here we're only going to get a straight up-and-down, or straight across, decision boundary, at least in the first split.

Now, if we keep going with the recursion of the algorithm and take another split for each one of these intermediate nodes, we'll see that for the data where x1 is less than -0.07, we might split on x1 again and make predictions. In this case that would be splitting on x1 less than -1.66, and in both cases we're predicting -1, so a negative data point: -1, -1. But for x1 greater than -0.07, we now split on x2, on whether it is less than or greater than 1.55, which on our data is over here, 1.55; that's where x2 would be. And we see that now, for the data where x2 is smaller than 1.55, we have 11 positive examples, so we predict +1.
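As a rough sketch of the stump described above (this code is not part of the lecture; the data values, the function names, and the reuse of the -0.07 threshold are illustrative assumptions), here is a minimal Python decision stump that splits one feature at a threshold and predicts the majority class on each side:

# Minimal decision-stump sketch (illustrative only, not the lecture's own code).
# Split on a single feature at a threshold and predict the majority class
# on each side, mirroring the x1 < -0.07 split described above.
import numpy as np

def majority_label(y):
    # Predict +1 if positives are the majority, else -1.
    return 1 if np.sum(y == 1) >= np.sum(y == -1) else -1

def fit_stump(x, y, threshold):
    # x: 1-D array of feature values (here, x1); y: labels in {-1, +1}.
    left, right = y[x < threshold], y[x >= threshold]
    return majority_label(left), majority_label(right)

def predict_stump(x, threshold, left_label, right_label):
    return np.where(x < threshold, left_label, right_label)

# Toy data roughly matching the story: mostly negatives to the left of
# -0.07 and mostly positives to the right (the values are made up).
x1 = np.array([-1.8, -0.9, -0.3, 0.2, 0.8, 1.4])
y  = np.array([-1,   -1,   -1,   1,   1,   1])
left, right = fit_stump(x1, y, threshold=-0.07)
print(left, right)                            # -> -1 1
print(predict_stump(x1, -0.07, left, right))  # majority-vote predictions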
But for the data where x2 is greater than 1.55, we have three negative examples, so we predict -1. And so now we have a much more interesting region: of these two nodes here on the right, one corresponds to this big green area and the other corresponds to this little box at the top.

And we can imagine continuing this process, splitting again and again and again, as we'll see next. But one important thing to note, and this is a really important point, is that with continuous variables, unlike discrete or categorical ones, we can split on the same variable multiple times. So we can split on x1 and then split on x1 again, or we can split on x1, then x2, then x1 again, and we'll see that (the sketch after this passage shows the effect of letting the tree keep splitting).

So in this example that we talked about, we can keep the decision tree learning process growing. At depth 1 we just get a decision stump, which corresponds to the vertical line that we drew in the beginning at -0.07. If we go to depth 2, we get this little box that contains most of the positive examples, but there are still some misclassifications over here. And then if you kept going, splitting, splitting, splitting all the way to depth 10, we get this really crazy decision boundary. And if you look at it carefully (I'm going to draw over the decision boundary here), you'll see that it basically makes no mistakes, so it has zero training error.

And we can compare what we saw with logistic regression with what we're seeing with decision trees, and understand again, as a preview of what we will see next module, the notion of overfitting. In logistic regression we started with degree-1 polynomial features, and we saw that degree 2 had a really nice fit of the data, a nice parabola. It didn't get everything right, but it did pretty well. And the degree-6 polynomial had a really crazy decision boundary. It got zero training error, but we didn't really trust those predictions. With the decision tree, what you control is the depth of the tree, and depth 1 was just a decision stump. It didn't do so well.
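To make the depth idea concrete, here is a hedged sketch, assuming scikit-learn and a small synthetic dataset (neither comes from the lecture; the feature and noise settings are made up), showing how a deeper tree can keep splitting on the same continuous features and drive training accuracy toward 1.0:

# Sketch of the depth-vs-fit behaviour described above, using scikit-learn
# on synthetic data (illustrative assumptions, not the lecture's own setup).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # two features: x1, x2
y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > -0.07, 1, -1)

for depth in (1, 2, 10):
    tree = DecisionTreeClassifier(max_depth=depth).fit(X, y)
    print(f"depth={depth:2d}  training accuracy={tree.score(X, y):.3f}")
# Deeper trees can split on x1 (or x2) repeatedly, so training accuracy
# climbs toward 1.0 -- the zero-training-error boundary at depth 10.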
If you go to depth 3, it looks like a little bit of a jagged line, but it looks like a pretty nice decision boundary. It makes a few mistakes, but it looks pretty good. If you look at depth 10, you get this crazy decision boundary, which has zero training error but is likely to be overfitting.
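As a small, hedged illustration of that overfitting preview (again using scikit-learn and synthetic, noisy labels; none of this is the lecture's own code or data), you can compare a depth-3 and a depth-10 tree on held-out data:

# Overfitting preview: shallow vs. deep tree on a noisy synthetic problem.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
# Roughly positive above a curved boundary, with 10% label noise.
y = np.where(X[:, 1] > 0.5 * X[:, 0] ** 2 - 0.5, 1, -1)
flip = rng.random(400) < 0.1
y[flip] = -y[flip]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for depth in (3, 10):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"depth={depth:2d}  train={tree.score(X_tr, y_tr):.3f}"
          f"  test={tree.score(X_te, y_te):.3f}")
# Typically the depth-10 tree drives training error toward zero while its
# test accuracy is no better (and often worse) than the depth-3 tree's.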