We'll close off this module by exploring in a little more detail our example of decision trees and comparing it to our example of logistic regression from before.

Here is the example that we will consider; it is the one we considered in the logistic regression module. We're taking this dataset here and fitting a logistic regression; the decision boundary is on the right. And we can see the parameters that we learned from the data. This is what we get if we just use degree-1 polynomial features, that is, straight-up linear features.

Let's see what happens when we build a decision stump on the same data. If you fit a single-level decision tree and apply it to this dataset, it will split on x1, and if you try out the threshold values you'll see that on the left side we'll have x1 less than -0.07 and on the right side we'll have x1 greater than -0.07. For the ones on the right, we have more positive examples, so we're going to predict positive, while for the ones on the left, we have more negative examples, so we're going to predict negative. And you'll see that they correspond. So here's our split: -0.07. The first leaf of the tree corresponds to the points on the left there, and the second one corresponds to the points on the right. So unlike logistic regression, where we were able to get a diagonal decision boundary, here we're only going to get a straight up-and-down, or straight across, decision boundary, at least in the first split.

Now, if we keep going with the recursion of the algorithm and take another split for each one of these intermediate nodes, we'll see that for the data where x1 is less than -0.07, we might split on x1 again and make predictions. In this case that would be splitting on x1 less than -1.66, and in both cases we're predicting -1, so a negative data point: -1, -1. But for x1 greater than -0.07, we now split on x2, on whether it is less than or greater than 1.55, which on our data is over here, 1.55; that's where x2 would be. And we see that now, for the data where x2 is smaller than 1.55, we have 11 positive examples, so we predict +1.
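As a rough sketch of the stump described above (this code is not part of the lecture; the data values, the function names, and the reuse of the -0.07 threshold are illustrative assumptions), here is a minimal Python decision stump that splits one feature at a threshold and predicts the majority class on each side:

# Minimal decision-stump sketch (illustrative only, not the lecture's own code).
# Split on a single feature at a threshold and predict the majority class
# on each side, mirroring the x1 < -0.07 split described above.
import numpy as np

def majority_label(y):
    # Predict +1 if positives are the majority, else -1.
    return 1 if np.sum(y == 1) >= np.sum(y == -1) else -1

def fit_stump(x, y, threshold):
    # x: 1-D array of feature values (here, x1); y: labels in {-1, +1}.
    left, right = y[x < threshold], y[x >= threshold]
    return majority_label(left), majority_label(right)

def predict_stump(x, threshold, left_label, right_label):
    return np.where(x < threshold, left_label, right_label)

# Toy data roughly matching the story: mostly negatives to the left of
# -0.07 and mostly positives to the right (the values are made up).
x1 = np.array([-1.8, -0.9, -0.3, 0.2, 0.8, 1.4])
y  = np.array([-1,   -1,   -1,   1,   1,   1])
left, right = fit_stump(x1, y, threshold=-0.07)
print(left, right)                            # -> -1 1
print(predict_stump(x1, -0.07, left, right))  # majority-vote predictions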
But for the data where x2 is greater than 1.55, we have three negative examples, so we predict -1. And so now we have a much more interesting region: of these two nodes here on the right, one corresponds to this big green area and the other corresponds to this little box at the top.

And we can imagine continuing this process, splitting again and again and again, as we'll see next. But one important thing to note, and this is a really important point, is that with continuous variables, unlike discrete or categorical ones, we can split on the same variable multiple times. So we can split on x1 and then split on x1 again, or we can split on x1, then x2, then x1 again, and we'll see that (the sketch after this passage shows the effect of letting the tree keep splitting).

So in this example that we talked about, we can keep the decision tree learning process growing. At depth 1 we just get a decision stump, which corresponds to the vertical line that we drew in the beginning at -0.07. If we go to depth 2, we get this little box that contains most of the positive examples, but there are still some misclassifications over here. And then if you kept going, splitting, splitting, splitting all the way to depth 10, we get this really crazy decision boundary. And if you look at it carefully (I'm going to draw over the decision boundary here), you'll see that it basically makes no mistakes, so it has zero training error.

And we can compare what we saw with logistic regression with what we're seeing with decision trees, and understand again, as a preview of what we will see next module, the notion of overfitting. In logistic regression we started with degree-1 polynomial features, and we saw that degree 2 had a really nice fit of the data, a nice parabola. It didn't get everything right, but it did pretty well. And the degree-6 polynomial had a really crazy decision boundary. It got zero training error, but we didn't really trust those predictions. With the decision tree, what you control is the depth of the tree, and depth 1 was just a decision stump. It didn't do so well.
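To make the depth idea concrete, here is a hedged sketch, assuming scikit-learn and a small synthetic dataset (neither comes from the lecture; the feature and noise settings are made up), showing how a deeper tree can keep splitting on the same continuous features and drive training accuracy toward 1.0:

# Sketch of the depth-vs-fit behaviour described above, using scikit-learn
# on synthetic data (illustrative assumptions, not the lecture's own setup).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # two features: x1, x2
y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > -0.07, 1, -1)

for depth in (1, 2, 10):
    tree = DecisionTreeClassifier(max_depth=depth).fit(X, y)
    print(f"depth={depth:2d}  training accuracy={tree.score(X, y):.3f}")
# Deeper trees can split on x1 (or x2) repeatedly, so training accuracy
# climbs toward 1.0 -- the zero-training-error boundary at depth 10.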
If you go to depth 3, it looks like a little bit of a jagged line, but it looks like a pretty nice decision boundary. It makes a few mistakes, but it looks pretty good. If you look at depth 10, you get this crazy decision boundary, which has zero training error but is likely to be overfitting.
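As a small, hedged illustration of that overfitting preview (again using scikit-learn and synthetic, noisy labels; none of this is the lecture's own code or data), you can compare a depth-3 and a depth-10 tree on held-out data:

# Overfitting preview: shallow vs. deep tree on a noisy synthetic problem.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
# Roughly positive above a curved boundary, with 10% label noise.
y = np.where(X[:, 1] > 0.5 * X[:, 0] ** 2 - 0.5, 1, -1)
flip = rng.random(400) < 0.1
y[flip] = -y[flip]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for depth in (3, 10):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"depth={depth:2d}  train={tree.score(X_tr, y_tr):.3f}"
          f"  test={tree.score(X_te, y_te):.3f}")
# Typically the depth-10 tree drives training error toward zero while its
# test accuracy is no better (and often worse) than the depth-3 tree's.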