We'll now take a few minutes to instantiate the abstract algorithm we described and see what it looks like in the context of learning decision stumps. AdaBoost with decision stumps is a really nice, simple default way of training on your data. So it's one that we'll go through a little further, and it'll help us ground some of the concepts that we've looked at so far.

Here I've outlined the AdaBoost algorithm that we discussed in the previous slides, but just to be clear, we're going to talk about learning a decision stump for f of t, and figuring out how to update it and figure out its coefficient, w hat t.

Our first step is figuring out how to learn the next decision stump; that's going to be f of t. And this is just going to be standard decision stump learning. We're going to try splitting on each feature: income, credit history, savings, market conditions. And we'll figure out how well each of the resulting decision stumps fits the weighted data. Notice that in this process we might split on income multiple times; in later iterations we might revisit the same feature.

So we're going to try each of those features, and for each one of them, measure the weighted error on the training data. For splitting on income, the weighted error might be 0.2. For splitting on credit, it might be 0.35. For splitting on savings, it might be 0.3. And finally, if you split on market conditions, it might be the worst of these four decision stumps: on this weighted data, it might have a weighted error of 0.4.

So we pick the best feature, the one that has the lowest weighted error, and that's the first one we split on. We're going to split on income at $100,000. And so f of t is going to be that decision stump that asks: is income greater than $100,000? If yes, the loan is safe; if not, it's risky.

Now, the final question is, what coefficient do we give to this particular classifier? All we have to do is plug the weighted error, 0.2, into the formula, w hat = 1/2 ln((1 - weighted error) / weighted error), and if we plug it in and do the math, 0.69 is the result. So the coefficient of this first decision stump is just going to be 0.69.
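To make this concrete, here is a minimal sketch in Python of one round of stump selection plus the coefficient computation. The function names (weighted_error, best_stump) and the NumPy setup are my own illustration, not code from the course; the coefficient formula is the standard AdaBoost one, which reproduces the 0.2 -> 0.69 number from the example.

```python
import numpy as np

# Assumed setup (not from the lecture): X is an (n, d) feature matrix,
# y holds labels in {-1, +1}, alpha holds the current data-point weights.

def weighted_error(pred, y, alpha):
    """Fraction of total weight on misclassified points."""
    return np.sum(alpha * (pred != y)) / np.sum(alpha)

def best_stump(X, y, alpha):
    """Try a threshold split on each feature and keep the one with the
    lowest weighted error (candidate thresholds here are just the
    observed feature values, for illustration)."""
    best = None
    for j in range(X.shape[1]):
        for thresh in np.unique(X[:, j]):
            for sign in (+1, -1):  # which side of the split predicts +1 (safe)
                pred = np.where(X[:, j] > thresh, sign, -sign)
                err = weighted_error(pred, y, alpha)
                if best is None or err < best[0]:
                    best = (err, j, thresh, sign)
    return best  # (weighted error, feature index, threshold, direction)

# Coefficient formula from the lecture: w_hat = 1/2 ln((1 - err) / err).
err = 0.2
w_hat = 0.5 * np.log((1 - err) / err)
print(round(w_hat, 2))  # 0.69, matching the worked example
```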
Going back to the algorithm, we've discussed how we're going to learn this new stump from data and how we figure out its coefficient. Let's next talk about how to update the weight alpha i of each data point.

So here's the intuitive process of what happens. We have our data points, and I'm highlighting them here depending on their income, just like we did before. But now I'm going to make a prediction using this decision stump. The question is, how good is this decision stump, income greater than $100,000? If you look at it, it makes mistakes on some of the data points and gets others right. So I've marked the correct ones in bright green and the mistakes in bright red.

And if we take the previous weights, alpha, for each one of these data points (I'm highlighting where those weights were right there), we need to compute the new weight based on the formula above, which is the standard update: multiply each weight by e to the -w hat if the point was classified correctly, and by e to the w hat if it was misclassified.

So we're going to plug the w hat that we computed, 0.69, into the formula to figure out what to multiply each one of those weights by. Plug it in, and you'll see that e to the -0.69 is a half, so for every correct data point we halve its weight, and e to the 0.69 is two, so for every incorrect data point we double its weight.

So I'm going to go row by row. For the ones in green that I got correct, I'm going to halve the weights. For the first row, the weight before was 0.5, and now it becomes 0.25. The next one was 1.5 and becomes 0.75, because they were correct. For the third row, I made a mistake; its weight before was 1.5, and now I'm going to double it and make it 3. So we can go data point by data point and multiply or divide each weight by two, depending on whether we got that data point right or not.

It's extremely simple to boost the decision stump classifier, and these tend to do extremely well on a wide range of data sets.
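Here is a similarly hedged sketch of the weight update step, assuming labels in {-1, +1} so that y * pred is +1 on correct points and -1 on mistakes. The small arrays are hypothetical and just reproduce the 0.5 -> 0.25, 1.5 -> 0.75, and 1.5 -> 3 rows walked through above.

```python
import numpy as np

# Weight update from the lecture: multiply each weight by e^{-w_hat}
# when the stump got the point right (about 0.5 for w_hat = 0.69) and
# by e^{+w_hat} when it got the point wrong (about 2.0 for w_hat = 0.69).

def update_weights(alpha, pred, y, w_hat):
    return alpha * np.exp(-w_hat * y * pred)

w_hat = 0.69
alpha = np.array([0.5, 1.5, 1.5])        # weights before the update
correct = np.array([True, True, False])  # did the stump get each point right?
y = np.ones(3)
pred = np.where(correct, y, -y)

print(update_weights(alpha, pred, y, w_hat))
# approximately [0.25, 0.75, 3.0], matching the row-by-row example
```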