In this module we're going to talk about a specific boosting algorithm called AdaBoost. AdaBoost is one of the early machine learning algorithms for boosting. It's extremely useful, it's very simple to implement, and it's quite effective. There are others, and I'll mention another interesting one towards the end of the module, but let's start with AdaBoost.

This is the famous AdaBoost algorithm, which was created by Freund and Schapire in the mid-1990s. It's an amazingly useful algorithm. You start by treating every data point the same way. You don't know which ones are hard and which ones are easy, so they all get the same weight. Specifically, we'll start them all with weight 1/N, because that makes everything work out a little better, and we'll talk about why in a few slides. So we start with all data points having the same, so-called uniform, weight: in this case α_i = 1/N.

Then, on each iteration t of AdaBoost, as we go to learn the first decision stump (the first simple classifier, or weak classifier), then the second one, then the third one, all the way to capital T, we learn f_t on data weighted by the α_i's. The weights start at 1/N, but they become different over time. Then we compute the coefficient ŵ_t for this new classifier f_t that we just learned, and then we recompute the weights α_i. We keep going, and finally, once we're done, we say that the prediction ŷ is the sign of the weighted combination of f_1, f_2, f_3, f_4, and so on, weighted by the coefficients ŵ_t that we learned along the way.
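To make that loop concrete, here is a minimal sketch of the skeleton in Python. It assumes labels in {−1, +1}, uses scikit-learn's DecisionTreeClassifier with max_depth=1 as the decision stump (a stand-in for whatever weak learner the course implements), and plugs in the standard AdaBoost formulas for ŵ_t and the weight update, which the module derives later:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # a depth-1 tree is a decision stump

def adaboost_train(X, y, T):
    """Sketch of the AdaBoost skeleton; y is a NumPy array of +1/-1 labels."""
    N = len(y)
    alpha = np.full(N, 1.0 / N)   # uniform starting weights: alpha_i = 1/N
    stumps, w_hat = [], []
    for t in range(T):
        f_t = DecisionTreeClassifier(max_depth=1)
        f_t.fit(X, y, sample_weight=alpha)      # learn f_t on the weighted data
        pred = f_t.predict(X)
        # weighted error of f_t (clipped to keep the log well-defined)
        err = np.clip(alpha[pred != y].sum() / alpha.sum(), 1e-10, 1 - 1e-10)
        w_t = 0.5 * np.log((1 - err) / err)     # coefficient w-hat_t: lower error, more trust
        alpha *= np.exp(-w_t * y * pred)        # mistakes (y * pred == -1) get upweighted
        alpha /= alpha.sum()                    # renormalize the weights
        stumps.append(f_t)
        w_hat.append(w_t)
    return stumps, w_hat

def adaboost_predict(stumps, w_hat, X):
    # y-hat = sign of the weighted combination of the weak classifiers
    return np.sign(sum(w * f.predict(X) for w, f in zip(w_hat, stumps)))
```

The two lines that compute w_t and update alpha are exactly the two problems discussed next.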
So there are two fundamental problems we need to address when we're thinking about AdaBoost. One is: how do you compute the coefficient ŵ_t? Let's call that problem 1 in our module for today. Problem 1 is, how much do I trust f_t? If I trust f_t a lot, I should give it a very high coefficient, and if I trust f_t very little, I should give it a very low coefficient. And then problem 2 is: how do you recompute the weights α_i on the data points? Problem 2, in other words, is how we weight mistakes more. We want to increase the weights of the data points we made mistakes on. In the main part of this module, we're going to talk about how you compute ŵ_t and how you update the α_i's, and it's going to be pretty simple, relatively intuitive, and extremely useful.
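As a preview of where the module lands, the formulas below are the standard AdaBoost answers to problems 1 and 2 (the derivations come later): the coefficient rewards low weighted error, and the update multiplies each weight up or down depending on whether f_t made a mistake on that point:

$$
\hat{w}_t = \frac{1}{2}\,\ln\!\left(\frac{1 - \mathrm{err}_t}{\mathrm{err}_t}\right),
\qquad
\alpha_i \;\leftarrow\;
\begin{cases}
\alpha_i\, e^{-\hat{w}_t} & \text{if } f_t(x_i) = y_i,\\
\alpha_i\, e^{\hat{w}_t} & \text{if } f_t(x_i) \neq y_i,
\end{cases}
$$

where err_t is the weighted classification error of f_t, and the α_i's are renormalized to sum to 1 after each update.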