In this module we're going to talk about a specific boosting algorithm called AdaBoost. AdaBoost is one of the early machine learning algorithms for boosting. It's extremely useful, it's very simple to implement, and it's quite effective. There are others, and I'll mention another interesting one towards the end of the module, but let's start with AdaBoost.

This is the famous AdaBoost algorithm, which was created by Freund and Schapire in the mid-1990s. It's an amazingly useful algorithm. You start by treating every data point the same way. You don't know which ones are hard and which ones are easy, so they all get the same weight. Specifically, we'll start them all with weight 1/N, because that makes everything work out a little better, and we'll talk about why in a few slides. So we start with all data points having the same, so-called uniform, weight: in this case α_i = 1/N.

Then, on each iteration t of AdaBoost, as we go to learn the first decision stump (the first simple classifier, or weak classifier), then the second one, then the third one, all the way to capital T, we learn f_t on data weighted by the α_i's. The weights start at 1/N, but they become different over time. Then we compute the coefficient ŵ_t for this new classifier f_t that we just learned, and then we recompute the weights α_i. We keep going, and finally, once we're done, we say that the prediction ŷ is the sign of the weighted combination of f_1, f_2, f_3, f_4, and so on, weighted by the coefficients ŵ_t that we learned along the way.
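To make that loop concrete, here is a minimal sketch of the skeleton in Python. It assumes labels in {−1, +1}, uses scikit-learn's DecisionTreeClassifier with max_depth=1 as the decision stump (a stand-in for whatever weak learner the course implements), and plugs in the standard AdaBoost formulas for ŵ_t and the weight update, which the module derives later:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # a depth-1 tree is a decision stump

def adaboost_train(X, y, T):
    """Sketch of the AdaBoost skeleton; y is a NumPy array of +1/-1 labels."""
    N = len(y)
    alpha = np.full(N, 1.0 / N)   # uniform starting weights: alpha_i = 1/N
    stumps, w_hat = [], []
    for t in range(T):
        f_t = DecisionTreeClassifier(max_depth=1)
        f_t.fit(X, y, sample_weight=alpha)      # learn f_t on the weighted data
        pred = f_t.predict(X)
        # weighted error of f_t (clipped to keep the log well-defined)
        err = np.clip(alpha[pred != y].sum() / alpha.sum(), 1e-10, 1 - 1e-10)
        w_t = 0.5 * np.log((1 - err) / err)     # coefficient w-hat_t: lower error, more trust
        alpha *= np.exp(-w_t * y * pred)        # mistakes (y * pred == -1) get upweighted
        alpha /= alpha.sum()                    # renormalize the weights
        stumps.append(f_t)
        w_hat.append(w_t)
    return stumps, w_hat

def adaboost_predict(stumps, w_hat, X):
    # y-hat = sign of the weighted combination of the weak classifiers
    return np.sign(sum(w * f.predict(X) for w, f in zip(w_hat, stumps)))
```

The two lines that compute w_t and update alpha are exactly the two problems discussed next.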
So there are two fundamental problems we need to address when we're thinking about AdaBoost. One is: how do you compute the coefficient ŵ_t? Let's call that problem 1 in our module for today. Problem 1 is, how much do I trust f_t? If I trust f_t a lot, I should give it a very high coefficient, and if I trust f_t very little, I should give it a very low coefficient. And then problem 2 is: how do you recompute the weights α_i on the data points? Problem 2, in other words, is how we weight mistakes more. We want to increase the weights of the data points we made mistakes on. In the main part of this module, we're going to talk about how you compute ŵ_t and how you update the α_i's, and it's going to be pretty simple, relatively intuitive, and extremely useful.
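As a preview of where the module lands, the formulas below are the standard AdaBoost answers to problems 1 and 2 (the derivations come later): the coefficient rewards low weighted error, and the update multiplies each weight up or down depending on whether f_t made a mistake on that point:

$$
\hat{w}_t = \frac{1}{2}\,\ln\!\left(\frac{1 - \mathrm{err}_t}{\mathrm{err}_t}\right),
\qquad
\alpha_i \;\leftarrow\;
\begin{cases}
\alpha_i\, e^{-\hat{w}_t} & \text{if } f_t(x_i) = y_i,\\
\alpha_i\, e^{\hat{w}_t} & \text{if } f_t(x_i) \neq y_i,
\end{cases}
$$

where err_t is the weighted classification error of f_t, and the α_i's are renormalized to sum to 1 after each update.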