We started with the alpha_i's being uniform, the same for all data points, one over n. Now we want to change them to focus more on those difficult data points where we're making mistakes. So the question is: where did f_t make mistakes, and which data points did f_t get right? If f_t got a particular data point x_i right, we want to decrease alpha_i, because we got it right. But if we got x_i wrong, then we want to increase alpha_i, so the next decision stump we learn does better on this particular input.

Again, the AdaBoost theorem provides us with a slightly intimidating formula for how to update the weights alpha_i. But if you take a moment to interpret it, you'll see this one is extremely intuitive, and there's something quite nice about it. So let's take a quick look at it. It says that alpha_i gets an update depending on whether f_t gets the data point right or whether f_t makes a mistake:

    alpha_i <- alpha_i * e^(-w_hat_t)   if f_t(x_i) = y_i   (correct)
    alpha_i <- alpha_i * e^(+w_hat_t)   if f_t(x_i) != y_i  (mistake)

From this we'll see that we're going to increase the weight of data points where we made mistakes, and decrease the weight of data points we got right. So let's take a look at it.

Let's take one x_i, and let's suppose that we got it correct. Then we're in the top line here, and notice that this equation depends on the coefficient w_hat_t that was assigned to this classifier. If the classifier was good, we change the weight more, but if the classifier was terrible, we're going to change the weight less. So let's say the classifier was good, and we gave it weight 2.3. Looking at the formula, we're multiplying alpha_i by e to the -w_hat_t, which is e^(-2.3). And if you take your calculator out, you'll see that this is about 0.1. So we take the data points we got right, and we multiply the weight of those data points by 0.1, so we're dividing by 10. What effect does that have? We're decreasing the importance of this particular data point (x_i, y_i).
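As a quick sanity check on these numbers, here is a minimal sketch of the update for a single data point in Python; the helper name update_weight is my own illustration, not something from the course:

import math

def update_weight(alpha_i, correct, w_hat_t):
    """AdaBoost weight update for one data point: shrink the weight if
    the classifier got the point right, grow it if it made a mistake."""
    factor = math.exp(-w_hat_t) if correct else math.exp(w_hat_t)
    return alpha_i * factor

print(update_weight(1.0, correct=True,  w_hat_t=2.3))   # ~0.100, weight divided by ~10
print(update_weight(1.0, correct=False, w_hat_t=2.3))   # ~9.974, weight ~10 times bigger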
Now let's look at a case where we got the data point correct, but the classifier that we learned is random. So it had weight 0, just like we discussed a few slides ago: its weighted error was 0.5, so its coefficient w_hat_t is 0. In this case we're multiplying alpha_i by e to the minus 0, which equals 1. What does that mean? That means we keep the importance of this data point the same, and that also makes a ton of sense. This was a classifier that was terrible, and we gave it a weight of 0, so we're going to ignore it. And since we're ignoring it, we're not changing anything about how we weight the data points; we just keep going as if nothing happened, because nothing has changed in our overall ensemble.

Now let's look at the opposite case, where we actually made mistakes. So let's say that we got x_i incorrect; we made a mistake. In this case, we're in the second line here. If it was a good classifier with a w_hat_t of 2.3, then we're going to multiply the weight by e to the power 2.3, which, if you do the math, is about 9.97, so approximately 10. So the weight becomes 10 times bigger, and what we're doing is increasing the importance of this mistake significantly. The next classifier is going to pay much more attention to this particular data point, because it was a mistaken one.

Finally, just very quickly, what happens if we make a mistake, but we have that random classifier that had weight 0, the one we didn't care about? The multiplier here is e to the 0, which again equals 1, which means we keep the importance of this data point the same.

Very good. We've now seen this cool update from AdaBoost, which makes a ton of sense: increase the weights of data points where we made mistakes, and decrease the weights of the ones we got right. And we're going to use it in our AdaBoost algorithm.

So let's update our algorithm. We started with uniform weights, we learned a classifier f_t, and we computed its coefficient w_hat_t. Now we can update the weights of the data points, alpha_i, using the simple formula from the previous slide, which increases the weights of mistakes and decreases the weights of correct classifications.
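To tie the whole round together, here is a minimal sketch in Python of how the coefficient and the weight update fit into one boosting step. This is my own illustration, not code from the course: the function name adaboost_reweight and the toy labels are assumptions, and the weak learner's predictions are taken as given, with labels in {-1, +1}.

import numpy as np

def adaboost_reweight(alpha, y, pred):
    """Given current data-point weights alpha, true labels y, and the weak
    learner's predictions pred (labels in {-1, +1}), compute the classifier
    coefficient w_hat and the updated weights, per the lecture's formulas."""
    mistakes = pred != y
    weighted_error = np.sum(alpha[mistakes]) / np.sum(alpha)
    # Coefficient: large for accurate classifiers, 0 when error = 0.5 (random).
    w_hat = 0.5 * np.log((1 - weighted_error) / weighted_error)
    # Grow the weights of mistakes, shrink the weights of correct points.
    new_alpha = alpha * np.where(mistakes, np.exp(w_hat), np.exp(-w_hat))
    return w_hat, new_alpha

# Toy check: 5 points with uniform weights 1/n; the learner gets point 4 wrong.
alpha = np.ones(5) / 5
y    = np.array([ 1, -1,  1,  1, -1])
pred = np.array([ 1, -1,  1, -1, -1])
w_hat, alpha = adaboost_reweight(alpha, y, pred)
print(w_hat)   # 0.5 * ln(0.8 / 0.2) ≈ 0.693
print(alpha)   # the mistake grew from 0.2 to 0.4; correct points shrank to 0.1

Notice that if the weighted error were 0.5, w_hat would be 0 and both multipliers would equal 1, which is exactly the "ignore the random classifier" behavior described above.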