We'll now take a few minutes to instantiate the abstract algorithm we described and see what it looks like in the context of learning decision stumps. AdaBoost with decision stumps is a really nice, simple default way of training on your data. So it's one that we'll go through a little further, and it'll help us ground some of the concepts that we've looked at so far.

Here I've outlined the AdaBoost algorithm that we discussed in the previous slides, but just to be clear, we're going to talk about learning a decision stump for f of t, and figuring out how to update it and figure out its coefficient, w hat t.

Our first step is figuring out how to learn the next decision stump; that's going to be f of t. And this is just going to be standard decision stump learning. We're going to try splitting on each feature: income, credit history, savings, market conditions. And we'll figure out how well each of the resulting decision stumps fits the weighted data. Notice that in this process we might split on income multiple times; in later iterations we might revisit the same feature.

So we're going to try each of those features, and for each one of them, measure the weighted error on the training data. For splitting on income, the weighted error might be 0.2. For splitting on credit, it might be 0.35. For splitting on savings, it might be 0.3. And finally, if you split on market conditions, it might be the worst of these four decision stumps: on this weighted data, it might have a weighted error of 0.4.

So we pick the best feature, the one that has the lowest weighted error, and that's the first one we split on. We're going to split on income at $100,000. And so f of t is going to be that decision stump that asks: is income greater than $100,000? If yes, the loan is safe; if not, it's risky.

Now, the final question is, what coefficient do we give to this particular classifier? All we have to do is plug the weighted error, 0.2, into the formula, w hat = 1/2 ln((1 - weighted error) / weighted error), and if we plug it in and do the math, 0.69 is the result. So the coefficient of this first decision stump is just going to be 0.69.
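To make this concrete, here is a minimal sketch in Python of one round of stump selection plus the coefficient computation. The function names (weighted_error, best_stump) and the NumPy setup are my own illustration, not code from the course; the coefficient formula is the standard AdaBoost one, which reproduces the 0.2 -> 0.69 number from the example.

```python
import numpy as np

# Assumed setup (not from the lecture): X is an (n, d) feature matrix,
# y holds labels in {-1, +1}, alpha holds the current data-point weights.

def weighted_error(pred, y, alpha):
    """Fraction of total weight on misclassified points."""
    return np.sum(alpha * (pred != y)) / np.sum(alpha)

def best_stump(X, y, alpha):
    """Try a threshold split on each feature and keep the one with the
    lowest weighted error (candidate thresholds here are just the
    observed feature values, for illustration)."""
    best = None
    for j in range(X.shape[1]):
        for thresh in np.unique(X[:, j]):
            for sign in (+1, -1):  # which side of the split predicts +1 (safe)
                pred = np.where(X[:, j] > thresh, sign, -sign)
                err = weighted_error(pred, y, alpha)
                if best is None or err < best[0]:
                    best = (err, j, thresh, sign)
    return best  # (weighted error, feature index, threshold, direction)

# Coefficient formula from the lecture: w_hat = 1/2 ln((1 - err) / err).
err = 0.2
w_hat = 0.5 * np.log((1 - err) / err)
print(round(w_hat, 2))  # 0.69, matching the worked example
```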
Going back to the algorithm, we've discussed how we're going to learn this new stump from data and how we figure out its coefficient. Let's next talk about how to update the weight alpha i of each data point.

So here's the intuitive process of what happens. We have our data points, and I'm highlighting them here depending on their income, just like we did before. But now I'm going to make a prediction using this decision stump. The question is, how good is this decision stump, income greater than $100,000? If you look at it, it makes mistakes on some of the data points and gets others right. So I've marked the correct ones in bright green and the mistakes in bright red.

And if we take the previous weights, alpha, for each one of these data points (I'm highlighting where those weights were right there), we need to compute the new weight based on the formula above, which is the standard update: multiply each weight by e to the -w hat if the point was classified correctly, and by e to the w hat if it was misclassified.

So we're going to plug the w hat that we computed, 0.69, into the formula to figure out what to multiply each one of those weights by. Plug it in, and you'll see that e to the -0.69 is a half, so for every correct data point we halve its weight, and e to the 0.69 is two, so for every incorrect data point we double its weight.

So I'm going to go row by row. For the ones in green that I got correct, I'm going to halve the weights. For the first row, the weight before was 0.5, and now it becomes 0.25. The next one was 1.5 and becomes 0.75, because they were correct. For the third row, I made a mistake; its weight before was 1.5, and now I'm going to double it and make it 3. So we can go data point by data point and multiply or divide each weight by two, depending on whether we got that data point right or not.

It's extremely simple to boost the decision stump classifier, and these tend to do extremely well on a wide range of data sets.
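Here is a similarly hedged sketch of the weight update step, assuming labels in {-1, +1} so that y * pred is +1 on correct points and -1 on mistakes. The small arrays are hypothetical and just reproduce the 0.5 -> 0.25, 1.5 -> 0.75, and 1.5 -> 3 rows walked through above.

```python
import numpy as np

# Weight update from the lecture: multiply each weight by e^{-w_hat}
# when the stump got the point right (about 0.5 for w_hat = 0.69) and
# by e^{+w_hat} when it got the point wrong (about 2.0 for w_hat = 0.69).

def update_weights(alpha, pred, y, w_hat):
    return alpha * np.exp(-w_hat * y * pred)

w_hat = 0.69
alpha = np.array([0.5, 1.5, 1.5])        # weights before the update
correct = np.array([True, True, False])  # did the stump get each point right?
y = np.ones(3)
pred = np.where(correct, y, -y)

print(update_weights(alpha, pred, y, w_hat))
# approximately [0.25, 0.75, 3.0], matching the row-by-row example
```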