[MUSIC]

>> Let's start talking about how to compute w hat t. This quantity is intuitive: it denotes how good, or how much we trust, f t, the classifier we learned at this iteration.

Specifically, if f t is good, if it's doing well on our data, we want w hat t to be large. In fact, if f t has really, really great accuracy, very low error, we want w hat t to be really big. However, if f t is really bad, if it's terrible at making predictions, we should downweight it. We should not trust that particular vote.

So in other words, how do we measure whether a classifier is good or not? As we said, f t is good if it has low training error. However, you have to remember that we have weighted data, so what we really care about is how well it's doing on the weighted data. For example, if we're weighing certain data points more heavily because they're really hard, because we're making lots of mistakes on those, we want to make sure that the classifier has low error on those really hard examples.

So let's look at measuring error on weighted data. Measuring error on weighted data is very similar to measuring error on regular data. You have a data point, for example, "The sushi was great", which is labeled as positive, but now we have a weight, in this case alpha, which might be 1.2. So this is a data point of, say, above-average importance. We want to measure the weighted total of the correct examples and the weighted total of the mistakes. So we take our learned classifier f t and feed it that review, in this case "The sushi was great", but we hide the label, which in this case was positive. And now we compare the prediction. For example, let's say that y hat was plus 1 for this input. It's the same as the true label, so it's correct, and we add the weight 1.2 to the total weight of the correct examples we've seen. So that's awesome.

But let's say we have another data point, "The food was OK", which is truly labeled as negative, and we talked about this example before. We feed "The food was OK" to the classifier. We hide the label, minus 1. But our classifier gets confused. It doesn't know the cultural reference, that "the food was OK" means something negative, and thinks it is a positive example, y hat is plus 1, so it's a mistake. So we take the weight of this data point, 0.5, and add it to the total weight of the mistakes. We keep adding up the weight of the mistakes versus the weight of the correct classifications, and use that to measure the error.
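To make this concrete, here is a minimal sketch, in Python, of the bookkeeping the lecture just walked through: accumulate the weight of the mistakes versus the weight of the correct predictions, then divide. The function name and variable names are illustrative, not taken from the course's assignments.

```python
def weighted_error(y_true, y_pred, alpha):
    """Weighted classification error: total weight of the mistakes
    divided by the total weight of all the data points."""
    weight_mistakes = 0.0
    weight_correct = 0.0
    for yi, yhat_i, alpha_i in zip(y_true, y_pred, alpha):
        if yhat_i != yi:
            weight_mistakes += alpha_i  # mistake: add this point's weight here
        else:
            weight_correct += alpha_i   # correct: add this point's weight here
    return weight_mistakes / (weight_mistakes + weight_correct)

# The two reviews from the lecture: "The sushi was great" (weight 1.2,
# predicted correctly) and "The food was OK" (weight 0.5, misclassified).
y_true = [+1, -1]
y_pred = [+1, +1]
alpha  = [1.2, 0.5]
print(weighted_error(y_true, y_pred, alpha))  # 0.5 / 1.7, about 0.29
```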
42 00:02:15,650 --> 00:02:18,920 It doesn't know the cultural reference, the food was OK, and 43 00:02:18,920 --> 00:02:23,050 thinks it is a positive example, y hat is plus 1, and it's a mistake. 44 00:02:23,050 --> 00:02:25,572 So we take the weight of this data point 0.5 and 45 00:02:25,572 --> 00:02:28,360 add it to the total weight of the mistakes. 46 00:02:28,360 --> 00:02:31,350 So keep adding the weight of the mistakes versus the weight of the correct 47 00:02:31,350 --> 00:02:32,430 classifications. 48 00:02:32,430 --> 00:02:33,800 And use that to measure the error. 49 00:02:35,010 --> 00:02:35,670 Now that we have seen and 50 00:02:35,670 --> 00:02:40,040 intuitive notion of what a weighted error is, let's write the down the equations for 51 00:02:40,040 --> 00:02:42,470 the weighted error, so we can be sure if we need to implement it. 52 00:02:42,470 --> 00:02:45,810 So, the first thing we need to measure is the total weight of all the mistakes, so 53 00:02:45,810 --> 00:02:49,970 the sum of our mistakes of the weight of those data points. 54 00:02:49,970 --> 00:02:54,655 So this is the sum over the datapoint, so i equals 55 00:02:54,655 --> 00:02:59,640 1 through N, of an indicator that says, was this a mistake? 56 00:02:59,640 --> 00:03:04,300 So is y hat i different than yi? 57 00:03:04,300 --> 00:03:10,370 So this just measure whether it was a mistake, and if it was a mistake we 58 00:03:10,370 --> 00:03:14,170 don't just count it as a mistake, we count it whatever weight that datapoint has. 59 00:03:14,170 --> 00:03:17,890 So we're going to weigh that contribution by alpha i. 60 00:03:17,890 --> 00:03:20,240 And now, to compute the error, we're going to normalize it so 61 00:03:20,240 --> 00:03:21,090 it's a number between zero and 62 00:03:21,090 --> 00:03:25,130 one, so we have to divide it by the total weight of all the data points. 63 00:03:25,130 --> 00:03:33,880 So it's the sum over i equals 1 through N of the weight of all the data points of i. 64 00:03:33,880 --> 00:03:39,575 And these are the two quantities that we care about, 65 00:03:39,575 --> 00:03:44,881 and the weighted error can be denoted by the total 66 00:03:44,881 --> 00:03:52,530 weight of the mistakes divided by the total weight of all data points. 67 00:03:53,560 --> 00:03:58,539 Extremely simple, the best possible error you could hope for is 0.0. 68 00:03:58,539 --> 00:04:01,843 Now, the worst error is 1.0, 69 00:04:01,843 --> 00:04:07,940 which means that we're making mistakes everywhere. 70 00:04:07,940 --> 00:04:11,580 But notice that if we're making mistakes everywhere, 71 00:04:11,580 --> 00:04:14,230 if we emerge a class fire we're going to get everything right. 72 00:04:14,230 --> 00:04:18,770 So the way to think about it is in the worst possible case in some ways is 73 00:04:20,710 --> 00:04:22,170 how random does. 74 00:04:22,170 --> 00:04:27,105 So a random classifier will get error of 0.5, and 75 00:04:27,105 --> 00:04:33,499 we discussed this in the first course of how a random classifier gets 76 00:04:33,499 --> 00:04:39,880 error 0.5 on a binary classification problem like this. 77 00:04:39,880 --> 00:04:44,982 So now that we've seen the weighted error, let's look at how we 78 00:04:44,982 --> 00:04:50,759 can update the coefficient w hat t of the function that we learn. 79 00:04:50,759 --> 00:04:54,869 >> [MUSIC]