We've now seen linear classifiers, with logistic regression as a really core example, and how to learn them from data using gradient descent algorithms. So we're now ready to build those classifiers. However, when we go into practical settings we have to think about overfitting, which is a very significant problem in machine learning and can be really troublesome for logistic regression in particular. So let's see how we can avoid overfitting in this setting.

In order to explore the concept of overfitting, we need to better understand how we measure error in classification in general. For a classifier, we typically start with some data that has been labeled, say as positive or negative reviews, and then we split that data into a training set, which we use to train our model, and a validation set, which we use to evaluate the learned classifier.

So let's talk a little bit about the evaluation of a classifier in general. As we discussed in the first course, we measure a classifier's performance using what's called classification error. For example, say I have the sentence "The sushi was great," labeled positive, and I want to measure my error on that sentence. What I do is feed the sentence into my classifier.
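The train/validation split described above can be sketched as follows. This is a minimal illustration using the standard library only; the review sentences and labels are made up for the example.

```python
import random

# Hypothetical labeled reviews: (text, label), with label +1 (positive) or -1 (negative).
data = [
    ("The sushi was great", +1),
    ("The food was okay", -1),
    ("Loved the service", +1),
    ("Would not come back", -1),
]

# Shuffle before splitting so the validation set is a random sample.
random.seed(0)
random.shuffle(data)

# Hold out 25% of the data for validation; train on the rest.
split = int(0.75 * len(data))
train_set = data[:split]        # used to fit the model
validation_set = data[split:]   # used only to evaluate the learned model
```

In practice a library helper (for example scikit-learn's `train_test_split`) does the same thing, but the idea is just this: the model never sees the validation labels during training.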
I feed in "The sushi was great," but I hide the label, so the classifier doesn't get to know whether the sentence was labeled positive or negative in the training data. Then I compare the output, y hat, of my classifier with the true label. In this case they agree, so the classifier is correct, and I add one to the correct column.

However, take another sentence. For example, "The food was okay," which in the training set was labeled as a negative example. This is a bit of an American euphemism: is it positive, is it negative? People say "okay" when they mean bad things, and the classifier might not be familiar with that kind of cultural jargon. So when we hide the label, the classifier might output y hat as positive, but really it's a negative label, and the classifier has made a mistake. So now we put a plus one in the mistake column. In general, we're going to go example by example through the validation set and record which ones we got right and on which ones we made a mistake.
And now we can measure what's called the error, or classification error, on our data. So let me turn my pen on here, and I'm going to use white. It's really simple: the error measures the fraction of data points where we made mistakes. So it's the ratio between the number of mistakes that I made and the total number of data points:

error = (# mistakes) / (# data points)

Sometimes we also talk about accuracy, which is one minus the error. So here, instead of the number of mistakes, it's the number of data points we got correct divided by the total number of data points:

accuracy = (# correct) / (# data points) = 1 - error

Very good.
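The two quantities above can be sketched in a few lines of Python. The labels and predictions here are made-up examples standing in for a validation set.

```python
def classification_error(y_true, y_pred):
    """Fraction of examples where the prediction disagrees with the true label."""
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)

# Hypothetical validation set: hidden true labels vs. classifier outputs (y hat).
y_true = [+1, -1, +1, -1, +1]
y_pred = [+1, +1, +1, -1, +1]  # one mistake: predicted positive on a negative example

error = classification_error(y_true, y_pred)   # 1 mistake out of 5 -> 0.2
accuracy = 1 - error                           # fraction correct -> 0.8
```

Note that accuracy never needs to be computed separately: counting the correct examples and dividing by the total gives exactly one minus the error.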