1 00:00:00,124 --> 00:00:04,383 [MUSIC] 2 00:00:04,383 --> 00:00:07,306 One of the most common types of classifiers is a 3 00:00:07,306 --> 00:00:09,030 so-called linear classifier. 4 00:00:09,030 --> 00:00:10,310 So let's talk a little bit about that. 5 00:00:11,410 --> 00:00:15,390 The question here is how do we represent the classifier? 6 00:00:15,390 --> 00:00:20,190 So we start with some sentences, for example in the sentiment analysis case, and pass them through 7 00:00:20,190 --> 00:00:20,850 the classifier. 8 00:00:20,850 --> 00:00:25,370 You get a prediction whether it is a positive sentence or a negative sentence. 9 00:00:25,370 --> 00:00:27,710 So, how does this classifier work? 10 00:00:27,710 --> 00:00:31,400 In sentiment analysis, you can imagine a simple 11 00:00:31,400 --> 00:00:32,680 kind of threshold classifier. 12 00:00:33,720 --> 00:00:37,600 Suppose that I take a sentence and somebody tells me 13 00:00:37,600 --> 00:00:42,860 these are all of the positive words: great, awesome, good, amazing, and so on. 14 00:00:42,860 --> 00:00:44,220 Here's a bunch of negative words. 15 00:00:44,220 --> 00:00:46,780 Bad, terrible, disgusting, awful, and so on. 16 00:00:46,780 --> 00:00:49,390 And so what I can do is take the sentence and 17 00:00:49,390 --> 00:00:52,400 count how many positive words there are in the sentence and 18 00:00:52,400 --> 00:00:54,810 how many negative words are in the sentence, just count those. 19 00:00:56,000 --> 00:00:58,880 And then say, if the number of positive words is higher 20 00:00:58,880 --> 00:01:00,770 than the number of negative words, 21 00:01:00,770 --> 00:01:03,390 then we have a positive sentence, but 22 00:01:03,390 --> 00:01:06,990 if there are more negative words, then we have a negative sentence. 23 00:01:06,990 --> 00:01:13,600 So for example, take the input sentence: the sushi was great (positive one), 24 00:01:13,600 --> 00:01:20,350 the food was awesome (positive two), but the service was terrible (negative one). 
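The counting rule just described can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the word lists are hypothetical stand-ins for the lists the lecture assumes are given, the function names are made up, the tokenization is deliberately naive, and treating a tie as negative is an assumption, since the lecture does not say what happens when the counts are equal.

```python
# Hypothetical word lists; the lecture simply assumes someone hands these to us.
POSITIVE_WORDS = {"great", "awesome", "good", "amazing"}
NEGATIVE_WORDS = {"bad", "terrible", "disgusting"}


def count_sentiment_words(sentence):
    """Return (number of positive words, number of negative words)."""
    # Naive tokenization: lowercase, split on whitespace, strip basic punctuation.
    words = [w.strip(".,!?") for w in sentence.lower().split()]
    n_pos = sum(w in POSITIVE_WORDS for w in words)
    n_neg = sum(w in NEGATIVE_WORDS for w in words)
    return n_pos, n_neg


def threshold_classify(sentence):
    """Return +1 if positive words outnumber negative words, else -1.

    Ties are labeled -1 here; that choice is an assumption of this sketch.
    """
    n_pos, n_neg = count_sentiment_words(sentence)
    return 1 if n_pos > n_neg else -1


sentence = "The sushi was great, the food was awesome, but the service was terrible."
print(count_sentiment_words(sentence))  # (2, 1)
print(threshold_classify(sentence))     # 1
```

Every word on a list counts equally here, which is exactly the limitation that motivates giving each word its own weight.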
25 00:01:20,350 --> 00:01:23,230 You have two positives and one negative, and 26 00:01:23,230 --> 00:01:26,740 in the end the positives win, and so you have a positive prediction. 27 00:01:28,680 --> 00:01:32,140 Now, threshold classifiers have some limitations. 28 00:01:32,140 --> 00:01:34,940 Where does this list of positive and negative words actually come from? 29 00:01:34,940 --> 00:01:39,430 It has to magically come from somewhere, and 30 00:01:39,430 --> 00:01:43,000 not just that, words have different degrees of positiveness and negativeness. 31 00:01:43,000 --> 00:01:45,182 So great is more positive than good. 32 00:01:45,182 --> 00:01:49,760 You wanna tune and figure out what's better: great, good, amazing. 33 00:01:49,760 --> 00:01:51,360 Is amazing better than great? 34 00:01:51,360 --> 00:01:52,310 Who knows? 35 00:01:52,310 --> 00:01:54,710 So how do we figure that out, how do we weigh different words? 36 00:01:56,430 --> 00:02:00,890 And single words might not be enough to make a good classification. 37 00:02:00,890 --> 00:02:05,820 So, "the food was good" is positive. 38 00:02:05,820 --> 00:02:08,390 "The food was not good" is actually negative. 39 00:02:09,530 --> 00:02:12,270 And so these are all issues that need to be addressed. 40 00:02:12,270 --> 00:02:15,010 The first two issues, where the positive and negative words come from and 41 00:02:15,010 --> 00:02:17,840 how you weigh them, are addressed by learning a classifier, which 42 00:02:17,840 --> 00:02:18,830 we're going to talk about next. 43 00:02:20,970 --> 00:02:24,190 The issue of good versus not good 44 00:02:24,190 --> 00:02:27,440 is addressed by using more complicated features than single words. 45 00:02:27,440 --> 00:02:29,730 And we're gonna talk about it towards the end of the module. 46 00:02:29,730 --> 00:02:34,310 So a linear classifier, instead of having a list of positive and 47 00:02:34,310 --> 00:02:37,720 negative words, actually takes all the words and adds weights to them. 
48 00:02:38,780 --> 00:02:43,722 So, for example, good might have a weight of 1, great might have a weight of 1.5, 49 00:02:43,722 --> 00:02:46,514 awesome might have a really big weight of 2.7. 50 00:02:46,514 --> 00:02:53,223 While bad might have a weight of -1, terrible may have a weight of -2.1, 51 00:02:53,223 --> 00:02:58,016 awful might be -3.3 cuz awful is really just awful. 52 00:02:58,016 --> 00:03:01,060 And then there are words that really don't matter to sentiment. 53 00:03:01,060 --> 00:03:05,170 So things like the, we, where, restaurant, they appear both in positive and 54 00:03:05,170 --> 00:03:07,170 negative sentences so they get weight zero. 55 00:03:08,610 --> 00:03:11,900 Suppose somebody magically told you what the weight of each word was. 56 00:03:11,900 --> 00:03:15,250 We're going to talk in a little bit about how those get learned 57 00:03:15,250 --> 00:03:16,500 by the classifier. 58 00:03:16,500 --> 00:03:21,270 But given those weights, how do we figure out if the sentence is positive or negative? 59 00:03:21,270 --> 00:03:22,610 Here we use the idea of scoring. 60 00:03:23,670 --> 00:03:25,460 So for example take this sentence. 61 00:03:25,460 --> 00:03:29,190 The sushi was great, the food was awesome, but the service was terrible. 62 00:03:29,190 --> 00:03:30,280 Let's score that sentence. 63 00:03:30,280 --> 00:03:34,894 So we're gonna compute the score of the input sentence x. 64 00:03:34,894 --> 00:03:40,967 So in this case, from great, you get positive 1.2. 65 00:03:40,967 --> 00:03:45,894 From awesome, you get another 1.7, but 66 00:03:45,894 --> 00:03:51,383 from terrible, you get minus 2.1 right here, 67 00:03:51,383 --> 00:03:59,563 and so the grand total here is 2.9 minus 2.1, which is 0.8. 68 00:03:59,563 --> 00:04:04,943 And the key here is that since the score of the sentence is greater than zero, 69 00:04:04,943 --> 00:04:08,910 we're going to predict that it is a positive sentence. 
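Written out as an equation, the scoring step above adds up the weight of every sentiment-bearing word in the sentence, using the per-word values quoted in the walkthrough (1.2 for great, 1.7 for awesome, -2.1 for terrible). Two things here are assumptions of this sketch rather than statements from the lecture: the remaining words in the sentence are taken to contribute zero, and the positive class is written as +1.

```latex
\[
\text{Score}(x)
  = \underbrace{1.2}_{\text{great}}
  + \underbrace{1.7}_{\text{awesome}}
  + \underbrace{(-2.1)}_{\text{terrible}}
  = 2.9 - 2.1
  = 0.8 > 0
  \quad\Longrightarrow\quad
  \hat{y} = +1
\]
```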
70 00:04:10,080 --> 00:04:16,390 If the score were the opposite, if the score of x were less than zero, 71 00:04:16,390 --> 00:04:19,540 then we'd have predicted it's a negative sentence. 72 00:04:19,540 --> 00:04:22,110 So this is how a linear classifier works, 73 00:04:22,110 --> 00:04:26,760 if you know the weight of each word, and this is called 74 00:04:26,760 --> 00:04:31,860 a linear classifier because the output is basically the weighted sum of the input. 75 00:04:31,860 --> 00:04:35,530 It just weights what features appear, what words appear in the input. 76 00:04:37,390 --> 00:04:41,590 So we're working with a simple linear classifier to start out with. 77 00:04:41,590 --> 00:04:45,930 So in summary, given a sentence and given the weights for 78 00:04:45,930 --> 00:04:48,890 the words, what we do is compute the score, 79 00:04:48,890 --> 00:04:52,130 which is the weighted count of the words that appear in the sentence. 80 00:04:52,130 --> 00:04:54,820 And then we say if the score is greater than zero, 81 00:04:54,820 --> 00:04:56,470 we predict y-hat to be positive. 82 00:04:56,470 --> 00:05:00,800 While if the score is less than zero, we predict it to be negative. 83 00:05:00,800 --> 00:05:02,546 And that is a linear classifier. 84 00:05:02,546 --> 00:05:05,919 [MUSIC]
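The summary above can be sketched in Python as a weighted count followed by a sign check. This is a minimal illustration under stated assumptions, not the lecture's own code: the names `WEIGHTS`, `score`, and `predict` are made up for this example, the weights for great and awesome are the ones used in the scoring walkthrough (1.2 and 1.7), any word not in the table is assumed to have weight zero, and a score of exactly zero is labeled negative because the lecture does not cover that case.

```python
# Illustrative weights. In practice these would be learned from data;
# here they are hard-coded, as if "somebody magically told you" the values.
WEIGHTS = {
    "good": 1.0,
    "great": 1.2,
    "awesome": 1.7,
    "bad": -1.0,
    "terrible": -2.1,
    "awful": -3.3,
}


def score(sentence, weights=WEIGHTS):
    """Weighted count of the words in the sentence.

    Each occurrence of a word adds its weight; unknown words add 0
    (an assumption of this sketch).
    """
    # Naive tokenization: lowercase, split on whitespace, strip basic punctuation.
    words = [w.strip(".,!?") for w in sentence.lower().split()]
    return sum(weights.get(w, 0.0) for w in words)


def predict(sentence, weights=WEIGHTS):
    """Return +1 (positive) if the score is above zero, else -1 (negative)."""
    return 1 if score(sentence, weights) > 0 else -1


sentence = "The sushi was great, the food was awesome, but the service was terrible."
print(round(score(sentence), 2))  # 0.8
print(predict(sentence))          # 1
```

Because the score is just a sum of per-word weights, changing the behavior of the classifier means changing the numbers in the table, which is the part that learning takes care of.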