1 00:00:00,000 --> 00:00:04,436 [MUSIC] 2 00:00:04,436 --> 00:00:08,150 Let's start by reviewing the intuition behind linear classifiers. 3 00:00:08,150 --> 00:00:10,220 The same intuition we covered in the first course. 4 00:00:11,520 --> 00:00:14,990 A linear classifier will take as input some 5 00:00:14,990 --> 00:00:18,770 quantity x, which in our case is sentences from reviews. 6 00:00:18,770 --> 00:00:21,030 It's going to feed it through its classifier model and 7 00:00:21,030 --> 00:00:26,020 it's going to make a prediction y hat that says, is this a positive review, in which case 8 00:00:26,020 --> 00:00:31,310 y hat is plus one, or is it a negative review, in which case y hat is minus one. 9 00:00:31,310 --> 00:00:34,188 That's what we're trying to figure out. 10 00:00:34,188 --> 00:00:38,980 A linear classifier does a little bit more: it associates every word with a weight or 11 00:00:38,980 --> 00:00:43,700 coefficient, which says how positively influential this word is or 12 00:00:43,700 --> 00:00:45,280 how negatively influential. 13 00:00:45,280 --> 00:00:50,363 So good might have a coefficient of 1.0, great might have a coefficient of 1.2. 14 00:00:50,363 --> 00:00:53,553 Awesome is awesome, and has a coefficient of 1.7. 15 00:00:53,553 --> 00:00:58,103 While on the negative side, bad might have a coefficient of minus 1, 16 00:00:58,103 --> 00:00:59,630 terrible minus 2.1. 17 00:00:59,630 --> 00:01:03,462 But awful is just awful, so minus 3.3. 18 00:01:03,462 --> 00:01:07,048 And then some words that are not that relevant to the sentiment of the review might have 19 00:01:07,048 --> 00:01:07,850 a coefficient of 0. 20 00:01:09,200 --> 00:01:13,071 Now let's see how these coefficients can be used to make a prediction of whether 21 00:01:13,071 --> 00:01:15,370 a sentence is positive or negative. 22 00:01:15,370 --> 00:01:19,040 So for example, let's take this sentence that says the sushi's great. 23 00:01:19,040 --> 00:01:21,030 So how do you score the sentence? 
24 00:01:21,030 --> 00:01:26,950 Let's compute the score of this input sentence x_i. 25 00:01:26,950 --> 00:01:29,770 The sentence says, the sushi is great, so 26 00:01:29,770 --> 00:01:34,000 we look at the coefficient of great, and we see it's 1.2. 27 00:01:34,000 --> 00:01:39,814 And now it says, the food was awesome, so the coefficient of that is 1.7. 28 00:01:41,460 --> 00:01:44,210 And then it says, but the service was terrible. 29 00:01:44,210 --> 00:01:45,520 My God, the service was terrible. 30 00:01:45,520 --> 00:01:46,500 So you subtract 2.1. 31 00:01:46,500 --> 00:01:51,170 And now we ask, what is the total score for this sentence? 32 00:01:51,170 --> 00:01:53,450 So some things are positive, some things are negative. 33 00:01:53,450 --> 00:01:58,980 The total score is 1.2 plus 1.7 minus 2.1, or 0.8, which is greater than 0. 34 00:01:58,980 --> 00:02:03,244 And that implies that we're going to predict that y hat, 35 00:02:03,244 --> 00:02:06,683 the sentiment for the sentence, is plus one. 36 00:02:06,683 --> 00:02:08,633 So it's a positive review. 37 00:02:12,473 --> 00:02:17,129 And this is called a linear classifier because the output is 38 00:02:17,129 --> 00:02:20,230 the weighted sum of the inputs. 39 00:02:20,230 --> 00:02:22,560 So that's kind of what a linear classifier is. 40 00:02:22,560 --> 00:02:25,300 We'll see in a little bit more detail what that really means. 41 00:02:26,860 --> 00:02:29,670 So more generally, a simple linear classifier 42 00:02:29,670 --> 00:02:33,980 is going to take as input the coefficients associated with each word. 43 00:02:33,980 --> 00:02:36,450 And it's going to compute a score for that input. 44 00:02:36,450 --> 00:02:40,658 If the score is greater than zero, we say that the output, the prediction y hat, 45 00:02:40,658 --> 00:02:41,360 is +1. 46 00:02:41,360 --> 00:02:45,974 And if the score is less than zero, we say that the prediction is -1. 
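The scoring rule described in this part of the lecture can be sketched in a few lines of Python. This is only an illustrative sketch: the names `COEFFICIENTS`, `score`, and `predict` are hypothetical, the tokenization is deliberately crude, and the handling of a score of exactly zero (mapped to -1 here) is an assumption, since the lecture only covers scores above and below zero.

```python
# Hypothetical coefficient table using the values quoted in the lecture.
COEFFICIENTS = {
    "good": 1.0,
    "great": 1.2,
    "awesome": 1.7,
    "bad": -1.0,
    "terrible": -2.1,
    "awful": -3.3,
}


def score(sentence, coefficients=COEFFICIENTS):
    """Sum the coefficients of the words in the sentence.

    Words missing from the table contribute 0, like the
    "not that relevant" words mentioned in the lecture.
    """
    # Crude tokenization (an assumption): lowercase, drop commas and periods.
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    return sum(coefficients.get(word, 0.0) for word in words)


def predict(sentence, coefficients=COEFFICIENTS):
    """Return +1 if the score is greater than 0, otherwise -1.

    Assumption: a score of exactly 0 is mapped to -1.
    """
    return 1 if score(sentence, coefficients) > 0 else -1


review = "The sushi was great, the food was awesome, but the service was terrible."
print(round(score(review), 1))  # 1.2 + 1.7 - 2.1, prints 0.8
print(predict(review))          # prints 1
```

Because every word contributes its coefficient independently and the contributions are simply added, the score is a weighted sum of the inputs, which is what makes the classifier linear.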
47 00:02:47,240 --> 00:02:50,050 Now, what we need to do is train the weights of these 48 00:02:50,050 --> 00:02:51,700 linear classifiers from data. 49 00:02:51,700 --> 00:02:56,060 So given some input training data that includes sentences of reviews 50 00:02:56,060 --> 00:02:59,530 labeled with either plus one or minus one, positive or negative, 51 00:02:59,530 --> 00:03:04,060 we're going to split those into some training set and some validation set. 52 00:03:04,060 --> 00:03:07,080 Then we're going to feed that training set to some learning algorithm, which 53 00:03:07,080 --> 00:03:10,190 is going to learn the weights associated with each word, 54 00:03:10,190 --> 00:03:14,410 so 1.0 for good, 1.7 for awesome, and so on. 55 00:03:14,410 --> 00:03:17,570 And then after we learn this classifier, we're going to go back and 56 00:03:17,570 --> 00:03:20,950 evaluate its accuracy on that validation set. 57 00:03:20,950 --> 00:03:25,160 So our goal for today is to explore that learning box. 58 00:03:25,160 --> 00:03:27,386 How do we learn this classifier from data, and 59 00:03:27,386 --> 00:03:32,300 how do we understand a little bit more deeply what a linear classifier is really about? 60 00:03:32,300 --> 00:03:35,133 In particular, in the context of logistic regression. 61 00:03:35,133 --> 00:03:39,779 [MUSIC]
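The split-then-evaluate workflow described above might look roughly like the Python sketch below. Everything here is an assumption made for illustration: the five labeled reviews are made-up toy data, the helper names are hypothetical, and the learning step is not implemented, since the lecture leaves the "learning box" for later. The coefficients are written by hand as a stand-in for what a learning algorithm would produce from the training set.

```python
import random


def train_validation_split(examples, validation_fraction=0.2, seed=0):
    """Shuffle labeled examples and split them into (training, validation)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def predict(sentence, coefficients):
    """Return +1 if the summed word coefficients exceed 0, otherwise -1."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    total = sum(coefficients.get(word, 0.0) for word in words)
    return 1 if total > 0 else -1


def accuracy(examples, coefficients):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(
        1 for sentence, label in examples
        if predict(sentence, coefficients) == label
    )
    return correct / len(examples)


# Made-up toy data: (sentence, label) pairs with labels +1 or -1.
data = [
    ("The food was great", 1),
    ("Awesome sushi", 1),
    ("Good ramen", 1),
    ("The service was terrible", -1),
    ("Awful place", -1),
]

training_set, validation_set = train_validation_split(data)

# Stand-in for the learning algorithm: hand-written coefficients,
# NOT learned from training_set.
coefficients = {"good": 1.0, "great": 1.2, "awesome": 1.7,
                "bad": -1.0, "terrible": -2.1, "awful": -3.3}

print(len(training_set), len(validation_set))
print(accuracy(validation_set, coefficients))
```

Keeping the validation set out of the learning step is what makes the accuracy number a fair estimate of how the classifier behaves on reviews it has not seen.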