In order to understand linear classifiers a little better, let's review the notion of a decision boundary, which is the boundary between positive predictions and negative predictions.

Now, let's say that I have taken my data and trained my linear classifier, and every word has zero weight except for two of them: awesome has weight 1.0 and awful has weight -1.5. So what does that mean? It means that the score of any sentence is 1.0 times the number of times the word awesome shows up, minus 1.5 times the number of times the word awful shows up.

So let's plot that on a graph whose axes are, for each sentence, the number of awesomes and the number of awfuls. For example, the sentence "The sushi was awesome, the food was awesome, but the service was awful" has two awesomes and one awful, so it gets plotted at the point (2, 1). And every sentence in my training data set or my prediction set gets plotted the same way: one might have, say, three awfuls and one awesome, another three awesomes and no awfuls, and so on. So I have a data set like this.

The classifier that we've trained with the coefficients 1.0 and -1.5 has a decision boundary that corresponds to a line, where 1.0 times the number of awesomes minus 1.5 times the number of awfuls is equal to zero. Every point below that line has a score greater than zero, and every point above that line has a score less than zero. For example, take the point with three awesomes and zero awfuls. That has a score greater than zero, so we classify it as +1, and similarly for all the points below the line. Now, for the points above the line, if you check for yourself, you'll see that all of those have negative scores, so we label all of those as negative predictions.

And so there's that line: everything below the line is positive, everything above the line is negative. That's what makes it a linear classifier; it's really a linear decision boundary.
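To make the arithmetic concrete, here is a minimal sketch in Python of this two-word classifier. The weights and the example sentence come from the lecture; the function names and the word-splitting details are just illustrative choices, not part of the course code.

```python
# Sketch of the two-feature linear classifier from the lecture.
# Weights: awesome -> 1.0, awful -> -1.5; every other word weighs 0.
WEIGHTS = {"awesome": 1.0, "awful": -1.5}

def score(sentence):
    """Score = 1.0 * (#awesome) - 1.5 * (#awful)."""
    words = sentence.lower().split()
    return sum(WEIGHTS.get(w.strip(".,!?"), 0.0) for w in words)

def predict(sentence):
    """Classify as +1 if the score is positive, else -1."""
    return +1 if score(sentence) > 0 else -1

sentence = "The sushi was awesome, the food was awesome, but the service was awful."
print(score(sentence))    # 2 * 1.0 - 1 * 1.5 = 0.5
print(predict(sentence))  # +1: the point (2, 1) lies below the decision boundary
```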
Good. So we have seen that with two features, or two non-zero coefficients, our decision boundary is really just a line in this 2D plane. Now, in general, we might have more coefficients than that. If you have three features with non-zero coefficients, then what we really have is a plane that tries to separate the positive points from the negative ones. If you have more than three non-zero coefficients, then we are in a high-dimensional space, and we call the boundary a hyperplane that tries to separate the positives from the negatives. That was a sci-fi reference, by the way. And in general, if you use more complicated features, then when you visualize the resulting hyperplane back in the lower-dimensional space, the decision boundary might look like a squiggly, more complicated curve.
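The same rule extends directly to any number of features: the decision boundary is the set of points where the dot product of the weight vector and the feature vector is zero. Here is a small sketch of that general case, assuming NumPy; the weights and counts below are made-up numbers for illustration, not values from the lecture.

```python
import numpy as np

# With d non-zero coefficients, the decision boundary is the hyperplane
# w . x = 0 in d-dimensional feature space (a line when d = 2, a plane
# when d = 3). These weights are invented for the example.
w = np.array([1.0, -1.5, 0.7, 0.3])   # one weight per feature (word count)

def predict(x):
    """+1 on the positive side of the hyperplane, -1 on the negative side."""
    return 1 if np.dot(w, x) > 0 else -1

x = np.array([2, 1, 0, 3])            # feature counts for one sentence
print(np.dot(w, x), predict(x))       # 1.4 1
```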