[MUSIC] In order to understand linear classifiers a little better, let's review the notion of a decision boundary, which is the boundary between positive predictions and negative predictions. Now, let's say that I've taken my data and trained my linear classifier, and every word has zero weight except for two of them: awesome has weight 1.0 and awful has weight -1.5. So what does that mean? It means that the score of any sentence is 1.0 times the number of times the word awesome shows up, minus 1.5 times the number of times the word awful shows up.

So let's plot each sentence in a graph whose axes are the number of awesomes and the number of awfuls. For example, the sentence "The sushi was awesome, the food was awesome, but the service was awful" has two awesomes and one awful, so it gets plotted at the point (2, 1). Every other sentence in my training set or prediction set gets plotted the same way: one might have, say, one awesome and three awfuls, another three awesomes and no awfuls, and so on. So I end up with a data set like this.

The classifier we've trained with the coefficients 1.0 and -1.5 has a decision boundary that corresponds to a line: the set of points where 1.0 times the number of awesomes minus 1.5 times the number of awfuls is equal to zero. Every point below that line has a score greater than zero, and every point above that line has a score less than zero. For example, take the point with three awesomes and zero awfuls: its score is 3.0, which is greater than zero, so we classify it as +1, and the same goes for all the points below the line. For the points above the line, if you check for yourself, you'll see they all have negative scores, so we label all of those as negative predictions. So there's that line: everything below the line is positive, everything above the line is negative. That's what makes it a linear classifier; it's a linear decision boundary, really. Good.

So we've seen that with two non-zero coefficients, our decision boundary is just a line in this 2D plane. In general, we might have more coefficients than that. If you have three features with non-zero coefficients, then what you really have is a plane that tries to separate the positive points from the negative ones. If you have more than three non-zero coefficients, then we're in a high-dimensional space, and we call the boundary a hyperplane that tries to separate the positives from the negatives. That was a sci-fi reference, by the way. And in general, if you use more complicated features and then visualize the resulting boundary back in this lower-dimensional space, you might see a decision boundary that looks like a squiggly, more complicated curve. [MUSIC]
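As a concrete illustration of the scoring rule and decision boundary described above, here is a minimal Python sketch. It is not from the lecture itself: the example sentences, the simple whitespace tokenization, and the function names score and predict are assumptions made for illustration, with the weights fixed at awesome = 1.0 and awful = -1.5 as in the lecture.

```python
# A minimal sketch (illustrative, not the lecture's code) of the two-word
# linear classifier: score = 1.0 * (#awesome) - 1.5 * (#awful), with the
# decision boundary at score = 0.

coefficients = {"awesome": 1.0, "awful": -1.5}  # every other word has weight 0.0

def score(sentence):
    """Weighted count of the non-zero-coefficient words in the sentence."""
    words = sentence.lower().split()
    return sum(weight * words.count(word) for word, weight in coefficients.items())

def predict(sentence):
    """+1 on the positive side of the boundary (score > 0), -1 otherwise."""
    return +1 if score(sentence) > 0 else -1

# Hypothetical sentences, chosen to match the word counts mentioned above.
sentences = [
    "the sushi was awesome the food was awesome but the service was awful",  # plotted at (2, 1)
    "awful start awful middle awful end but the dessert was awesome",        # plotted at (1, 3)
    "awesome awesome awesome",                                               # plotted at (3, 0)
]

for s in sentences:
    words = s.split()
    print(f"#awesome = {words.count('awesome')}, #awful = {words.count('awful')}, "
          f"score = {score(s):+.1f}, prediction = {predict(s):+d}")
```

Running this prints scores of +0.5, -3.5, and +3.0, so the first and third sentences land below the boundary line and are labeled +1, while the second lands above it and is labeled -1. Points that fall exactly on the line (score exactly zero) are a tie case the lecture doesn't resolve; this sketch arbitrarily labels them -1.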