[MUSIC] In order to understand linear classifiers a little better, let's review the notion of a decision boundary, which is the boundary between positive predictions and negative predictions. Now, let's say that I've taken my data and trained my linear classifier, and every word has zero weight except for two of them: awesome has weight 1.0 and awful has weight -1.5. So what does that mean? It means that the score of any sentence is 1.0 times the number of times the word awesome shows up, minus 1.5 times the number of times the word awful shows up.

So let's plot each sentence in a graph whose axes are the number of awesomes and the number of awfuls. For example, the sentence "The sushi was awesome, the food was awesome, but the service was awful" has two awesomes and one awful, so it gets plotted at the point (2, 1). Every other sentence in my training set or prediction set gets plotted the same way: one might have, say, one awesome and three awfuls, another three awesomes and no awfuls, and so on. So I end up with a data set like this.

The classifier we've trained with the coefficients 1.0 and -1.5 has a decision boundary that corresponds to a line: the set of points where 1.0 times the number of awesomes minus 1.5 times the number of awfuls is equal to zero. Every point below that line has a score greater than zero, and every point above that line has a score less than zero. For example, take the point with three awesomes and zero awfuls: its score is 3.0, which is greater than zero, so we classify it as +1, and the same goes for all the points below the line. For the points above the line, if you check for yourself, you'll see they all have negative scores, so we label all of those as negative predictions. So there's that line: everything below the line is positive, everything above the line is negative. That's what makes it a linear classifier; it's a linear decision boundary, really. Good.

So we've seen that with two non-zero coefficients, our decision boundary is just a line in this 2D plane. In general, we might have more coefficients than that. If you have three features with non-zero coefficients, then what you really have is a plane that tries to separate the positive points from the negative ones. If you have more than three non-zero coefficients, then we're in a high-dimensional space, and we call the boundary a hyperplane that tries to separate the positives from the negatives. That was a sci-fi reference, by the way. And in general, if you use more complicated features and then visualize the resulting boundary back in this lower-dimensional space, you might see a decision boundary that looks like a squiggly, more complicated curve. [MUSIC]
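As a concrete illustration of the scoring rule and decision boundary described above, here is a minimal Python sketch. It is not from the lecture itself: the example sentences, the simple whitespace tokenization, and the function names score and predict are assumptions made for illustration, with the weights fixed at awesome = 1.0 and awful = -1.5 as in the lecture.

```python
# A minimal sketch (illustrative, not the lecture's code) of the two-word
# linear classifier: score = 1.0 * (#awesome) - 1.5 * (#awful), with the
# decision boundary at score = 0.

coefficients = {"awesome": 1.0, "awful": -1.5}  # every other word has weight 0.0

def score(sentence):
    """Weighted count of the non-zero-coefficient words in the sentence."""
    words = sentence.lower().split()
    return sum(weight * words.count(word) for word, weight in coefficients.items())

def predict(sentence):
    """+1 on the positive side of the boundary (score > 0), -1 otherwise."""
    return +1 if score(sentence) > 0 else -1

# Hypothetical sentences, chosen to match the word counts mentioned above.
sentences = [
    "the sushi was awesome the food was awesome but the service was awful",  # plotted at (2, 1)
    "awful start awful middle awful end but the dessert was awesome",        # plotted at (1, 3)
    "awesome awesome awesome",                                               # plotted at (3, 0)
]

for s in sentences:
    words = s.split()
    print(f"#awesome = {words.count('awesome')}, #awful = {words.count('awful')}, "
          f"score = {score(s):+.1f}, prediction = {predict(s):+d}")
```

Running this prints scores of +0.5, -3.5, and +3.0, so the first and third sentences land below the boundary line and are labeled +1, while the second lands above it and is labeled -1. Points that fall exactly on the line (score exactly zero) are a tie case the lecture doesn't resolve; this sketch arbitrarily labels them -1.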