[MUSIC] So classifiers are really trying to make decisions. Decisions as to whether a sentence is positive or negative, or whether a set of lab tests plus x-rays plus measurements leads to a certain disease like flu or cold. That's a decision that needs to be made. So let's talk a little bit about how classifiers, especially linear classifiers, make decisions.

To understand decision boundaries, suppose you only had two words with non-zero weight: awesome, with a positive weight of 1.0, and awful, which is just awful, so it has a weight of -1.5. In this situation, the score is gonna be 1.0 times the number of awesomes in the sentence, minus 1.5 times the number of awfuls. So we can plot this on a pair of axes: there's the awesome axis and there's the awful axis. For example, the sentence "the sushi was awesome, the food was awesome, but the service was awful" has two awesomes and one awful, so it's plotted at the point (2,1). Similarly, something with one awesome and three awfuls lands at (1,3), something that is all awesome, three awesomes, sits at (3,0), and so on for other sentences.

Now, let's understand a little better how we scored the sentences and what that implies about our decisions. Take the point (3,0), which is three awesomes and no awfuls. Three awesomes give a positive prediction because the score, 1.0 times 3, is greater than zero. And that is true for every point on the bottom right of the plot. The points on the top left all have score less than zero; for example, the point with one awesome and three awfuls has score 1.0 times 1 minus 1.5 times 3 = -3.5, so those get labeled negative. What separates the negative predictions from the positive predictions is the line where I don't know what's positive and what's negative, the line where 1.0 #awesome - 1.5 #awful = 0. That's the line where the prediction is uncertain, and so we call it the decision boundary. Everything on one side we predict is positive; everything on the other we predict is negative.

Now notice that the decision boundary, 1.0 times #awesome minus 1.5 times #awful equals 0, is a line. And that's why it's called a linear classifier: it has a linear decision boundary. So decision boundaries are what separate the positive predictions from the negative predictions. In the case of just two features, we see this is just a line, but that changes as we increase the number of features. In two dimensions, a linear function is a line. In three dimensions, when, say, three words have non-zero weight and everything else has zero weight, we get a plane. It's a little hard to draw in 3D, but the positive predictions are above the plane, the negative predictions are below the plane, and the plane is tilted somewhere in the space. Now, in real applications you're not gonna have just three non-zero words, you're gonna have tens of thousands of words with non-zero weight, and in that case the decision boundaries are really high-dimensional separators called hyperplanes.

Of course, you can use more than just linear classifiers. You can use more complex classifiers, and those, instead of lines or hyperplanes, have more complicated, squiggly separating shapes. We're gonna learn more about that in the classification course. >> [MUSIC]
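As a companion to this segment, here is a minimal sketch in plain Python of the two-word scoring rule and its decision boundary. It assumes a toy whitespace tokenizer, and the names WEIGHTS, score, and predict are illustrative, not from any particular library.

```python
# Toy linear classifier from the lecture: only "awesome" and "awful"
# carry non-zero weight; every other word has weight 0.
WEIGHTS = {"awesome": 1.0, "awful": -1.5}

def score(sentence: str) -> float:
    # Score = 1.0 * #awesome - 1.5 * #awful.
    # Very rough tokenization: lowercase, strip commas/periods, split.
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    return sum(WEIGHTS.get(word, 0.0) for word in words)

def predict(sentence: str) -> str:
    # The decision boundary is the line where score = 0:
    # above it we predict positive, below it negative.
    return "positive" if score(sentence) > 0 else "negative"

s = "The sushi was awesome, the food was awesome, but the service was awful"
print(score(s))    # 2 * 1.0 - 1 * 1.5 = 0.5
print(predict(s))  # positive
```

Note that nothing in score is specific to two words: with tens of thousands of weighted words it is just a dot product between a weight vector and a word-count vector, and the set of points where the score equals zero becomes a hyperplane rather than a line.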