In this video, we'll talk about how to get logistic regression to work for multi-class classification problems, and in particular I want to tell you about an algorithm called one-versus-all classification.

What's a multi-class classification problem? Here are some examples. Let's say you want a learning algorithm to automatically put your email into different folders, or to automatically tag your emails. You might have different folders or different tags for work email, email from your friends, email from your family, and email about your hobby. Here we have a classification problem with 4 classes, to which we might assign the values y = 1, y = 2, y = 3, and y = 4. Another example is medical diagnosis: if a patient comes into your office with, maybe, a stuffy nose, the possible diagnoses could be that they're not ill, maybe that's y = 1; or they have a cold, y = 2; or they have the flu, y = 3. And a third and final example: if you're using machine learning to classify the weather, maybe you want to decide whether the weather is sunny, cloudy, rainy, or snowy. In all of these examples, y can take on a small number of discrete values, maybe 1 to 3, 1 to 4, and so on, and these are multi-class classification problems. And by the way, it doesn't really matter whether we index the classes as 0, 1, 2, 3 or as 1, 2, 3, 4; I tend to index my classes starting from 1 rather than from 0, but either way it really doesn't matter.

Whereas previously, for a binary classification problem, our data set looked like this, for a multi-class classification problem our data set may look like this, where here I'm using three different symbols to represent our three classes. So the question is: given a data set with three classes, where this is an example of one class, that's an example of a different class, and that's an example of yet a third class, how do we get a learning algorithm to work for this setting? We already know how to do binary classification using logistic regression: we know how to, maybe, fit a straight line that separates the positive and negative classes. Using an idea called one-versus-all classification, we can take this and make it work for multi-class classification as well.

Here's how one-versus-all classification works. This is also sometimes called "one-versus-rest." Let's say we have a training set like the one shown on the left, where we have 3 classes: if y = 1, we denote that with a triangle; if y = 2, a square; and if y = 3, a cross. What we're going to do is take this training set and turn it into three separate binary classification problems, that is, three separate two-class classification problems. Let's start with class 1, the triangles. We're going to essentially create a new, sort of fake training set, where classes 2 and 3 get assigned to the negative class and class 1 gets assigned to the positive class. That's what's shown on the right, and we're going to fit a classifier, which I'm going to call h subscript theta superscript (1) of x, where the triangles are the positive examples and the circles are the negative examples. So think of the triangles as being assigned the value 1 and the circles the value 0. We then just train a standard logistic regression classifier, and maybe that gives us a decision boundary like this. The superscript (1) here refers to class 1.
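To make that relabeling step concrete, here is a minimal sketch in Python (this isn't from the lecture), assuming scikit-learn's LogisticRegression as the binary classifier; the toy arrays X and y and the names y_binary_1 and h1 are made up purely for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two features per example, labels y in {1, 2, 3}
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [6.0, 8.5], [1.0, 0.6], [9.0, 11.0]])
y = np.array([1, 1, 2, 2, 3, 3])

# Build the "fake" binary training set for class 1:
# class 1 becomes the positive class (label 1), classes 2 and 3 the negative class (label 0)
y_binary_1 = (y == 1).astype(int)

# Fit h_theta^(1)(x), a standard binary logistic regression classifier
h1 = LogisticRegression()
h1.fit(X, y_binary_1)

# Column 1 of predict_proba estimates P(y = 1 | x; theta) for each example
print(h1.predict_proba(X)[:, 1])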
So that takes care of the triangles, the first class. Next, we do the same thing for class 2: we take the squares and assign them as the positive class, and assign everything else, the triangles and the crosses, as the negative class; then we fit a second logistic regression classifier, which I'm going to call h subscript theta superscript (2) of x, where the superscript (2) denotes that we're now treating the square class as the positive class, and maybe we get a classifier like that. And finally, we do the same thing for the third class and fit a third classifier, h subscript theta superscript (3) of x, and maybe this gives us a decision boundary, or a classifier, that separates the positive and negative examples like that.

So, to summarize, what we've done is fit 3 classifiers: for i = 1, 2, 3, we fit a classifier h subscript theta superscript (i) of x, which tries to estimate the probability that y is equal to class i, given x and parametrized by theta. In the first instance, this first classifier up here is learning to recognize the triangles, so it's treating the triangles as the positive class, and h superscript (1) is essentially trying to estimate the probability that y is equal to 1, given x and parametrized by theta. Similarly, the second one treats the square class as the positive class, so it's trying to estimate the probability that y is equal to 2, and so on. We now have 3 classifiers, each of which was trained to recognize one of the three classes.

Just to summarize: what we've done is train a logistic regression classifier h subscript theta superscript (i) of x for each class i, which predicts the probability that y is equal to i. Finally, to make a prediction when we're given a new input x, what we do is run all three of our classifiers on the input x and pick the class i that maximizes the three probabilities. So we basically pick whichever one of the three classifiers is most confident, or most enthusiastically says that it thinks it has the right class; whichever value of i gives us the highest probability, we then predict y to be that value.

So, that's it for multi-class classification and the one-versus-all method. With this simple method, you can now take the logistic regression classifier and make it work on multi-class classification problems as well.
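To tie the whole procedure together, here is a hedged Python sketch of one-versus-all training and the pick-the-most-confident prediction rule; again it assumes scikit-learn's LogisticRegression for each binary problem, and the function names train_one_vs_all and predict_one_vs_all are hypothetical, not anything from the lecture.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_one_vs_all(X, y, classes):
    # Fit one binary logistic regression classifier h^(i) per class i
    classifiers = {}
    for i in classes:
        y_binary = (y == i).astype(int)   # class i positive, everything else negative
        clf = LogisticRegression()
        clf.fit(X, y_binary)
        classifiers[i] = clf
    return classifiers

def predict_one_vs_all(classifiers, x_new):
    # Run every classifier on x_new and pick the class whose h^(i) is most confident
    x_new = np.asarray(x_new, dtype=float).reshape(1, -1)
    probs = {i: clf.predict_proba(x_new)[0, 1] for i, clf in classifiers.items()}
    return max(probs, key=probs.get)

# Hypothetical usage with a tiny made-up data set
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [6.0, 8.5], [1.0, 0.6], [9.0, 11.0]])
y = np.array([1, 1, 2, 2, 3, 3])
classifiers = train_one_vs_all(X, y, classes=[1, 2, 3])
print(predict_one_vs_all(classifiers, [1.2, 1.5]))   # predicted class for the new input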