[MUSIC] Finally, let's talk about multiclass classification. And in particular, we're going to talk about perhaps the simplest, but very useful, approach for doing multiclass classification. It's called the one versus all approach. So here's an example of multiclass classification. I give you an image of some object, maybe it's an image of my dog, and I feed this into the classifier to try to predict what object is in that image. So the output y is the object in the image, maybe a Labrador retriever, a golden retriever, a table, a monitor, a camera, and so on. So that's the prediction task that we're going to do. And it has more than two classes, not just +1 or -1, but maybe a thousand different categories. So how do we solve a problem like this? There are many approaches to doing that. Let's talk about a very simple, but super useful, approach called one versus all, and we're going to use the following example where we have three classes: triangles, hearts, and donuts. But of course, you might have many more classes, and I'm using capital C to denote the total number of classes. Capital C in this case is three, but it could be 10,000 in a different case, and we still have N data points. And now each data point has associated with it not just x1, x2, and so on, but also the category y, which is no longer just +1 or -1. In this case, it's triangles, hearts, or donuts. And what I'd like to know is, for a particular input xi, what is the probability that this input corresponds to a triangle, to a heart, or to a donut? Or in the case of the image, whether it corresponds to a golden retriever, a Labrador retriever, or a camera. The one versus all model is extremely simple, and it's exactly what you'd expect from the words one versus all. Here's what we do: you train a classifier for each category. So for example, you train one that says the +1 category is going to be all the triangles and the -1 category is going to be everything else.
In our example, hearts and donuts. And what you're trying to do is learn a classifier that separates most of the triangles from the hearts and the donuts. So in particular, we're going to train a classifier, denoted by P hat sub-triangle, which outputs +1 if the input x is more likely to be a triangle than everything else, donut or heart. And then the way that we estimate the probability that an input xi is a triangle is just by taking P hat sub-triangle of y = +1. So in our picture on the right, here's what our classifier is going to do. It's going to assign Score(xi) greater than 0 on the triangle side, and Score(xi) less than 0 on the donuts and hearts side, hopefully. Which is going to mean that on the triangle side the probability that y equals triangle, given the input xi and the parameters w, is going to be greater than 0.5, and on the other side the probability that y equals triangle, given the input xi and the parameters w, is going to be, hopefully, less than 0.5. So this lets me take a data point and say, is this more likely to be a triangle than a donut or a heart? That doesn't tell me how to do multiclass classification in general. How do we do multiclass classification in general? What we'll do is learn a model for each one of these cases. So it's going to be one versus all for each one of the classes. So we will learn a one versus all model that tries to compare triangles against donuts and hearts, which is going to be defined by P hat sub-triangle of y = +1, given xi and w. We're going to learn a model for hearts that tries to separate hearts from triangles and donuts, which is going to be P hat, and we learn it from a different dataset, so it's going to be sub-hearts. That's a gnarly heart. I want pretty hearts here. That's slightly prettier, only slightly. [LAUGH] The probability that it's +1, given the input xi and the parameters w.
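The training setup just described can be sketched in a few lines of code. This is a minimal sketch with made-up data: the helper name and the toy labels are mine, and any off-the-shelf binary classifier could be plugged in where noted.

```python
def make_one_vs_all_labels(labels, positive_class):
    """Relabel a multiclass dataset: +1 for the chosen class, -1 for everything else."""
    return [+1 if y == positive_class else -1 for y in labels]

# Toy dataset: each data point has one of C = 3 multiclass labels.
ys = ["triangle", "heart", "donut", "triangle", "heart"]

# Build one binary label vector per class -- one relabeled dataset per model.
binary_labels = {c: make_one_vs_all_labels(ys, c)
                 for c in ["triangle", "heart", "donut"]}

# binary_labels["triangle"] is [1, -1, -1, 1, -1]: triangles versus everything else.
# Each of these C relabeled datasets would then be fed to your favorite binary
# classifier (say, logistic regression) to learn its own w and produce
# P_hat_c(y = +1 | xi, w).
```

This is why the w's differ per class: each model is trained on a differently relabeled copy of the same data.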
And lastly, we have for donuts the probability, according to the donut model, of y = +1 given xi and w. And one last little note: the w's for each one of these models are different, here separating donuts from everything else, and you can see that by the lines being different. So the first one is w of triangles, the second one is w of hearts, and the last one is w of donuts. So we train these one versus all models, and what do we output? As a prediction, we just say whatever class has the highest probability wins. So in other words, if the probability that an input is a heart against everything else is higher than the probability that the point is a triangle, and higher than the probability that the point is a donut, you say that the class is heart. More explicitly, in multiclass classification we're going to train our model to ask, for each class, what's the probability that it wins against everything else, the probability that y = +1 given x, and you estimate one of those for each class. And then, when you get a particular input xi, for example this image of my dog, what we're going to do is compute the probability that every class estimates and output the maximum. And I've written out here the kind of natural algorithm. So we start with the maximum probability being zero and y hat being the none category, zero. And we go class by class and ask: is the probability that y = +1, according to the model for this class, the model for Labrador retriever, the model for golden retriever, the model for camera, higher than the maximum probability seen so far? If it's higher, that means that class looks like it's winning, so we say that y hat is whatever this class says, and we update the maximum probability to be the probability according to this class. And as we iterate over each one of these, the maximum is going to win. So this is just kind of an algorithm that does exactly what you would expect.
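The class-by-class loop just described can be sketched as follows. The per-class probabilities P_hat_c(y = +1 | xi, w) are stubbed here with made-up numbers; in practice each would come from its own trained binary classifier.

```python
def predict_one_vs_all(class_probs):
    """Return the class whose one-versus-all probability is highest."""
    max_prob = 0.0   # start with the maximum probability being zero
    y_hat = None     # and y hat being no category yet
    for c, p in class_probs.items():  # go class by class
        if p > max_prob:              # this class looks like it's winning
            y_hat = c
            max_prob = p
    return y_hat

# Hypothetical probabilities for one input xi (e.g., the image of the dog):
probs = {"Labrador retriever": 0.85, "golden retriever": 0.60, "camera": 0.05}
print(predict_one_vs_all(probs))  # prints: Labrador retriever
```

As the loop iterates over the classes, the maximum wins, which is exactly the argmax over the C one-versus-all probabilities.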
We check what each model believes, whether it believes the image is a dog, a Labrador retriever, a golden retriever, a camera, what the probabilities are, and we just output the object that has the highest probability. And with that simple, simple algorithm, we now have a multiclass classification system built out of a number of these binary classifiers. [MUSIC]