In this and the next few videos, I want to start talking about classification problems, where the variable y that you want to predict is discrete valued. We'll develop an algorithm called logistic regression, which is one of the most popular and most widely used learning algorithms today. Here are some examples of classification problems. Earlier, we talked about email spam classification as an example of a classification problem. Another example would be classifying online transactions. So, if you have a website that sells stuff and you want to know whether a particular transaction is fraudulent or not, whether someone is using a stolen credit card or has stolen the user's password, that's another classification problem. And earlier we also talked about the example of classifying tumors as cancerous (malignant) or benign. In all of these problems, the variable that we're trying to predict is a variable y that we can think of as taking on two values, either zero or one: spam or not spam, fraudulent or not fraudulent, malignant or benign. Another name for the class that we denote with 0 is the negative class, and another name for the class that we denote with 1 is the positive class. So 0, the negative class, may denote a benign tumor, and 1, the positive class, may denote a malignant tumor. The assignment of the two classes (spam, not spam, and so on) to positive and negative, to 0 and 1, is somewhat arbitrary and it doesn't really matter. But often there is this intuition that the negative class is conveying the absence of something, like the absence of a malignant tumor, whereas 1, the positive class, is conveying the presence of something that we may be looking for. But the definition of which is negative and which is positive is somewhat arbitrary and it doesn't matter that much. For now, we're going to start with classification problems with just two classes, zero and one.
Later on, we'll talk about multi-class problems as well, where the variable y may take on, say, four values: zero, one, two, and three. This is called a multi-class classification problem, but for the next few videos, let's start with the two-class or binary classification problem, and we'll worry about the multi-class setting later. So, how do we develop a classification algorithm? Here's an example of a training set for a classification task, for classifying a tumor as malignant or benign, and notice that malignancy takes on only two values: zero, or no, and one, or yes. So, one thing we could do given this training set is to apply the algorithm that we already know, linear regression, to this data set and just try to fit a straight line to the data. So, if you take this training set and fit a straight line to it, maybe you get a hypothesis that looks like that. Alright, so that's my hypothesis, h of x equals theta transpose x. If you want to make predictions, one thing you could try doing is to threshold the classifier outputs at 0.5, that is, at the vertical axis value 0.5. If the hypothesis outputs a value that's greater than or equal to 0.5, you predict y equals one. If it's less than 0.5, you predict y equals zero. Let's see what happens when we do that. So, let's take 0.5, and that's where the threshold is. And so, using linear regression this way, everything to the right of this point we will end up predicting as the positive class, because the output values are greater than 0.5 on the vertical axis, and everything to the left of that point we will end up predicting as the negative class. In this particular example, it looks like linear regression is actually doing something reasonable, even though this is a classification task we're interested in. But now let's try changing the problem a bit. Let me extend out the horizontal axis a little bit, and let's say we got one more training example way out there on the right.
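The thresholding idea just described can be sketched in a few lines of Python. This is only an illustration under assumptions: the tumor sizes and labels are made-up numbers, there is a single feature, and the helper names fit_line and predict are hypothetical, not something from the lecture.

```python
# Toy data (made up for illustration): tumor sizes and labels,
# where 0 = benign (negative class) and 1 = malignant (positive class).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Least-squares fit of h(x) = theta0 + theta1 * x for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def predict(theta0, theta1, x, threshold=0.5):
    """Predict 1 if the linear hypothesis is at least the threshold, else 0."""
    return 1 if theta0 + theta1 * x >= threshold else 0


theta0, theta1 = fit_line(sizes, labels)
predictions = [predict(theta0, theta1, x) for x in sizes]
print(predictions)
```

On this particular toy set the thresholded predictions agree with the labels, which mirrors the "doing something reasonable" case in the lecture.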
Notice that that additional training example, this one out here, doesn't actually change anything, right? Looking at the training set, it is pretty clear what a good hypothesis is. Everything to the right of somewhere around here we should predict as positive, and everything to the left we should probably predict as negative, because from this training set it looks like all the tumors larger than a certain value around here are malignant, and all the tumors smaller than that are not malignant, at least for this training set. But once we've added that extra example out here, if you now run linear regression, you instead get a different straight-line fit to the data. It might look like this, and if you now threshold this hypothesis at 0.5, you end up with a threshold that's around here, so that everything to the right of this point you predict as positive, and everything to the left of that point you predict as negative. And this seems a pretty bad thing for linear regression to have done, right? Because these are our positive examples, and these are our negative examples. It's pretty clear we should really be separating the two classes somewhere around there. The example way out here to the right really isn't giving us any new information. I mean, it should be no surprise to the learning algorithm that the example way out here turns out to be malignant. But somehow adding that example out there caused linear regression to change its straight-line fit to the data from this magenta line out here to this blue line over here, and caused it to give us a worse hypothesis. So, applying linear regression to a classification problem often isn't a great idea.
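To see this effect numerically, here is a small sketch that fits a line before and after adding one far-right malignant example and compares where each fitted line crosses 0.5. The data values, the extra example at size 40, and the helper names fit_line and crossing_point are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of h(x) = theta0 + theta1 * x for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    return mean_y - theta1 * mean_x, theta1


def crossing_point(theta0, theta1, threshold=0.5):
    """Value of x at which the fitted line equals the threshold."""
    return (threshold - theta0) / theta1


# Made-up training set: small tumors benign (0), larger ones malignant (1).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
before = crossing_point(*fit_line(sizes, labels))

# Add one malignant example far out to the right (hypothetical size 40).
sizes_extra = sizes + [40.0]
labels_extra = labels + [1]
theta0, theta1 = fit_line(sizes_extra, labels_extra)
after = crossing_point(theta0, theta1)

# Training examples that the thresholded line now gets wrong.
misclassified = [
    x for x, y in zip(sizes_extra, labels_extra)
    if (1 if theta0 + theta1 * x >= 0.5 else 0) != y
]
print(before, after, misclassified)
```

With these made-up numbers the crossing point moves from about 4.5 to about 5.7, so the malignant example of size 5 ends up on the negative side, even though the added example told us nothing new about where the classes separate.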
In the first example, before I added this extra training example, linear regression was just getting lucky, and it got us a hypothesis that worked well for that particular example. But usually, if you apply linear regression to a data set like this, you might get lucky, but often it isn't a good idea, so I wouldn't use linear regression for classification problems. Here is one other funny thing about what would happen if we were to use linear regression for a classification problem. For classification, we know that y is either zero or one, but if you are using linear regression, the hypothesis can output values much larger than one or less than zero, even if all of your training examples have labels y equals zero or one. And it seems kind of strange that, even though we know that the labels should be zero or one, the algorithm can output values much larger than one or much smaller than zero. So what we'll do in the next few videos is develop an algorithm called logistic regression, which has the property that the output, the predictions of logistic regression, are always between zero and one, and don't become bigger than one or less than zero. And by the way, logistic regression is, and we will use it as, a classification algorithm. It's maybe sometimes confusing that the term regression appears in its name, even though logistic regression is actually a classification algorithm. But that's just the name it was given for historical reasons, so don't be confused by that. Logistic regression is actually a classification algorithm that we apply to settings where the label y is discrete valued, either zero or one. So hopefully you now know why, if you have a classification problem, using linear regression isn't a good idea. In the next video we'll start working out the details of the logistic regression algorithm.
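As a quick check of the point above that a linear hypothesis is not confined to the range from zero to one, the sketch below fits a line to made-up 0/1 labels and evaluates it at the smallest and largest training inputs. The data and the helper name fit_line are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares fit of h(x) = theta0 + theta1 * x for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    return mean_y - theta1 * mean_x, theta1


# Made-up training set in which every label is exactly 0 or 1.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

theta0, theta1 = fit_line(sizes, labels)
h_smallest = theta0 + theta1 * min(sizes)  # hypothesis at the smallest input
h_largest = theta0 + theta1 * max(sizes)   # hypothesis at the largest input
print(h_smallest, h_largest)
```

Even on the training inputs themselves, the fitted line here goes below zero at the small end and above one at the large end, which is the behavior that logistic regression is designed to avoid.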