In the last video, we talked about the hypothesis representation for logistic regression. What I'd like to do now is tell you about something called the decision boundary, and this will give us a better sense of what the logistic regression hypothesis function is computing. To recap, this is what we wrote out last time, where we said that the hypothesis is represented as H of X equals G of theta transpose X, where G is this function called the sigmoid function, which looks like this: it slowly increases from zero to one, asymptoting at one. What I want to do now is try to understand better when this hypothesis will make predictions that Y is equal to one versus when it might make predictions that Y is equal to zero, and understand better what the hypothesis function looks like, particularly when we have more than one feature.

Concretely, this hypothesis is outputting estimates of the probability that Y is equal to one given X, parameterized by theta. So if we wanted to predict whether Y is equal to one or Y is equal to zero, here's something we might do. Whenever the hypothesis estimates that the probability of Y being one is greater than or equal to 0.5, so that it is more likely to be Y equals one than Y equals zero, let's predict Y equals one. And otherwise, if the estimated probability of Y being one is less than 0.5, then let's predict Y equals zero. I chose a greater-than-or-equal-to here and a less-than here. If H of X is equal to 0.5 exactly, we could predict either positive or negative, but by putting the greater-than-or-equal sign here we default to predicting positive when H of X is exactly 0.5. That's a detail that really doesn't matter that much.

What I want to do is understand better when exactly H of X will be greater than or equal to 0.5, so that we end up predicting Y is equal to one. If we look at this plot of the sigmoid function, we'll notice that the sigmoid function, G of Z, is greater than or equal to 0.5 whenever Z is greater than or equal to zero. So it is in this right half of the figure that G takes on values that are 0.5 and higher; this here is the 0.5 line. So when Z is positive, G of Z, the sigmoid function, is greater than or equal to 0.5. Since the hypothesis for logistic regression is H of X equals G of theta transpose X, the hypothesis is therefore going to be greater than or equal to 0.5 whenever theta transpose X is greater than or equal to zero, because here theta transpose X takes the role of Z. So what we've shown is that our hypothesis is going to predict Y equals one whenever theta transpose X is greater than or equal to zero. Let's now consider the other case, when the hypothesis will predict Y is equal to zero. By a similar argument, H of X is going to be less than 0.5 whenever G of Z is less than 0.5, because the range of values of Z that causes G of Z to take on values less than 0.5 is when Z is negative. So when G of Z is less than 0.5, our hypothesis will predict that Y is equal to zero, and by a similar argument to what we had earlier, since H of X equals G of theta transpose X, we'll predict Y equals zero whenever this quantity theta transpose X is less than zero.
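As a minimal sketch of this threshold rule (the lecture itself contains no code, so this is just an illustration in Python/NumPy, assuming the feature vector x already includes the intercept entry x0 = 1):

import numpy as np

def sigmoid(z):
    # the sigmoid function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    # g(theta^T x) >= 0.5 exactly when theta^T x >= 0, so either test gives the same answer
    z = theta @ x
    return 1 if sigmoid(z) >= 0.5 else 0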
To summarize what we just worked out, we saw that if we decide to predict whether Y is equal to one or Y is equal to zero depending on whether the estimated probability is greater than or equal to 0.5, or less than 0.5, then that's the same as saying that we will predict Y equals one whenever theta transpose X is greater than or equal to zero, and we will predict Y equals zero whenever theta transpose X is less than zero. Let's use this to better understand how the hypothesis of logistic regression makes those predictions.

Now, let's suppose we have a training set like that shown on the slide, and suppose our hypothesis is H of X equals G of theta zero plus theta one X1 plus theta two X2. We haven't talked yet about how to fit the parameters of this model; we'll talk about that in the next video. But suppose that, via a procedure to be specified, we end up choosing the following values for the parameters: let's say we choose theta zero equals minus three, theta one equals one, theta two equals one. So this means that my parameter vector is going to be theta equals minus three, one, one. Given this choice of hypothesis parameters, let's try to figure out where the hypothesis will end up predicting Y equals one and where it will end up predicting Y equals zero. Using the formulas that we worked out on the previous slide, we know that Y equals one is more likely, that is, the probability that Y equals one is greater than or equal to 0.5, whenever theta transpose X is greater than or equal to zero. And this formula that I just underlined, minus three plus X1 plus X2, is of course theta transpose X when theta is equal to the value of the parameters that we just chose. So, for any example with features X1 and X2 that satisfy this equation, that minus three plus X1 plus X2 is greater than or equal to zero, our hypothesis will think that Y equals one is more likely, or will predict that Y is equal to one. We can also take the minus three and bring it to the right, and rewrite this as X1 plus X2 is greater than or equal to three. And so, equivalently, we found that this hypothesis will predict Y equals one whenever X1 plus X2 is greater than or equal to three.

Let's see what that means on the figure. If I write down the equation X1 plus X2 equals three, this defines the equation of a straight line, and if I draw what that straight line looks like, it gives me the following line, which passes through three on the X1 axis and three on the X2 axis. So the part of the input space, the part of the X1, X2 plane, that corresponds to X1 plus X2 being greater than or equal to three is the upper half-plane, that is, everything to the upper right of this magenta line that I just drew. And so, the region where our hypothesis will predict Y equals one is this region, you know, really this huge half-plane over to the upper right. Let me just write that down: I'm going to call this the Y equals one region. In contrast, the region where X1 plus X2 is less than three is where we will predict that Y is equal to zero, and that corresponds to this region. It's really a half-plane as well, but that region on the left is the region where our hypothesis predicts Y equals zero. I want to give this magenta line that I drew a name: this line is called the decision boundary. And concretely, this straight line, X1 plus X2 equals three, corresponds to the set of points where H of X is equal to 0.5 exactly.
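To make this concrete, here is a small sketch in Python/NumPy, again not from the lecture, assuming the features are laid out as one, X1, X2 with the intercept first and using the parameter values from this example:

import numpy as np

theta = np.array([-3.0, 1.0, 1.0])    # theta0 = -3, theta1 = 1, theta2 = 1

def predicts_one(x1, x2):
    x = np.array([1.0, x1, x2])       # x0 = 1 is the intercept term
    return theta @ x >= 0             # same test as sigmoid(theta @ x) >= 0.5

print(predicts_one(1.0, 1.0))   # False: 1 + 1 < 3, below the decision boundary
print(predicts_one(2.0, 4.0))   # True:  2 + 4 >= 3, above the decision boundary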
The decision boundary, that is, this straight line, is the line that separates the region where the hypothesis predicts Y equals one from the region where the hypothesis predicts Y equals zero. And just to be clear, the decision boundary is a property of the hypothesis, including the parameters theta zero, theta one, theta two. In the figure I drew a training set, I drew a data set, in order to help the visualization. But even if we take away the data set, the decision boundary and the regions where we predict Y equals one versus Y equals zero are a property of the hypothesis and of the parameters of the hypothesis, and not a property of the data set. Later on, of course, we'll talk about how to fit the parameters, and there we'll end up using the training set, using our data, to determine the values of the parameters. But once we have particular values for the parameters theta zero, theta one, theta two, that completely defines the decision boundary, and we don't actually need to plot a training set in order to plot the decision boundary.

Let's now look at a more complex example where, as usual, I have crosses to denote my positive examples and O's to denote my negative examples. Given a training set like this, how can I get logistic regression to fit this sort of data? Earlier, when we were talking about polynomial regression or about linear regression, we talked about how we can add extra higher-order polynomial terms to the features, and we can do the same for logistic regression. Concretely, let's say my hypothesis looks like this, where I've added two extra features, X1 squared and X2 squared, to my features, so that I now have five parameters, theta zero through theta four. As before, we'll defer to the next video our discussion of how to automatically choose values for the parameters theta zero through theta four. But let's say that, via a procedure to be specified, I end up choosing theta zero equals minus one, theta one equals zero, theta two equals zero, theta three equals one, and theta four equals one. What this means is that with this particular choice of parameters, my parameter vector theta looks like minus one, zero, zero, one, one.

Following our earlier discussion, this means that my hypothesis will predict that Y is equal to one whenever minus one plus X1 squared plus X2 squared is greater than or equal to zero; this is whenever theta transpose X, my parameters transposed times my features, is greater than or equal to zero. And if I take the minus one and just bring it to the right, I'm saying that my hypothesis will predict that Y is equal to one whenever X1 squared plus X2 squared is greater than or equal to one. So, what does the decision boundary look like? Well, if you were to plot the curve for X1 squared plus X2 squared equals one, some of you will recognize that as the equation for a circle of radius one centered at the origin. So, that is my decision boundary, and everything outside the circle I'm going to predict as Y equals one. So out here is, you know, my Y equals one region; I'm going to predict Y equals one out here. And inside the circle is where I'll predict Y is equal to zero. So, by adding these more complex polynomial terms to my features, I can get more complex decision boundaries that don't just try to separate the positive and negative examples with a straight line; I can get, in this example, a decision boundary that's a circle. Once again, the decision boundary is a property not of the training set, but of the hypothesis and of the parameters.
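The same kind of sketch works for this circular boundary; again this is just an illustration in Python/NumPy, not part of the lecture, with the features laid out as one, X1, X2, X1 squared, X2 squared:

import numpy as np

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

def predicts_one(x1, x2):
    features = np.array([1.0, x1, x2, x1 ** 2, x2 ** 2])
    return theta @ features >= 0      # i.e. x1^2 + x2^2 >= 1

print(predicts_one(0.5, 0.5))   # False: inside the unit circle
print(predicts_one(1.0, 1.0))   # True:  outside the unit circle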
So long as we're given my parameter vector theta, that defines the decision boundary, which is the circle. The training set is not what we use to define the decision boundary. The training set may be used to fit the parameters theta, and we'll talk about how to do that later, but once you have the parameters theta, that is what defines the decision boundary. Let me put back the training set just for visualization.

And finally, let's look at a more complex example. Can we come up with even more complex decision boundaries than this? If I have even higher-order polynomial terms, so things like X1 squared, X1 squared X2, X1 squared X2 squared, and so on, then it's possible to show that you can get even more complex decision boundaries, and logistic regression can be used to find decision boundaries that may, for example, be an ellipse like that, or maybe, with a different setting of the parameters, a different decision boundary that may even look like, you know, some funny shape like that. Or, for even more complex examples, you can also get decision boundaries that look like more complex shapes like that, where everything in here you predict Y equals one, and everything outside you predict Y equals zero. So with these higher-order polynomial features, you can get very complex decision boundaries.

So with these visualizations, I hope that gives you a sense of the range of hypothesis functions we can represent using the representation that we have for logistic regression. Now that we know what H of X can represent, what I'd like to do next, in the following video, is talk about how to automatically choose the parameters theta, so that, given a training set, we can automatically fit the parameters to our data.
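As a final illustration of the higher-order polynomial idea, here is one possible way to build such feature vectors in Python/NumPy; the helper map_features is my own, not something defined in the lecture, and the exact set of terms is just an assumption:

import numpy as np

def map_features(x1, x2, degree=6):
    # all monomials x1^i * x2^j with i + j <= degree, constant term included
    feats = []
    for i in range(degree + 1):
        for j in range(degree + 1 - i):
            feats.append((x1 ** i) * (x2 ** j))
    return np.array(feats)

# The decision rule is unchanged: with a fitted theta of matching length,
# predict Y = 1 whenever theta @ map_features(x1, x2) >= 0.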