[MUSIC] We've now seen how to express the probability that y equals +1 and the probability that y equals -1, and next I plug those into the log likelihood before we take its gradient. This is still part of that very, very optional derivation that we're doing of the gradient for logistic regression. Now that we have probability y=+1 and probability y=-1, we can go ahead and plug those into our log likelihood function, with the indicators and all that good stuff, and see what happens. It turns out the log likelihood is going to simplify to a pretty cool term, and we're going to take the derivative of it and derive the gradient that we're hoping for. So here we go. I'm going to first plug the probability y=+1 into the first term. So we have that the log likelihood function is the indicator that yi = +1 (we're only going to deal with one data point for now, and we'll get the sum over the data points later) times the log of that ratio, 1 / (1 + e to the -w transpose h(xi)), where xi is the particular data point we're dealing with. So that was easy, that was the first one. Now for the second term, we need to plug in the probability y=-1. But it's a little bit annoying for a derivation that we have an indicator that yi is +1 and another indicator that yi is -1, so we're going to take this indicator that yi = -1 and substitute it with something else. So let's do a change of colors transformation here and just remind ourselves that the indicator that yi = -1, which takes value 1 when yi = -1 and 0 otherwise, can be written as 1 minus the indicator that yi = +1. If you think about it, when yi = -1, the left side is 1 and the right side is also 1. If yi = +1, the left side is 0 and the right side is 0. So let's plug in what we just learned. For the second term, the coefficient becomes 1 minus the indicator that yi = +1.
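The indicator substitution just described is easy to check for both label values. The following is a minimal Python sketch, not part of the lecture; the helper name `indicator` is an assumption introduced only for illustration.

```python
def indicator(condition: bool) -> int:
    """Return 1 when the condition holds and 0 otherwise (hypothetical helper)."""
    return 1 if condition else 0


# Compare both sides of the identity 1[y = -1] = 1 - 1[y = +1]
# for the only two labels the model uses.
for y in (+1, -1):
    lhs = indicator(y == -1)
    rhs = 1 - indicator(y == +1)
    print(f"y = {y:+d}: 1[y = -1] = {lhs}, 1 - 1[y = +1] = {rhs}")
```

The identity relies on the label taking only the values +1 and -1; with any other label encoding it would not hold.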
And we're going to plug in the definition of the probability that y = -1. So that's the log of e to the -w transpose h(xi) / (1 + e to the -w transpose h(xi)). Great, so now we have our two terms, and we're going to move a couple of things around and simplify the equations pretty significantly. So let's go ahead and do that. Okay, let's go back to our change of colors transformation and deal with the first term. The first term is the red term, so let's go back to red and see what the log of 1 / (1 + e to the -w transpose h(xi)) looks like. The log of 1 over something is minus the log of that something, and the something here is 1 + e to the -w transpose h(xi). So plugging that in, we get the indicator that yi = +1 multiplying minus the log of 1 + e to the -w transpose h(xi). So we write log of 1 + e to the -w transpose h(xi) here and put the minus sign out front. That was the first term, so now let's look at that famous second term and expand it out. Let's go back to our blue. The coefficient here stays the same: 1 minus the indicator that yi is positive, multiplying the log of that ratio. So let's explore what the log of the ratio looks like, the log of e to the -w transpose h(xi) / (1 + e to the -w transpose h(xi)). It's the log of a ratio, so it's the difference of the logs: the log of e to the power of -w transpose h(xi) minus the log of 1 + e to the power of -w transpose h(xi). Let's note two things. First, the second term is exactly what we had up here, so things are starting to look very similar. And what is the log of the first term? Here's another log trick: the log of e to the something, say e to the a, is exactly equal to a. So in this case, the log of e to the -w transpose h(xi) is just -w transpose h(xi).
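The two log tricks used here can be written out compactly. In the sketch below, the shorthand s_i for w transpose h(xi) is an assumption introduced only to keep the lines short; it does not appear in the lecture.

```latex
% Assumed shorthand: s_i = w^\top h(x_i)
\begin{align*}
\ln\frac{1}{1+e^{-s_i}}
  &= -\ln\left(1+e^{-s_i}\right), \\
\ln\frac{e^{-s_i}}{1+e^{-s_i}}
  &= \ln e^{-s_i} - \ln\left(1+e^{-s_i}\right)
   = -s_i - \ln\left(1+e^{-s_i}\right).
\end{align*}
```

Both results share the term with the log of 1 + e to the -s_i, which is why the two pieces combine so cleanly in the next step.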
So plugging that in, we get a coefficient that multiplies -w transpose h(xi) minus the same term as on the other side, the log of 1 + e to the -w transpose h(xi). Okay, we've been going a little slowly, and now you can shake things around and move things, and lots of stuff cancels out. I'm not going to go through that in detail, but you're welcome to do it. I'm just going to do a change of color transformation, go to purple, which is a color I love, and write out what the answer is. The answer is that the log likelihood can be written, for a single data point, as minus the quantity 1 minus the indicator that it is a positive example, so yi = +1, times w transpose h(xi), minus the log of that crazy term, 1 + e to the -w transpose h(xi). So we started from the log likelihood function, we went through a bunch of derivations and math, which you can explore if you want, and we ended up with this much simpler form. And now we're going to take the simpler form and take its derivative. [MUSIC]
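One way to convince yourself that the cancellation works is to compare the original two-indicator form with the simplified form numerically for a single data point. The following is a minimal Python sketch under stated assumptions: the function names, the weights `w`, and the feature values `h` are all made up for illustration and are not from the lecture.

```python
import math


def score(w, h):
    """Compute w transpose h(x_i) for plain Python lists."""
    return sum(wj * hj for wj, hj in zip(w, h))


def log_likelihood_original(w, h, y):
    """Two-indicator form: 1[y=+1] ln P(y=+1) + 1[y=-1] ln P(y=-1)."""
    s = score(w, h)
    p_pos = 1.0 / (1.0 + math.exp(-s))
    p_neg = math.exp(-s) / (1.0 + math.exp(-s))
    ind_pos = 1 if y == +1 else 0
    ind_neg = 1 if y == -1 else 0
    return ind_pos * math.log(p_pos) + ind_neg * math.log(p_neg)


def log_likelihood_simplified(w, h, y):
    """Simplified form: -(1 - 1[y=+1]) * s - ln(1 + e^{-s})."""
    s = score(w, h)
    ind_pos = 1 if y == +1 else 0
    return -(1 - ind_pos) * s - math.log(1.0 + math.exp(-s))


# Hypothetical weights and features for one data point.
w = [0.5, -1.2, 2.0]
h = [1.0, 0.3, -0.7]

for y in (+1, -1):
    original = log_likelihood_original(w, h, y)
    simplified = log_likelihood_simplified(w, h, y)
    print(f"y = {y:+d}: original = {original:.6f}, simplified = {simplified:.6f}")
```

The two columns should agree for both labels up to floating-point rounding. The sketch calls `math.exp` directly, so it is only meant for moderate scores; very large negative scores would overflow.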