[MUSIC] We've now seen how to compute the log likelihood, but before continuing, we're going to have to express the probability that y = -1, because so far we've only expressed the probability that y = +1. This is still part of that very, very optional, PhD-level-only derivation of the gradient, so it's only for those who are really interested. We need to take a moment to derive the probability that y = -1, given that the expression up here is the probability that y = +1. So let's just derive that.

The probability that y = -1 given x and w is just one minus the probability that y = +1 given x and w. This is just because probabilities add up to one. So let's plug that in: that's 1 minus 1 over 1 + e^(-w^T h(x)). I just plugged in the definition of the probability that y = +1. Now, if we put both terms over the common denominator 1 + e^(-w^T h(x)), the 1 becomes (1 + e^(-w^T h(x))) over that denominator, and the numerator works out to (1 + e^(-w^T h(x))) minus 1, which is just e^(-w^T h(x)). So the whole thing simplifies to e^(-w^T h(x)) divided by 1 + e^(-w^T h(x)). Pretty cool. Very simple. Here we go. [MUSIC]
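To make the derivation concrete, here is a minimal numerical sketch, assuming a small made-up weight vector w and feature vector h(x) (the values and helper names are illustrative only). It computes P(y = +1 | x, w) = 1 / (1 + e^(-w^T h(x))) and checks that 1 minus it matches the derived formula e^(-w^T h(x)) / (1 + e^(-w^T h(x))).

```python
import numpy as np

def prob_y_plus_one(w, hx):
    """P(y = +1 | x, w) = 1 / (1 + exp(-w^T h(x)))."""
    score = np.dot(w, hx)
    return 1.0 / (1.0 + np.exp(-score))

def prob_y_minus_one(w, hx):
    """P(y = -1 | x, w) = exp(-w^T h(x)) / (1 + exp(-w^T h(x))), as derived above."""
    score = np.dot(w, hx)
    return np.exp(-score) / (1.0 + np.exp(-score))

# Made-up coefficients and features, just to check the algebra numerically.
w = np.array([0.5, -1.0, 2.0])
hx = np.array([1.0, 0.3, -0.2])   # h(x): feature values for one example

p_plus = prob_y_plus_one(w, hx)
p_minus = prob_y_minus_one(w, hx)

print(p_plus, p_minus)
print(np.isclose(p_plus + p_minus, 1.0))   # the two probabilities sum to one
print(np.isclose(p_minus, 1.0 - p_plus))   # matches 1 - P(y = +1 | x, w)
```

Running this prints two probabilities that sum to one, confirming numerically that the simplified expression for P(y = -1 | x, w) agrees with 1 minus the probability that y = +1.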