[MUSIC] We've now seen how to compute the log likelihood, but before continuing, we're going to have to express the probability that y = -1, because so far we've only expressed the probability that y = +1. This is still part of that very, very optional, PhD-level-only derivation of the gradient, so it's only for those who are really interested. We need to take a moment to derive the probability that y = -1, given that the expression up here is the probability that y = +1. So let's just derive that.

The probability that y = -1 given x and w is just one minus the probability that y = +1 given x and w. This is just because probabilities add up to one. So let's plug that in: that's 1 minus 1 over 1 + e^(-w^T h(x)). I just plugged in the definition of the probability that y = +1. Now, if we put both terms over the common denominator 1 + e^(-w^T h(x)), the 1 becomes (1 + e^(-w^T h(x))) over that denominator, and the numerator works out to (1 + e^(-w^T h(x))) minus 1, which is just e^(-w^T h(x)). So the whole thing simplifies to e^(-w^T h(x)) divided by 1 + e^(-w^T h(x)). Pretty cool. Very simple. Here we go. [MUSIC]
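To make the derivation concrete, here is a minimal numerical sketch, assuming a small made-up weight vector w and feature vector h(x) (the values and helper names are illustrative only). It computes P(y = +1 | x, w) = 1 / (1 + e^(-w^T h(x))) and checks that 1 minus it matches the derived formula e^(-w^T h(x)) / (1 + e^(-w^T h(x))).

```python
import numpy as np

def prob_y_plus_one(w, hx):
    """P(y = +1 | x, w) = 1 / (1 + exp(-w^T h(x)))."""
    score = np.dot(w, hx)
    return 1.0 / (1.0 + np.exp(-score))

def prob_y_minus_one(w, hx):
    """P(y = -1 | x, w) = exp(-w^T h(x)) / (1 + exp(-w^T h(x))), as derived above."""
    score = np.dot(w, hx)
    return np.exp(-score) / (1.0 + np.exp(-score))

# Made-up coefficients and features, just to check the algebra numerically.
w = np.array([0.5, -1.0, 2.0])
hx = np.array([1.0, 0.3, -0.2])   # h(x): feature values for one example

p_plus = prob_y_plus_one(w, hx)
p_minus = prob_y_minus_one(w, hx)

print(p_plus, p_minus)
print(np.isclose(p_plus + p_minus, 1.0))   # the two probabilities sum to one
print(np.isclose(p_minus, 1.0 - p_plus))   # matches 1 - P(y = +1 | x, w)
```

Running this prints two probabilities that sum to one, confirming numerically that the simplified expression for P(y = -1 | x, w) agrees with 1 minus the probability that y = +1.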