Now that we've seen that the log can be an important function when applied to the likelihood, we're going to compute the gradient in terms of the log likelihood. This continues to be part of that very, very optional, PhD-level-only derivation that we're doing here. Now let's revisit the likelihood
function that we discussed and see how the log plays a role. The key here is the property we discussed: the log of a product is the sum of the logs. In general, if you take the log of the product from i = 1 through N of some function, let's call it f_i, that is equal to the sum from i = 1 through N of the log of f_i. So, the log of the product is the sum of the logs. And so, in our case,
what we get is that the log likelihood function is the sum from i = 1 through N of the log of the probability of y_i, given x_i and w. Now, if you're thinking about derivatives, you'll see exactly why the log is useful. The derivative of a product is a really complicated thing, but once we've taken the log, the derivative of the sum is just the sum of the derivatives. That simplifies all the math that we have to do, and it's the core reason that we take the log. There are a few other technical reasons, but that's a very important one. Okay, so that was trick number one: take the log.

Trick number two is to introduce
indicator functions, just like we used in the derivative we had before. Start with the log likelihood function, which is the sum over i of the log of the probability of y_i given x_i and w. Each term in that sum can be written as the sum of two terms: the indicator that y_i = +1 times the log of the probability that y = +1, plus the indicator that y_i = -1 times the log of the probability that y = -1. In other words, if y_i = +1, then the first term comes into play and the second term becomes zero. But if y_i = -1, then the first term becomes zero and the second term becomes active. Because of this, we get exactly the equation above. But the indicators are going to make our life a lot simpler throughout the derivatives and
all of the operations that we need to do.

Here is an interesting thing. So far we've only talked about the probability that y = +1, but in this equation we also have the probability that y = -1. Interesting.
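Trick number one from this derivation, that the log of a product equals the sum of the logs, can be checked numerically. The sketch below uses made-up probabilities standing in for P(y_i | x_i, w); the helper names are hypothetical. It also illustrates one practical reason for working with logs that the video does not spell out: a product of many probabilities underflows to zero in floating point, while the sum of their logs stays a finite number.

```python
import math

def log_of_product(values):
    """Log of a product, computed the direct way: multiply first, then take the log."""
    return math.log(math.prod(values))

def sum_of_logs(values):
    """The same quantity, computed as a sum of logs (trick number one)."""
    return sum(math.log(v) for v in values)

# Hypothetical per-example probabilities P(y_i | x_i, w); any values in (0, 1] work.
probs = [0.9, 0.8, 0.95, 0.7]

print(log_of_product(probs))
print(sum_of_logs(probs))
print(math.isclose(log_of_product(probs), sum_of_logs(probs)))  # True

# With many examples, the raw product underflows to 0.0 in floating point,
# while the sum of logs remains a finite, usable number.
many = [0.5] * 2000
print(math.prod(many))    # 0.0
print(sum_of_logs(many))  # roughly -1386.29
```

Calling `log_of_product(many)` would fail here, because it would try to take the log of 0.0; that is exactly the situation the sum-of-logs form avoids.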
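To make the point that "the derivative of the sum is the sum of the derivatives" concrete, here it is written out for a single weight w_j, with f_i standing for P(y_i | x_i, w). The first line is the ordinary product rule applied to the raw likelihood; the second is the same derivative after taking the log.

```latex
\frac{\partial}{\partial w_j} \prod_{i=1}^{N} f_i
  = \sum_{i=1}^{N} \left( \frac{\partial f_i}{\partial w_j} \prod_{k \neq i} f_k \right),
  \qquad f_i = P(y_i \mid \mathbf{x}_i, \mathbf{w})

\frac{\partial}{\partial w_j} \ln \prod_{i=1}^{N} f_i
  = \frac{\partial}{\partial w_j} \sum_{i=1}^{N} \ln f_i
  = \sum_{i=1}^{N} \frac{\partial}{\partial w_j} \ln f_i
```

In the first form, each of the N terms still carries a product of the other N - 1 factors; in the second, each data point contributes its own independent term.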
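The indicator-function form of the log likelihood described in trick number two can be written as a single formula. The symbols \ell\ell(\mathbf{w}) for the log likelihood and \mathbb{1}[\cdot] for the indicator are notation assumed here.

```latex
\ell\ell(\mathbf{w}) = \sum_{i=1}^{N} \Big(
    \mathbb{1}[y_i = +1] \, \ln P(y = +1 \mid \mathbf{x}_i, \mathbf{w})
  + \mathbb{1}[y_i = -1] \, \ln P(y = -1 \mid \mathbf{x}_i, \mathbf{w})
\Big)
```

For every i, exactly one of the two indicators equals 1, so each data point contributes only the log probability of its observed label.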
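As a sanity check on trick number two, the sketch below computes the log likelihood twice: once by picking the log probability of the observed label directly, and once with the indicator form. Several things here are assumptions, not taken from this video: the model P(y = +1 | x, w) is taken to be a sigmoid of the linear score w · x, P(y = -1 | x, w) is taken as 1 - P(y = +1 | x, w) because there are only two classes, and the data, weights, and function names are hypothetical.

```python
import math

def sigmoid(score):
    """Assumed link function mapping a score to P(y = +1 | x, w)."""
    return 1.0 / (1.0 + math.exp(-score))

def prob_plus(x, w):
    """Hypothetical helper: P(y = +1 | x, w) for a linear score w . x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def log_likelihood_direct(xs, ys, w):
    """Sum over i of ln P(y_i | x_i, w), choosing the probability of the observed label."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = prob_plus(x, w)
        total += math.log(p if y == +1 else 1.0 - p)
    return total

def log_likelihood_indicators(xs, ys, w):
    """Same quantity written with indicator functions (trick number two).
    Note: both logs are evaluated, so probabilities must stay strictly inside (0, 1)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p_plus = prob_plus(x, w)
        p_minus = 1.0 - p_plus                # assumes exactly two classes, +1 and -1
        ind_plus = 1.0 if y == +1 else 0.0    # 1[y_i = +1]
        ind_minus = 1.0 if y == -1 else 0.0   # 1[y_i = -1]
        total += ind_plus * math.log(p_plus) + ind_minus * math.log(p_minus)
    return total

# Hypothetical data: each x includes a constant feature 1.0 for the intercept.
xs = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
ys = [+1, -1, +1]
w = [0.1, 0.8]

print(log_likelihood_direct(xs, ys, w))
print(log_likelihood_indicators(xs, ys, w))
```

Both functions return the same number; the indicator version is the one whose terms are easy to differentiate one at a time.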