1 00:00:00,025 --> 00:00:04,299 [MUSIC] 2 00:00:04,299 --> 00:00:07,165 Now that we've seen that the log can be an important function 3 00:00:07,165 --> 00:00:10,899 when applied to the likelihood, we're going to have to express things in terms 4 00:00:10,899 --> 00:00:13,280 of the log likelihood to compute the gradient. 5 00:00:13,280 --> 00:00:15,770 This continues to be part of that very, 6 00:00:15,770 --> 00:00:20,150 very optional, PhD-level-only derivation that we're doing here. 7 00:00:21,870 --> 00:00:25,170 Now let's revisit the likelihood function that we discussed and 8 00:00:25,170 --> 00:00:27,810 see how the log plays a role. 9 00:00:27,810 --> 00:00:33,410 So the key here is that the log of the product is the sum of the logs, as we discussed. 10 00:00:33,410 --> 00:00:39,273 And in our case, if you have the log of the product 11 00:00:39,273 --> 00:00:44,278 of i equals 1 through N of some function, 12 00:00:44,278 --> 00:00:49,855 let's call it Fi, then that is equal to the sum 13 00:00:49,855 --> 00:00:54,740 of i equals 1 through N of the log of Fi. 14 00:00:54,740 --> 00:00:57,800 So, the log of the product is the sum of the logs. 15 00:00:57,800 --> 00:01:01,840 And so, in our case, what we're going to get is that the log 16 00:01:01,840 --> 00:01:06,720 likelihood function is going to be the sum of i = 1 through N 17 00:01:07,920 --> 00:01:13,410 of the log of the probability of yi, given xi and w. 18 00:01:14,720 --> 00:01:19,860 Now, if you're thinking about derivatives, you'll see exactly why the log was useful. 19 00:01:19,860 --> 00:01:23,850 The derivative of a product is really a complicated thing, but once we've taken 20 00:01:23,850 --> 00:01:28,840 the log, the derivative of the sum is just the sum of the derivatives. 21 00:01:28,840 --> 00:01:32,010 And so that's going to simplify all the math that we have to do. 22 00:01:32,010 --> 00:01:33,972 And that's the core reason that we take the log. 
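As a small numeric check of the identity described above, the following minimal sketch compares the log of a product with the sum of the logs. The values in `probs` are made-up stand-ins for the probabilities of yi given xi and w; they do not come from the lecture.

```python
import math

# Hypothetical per-example probabilities P(y_i | x_i, w); values are invented
# purely for illustration.
probs = [0.9, 0.8, 0.65, 0.7]

# Likelihood: the product of the per-example probabilities.
likelihood = math.prod(probs)

# Log likelihood: the sum of the logs of the per-example probabilities.
log_likelihood = sum(math.log(p) for p in probs)

# The log of the product and the sum of the logs agree up to rounding error.
print(math.log(likelihood), log_likelihood)
```

Because the log likelihood is a plain sum, its derivative with respect to w is the sum of the derivatives of the individual terms, which is the simplification the lecture is pointing at.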
23 00:01:33,972 --> 00:01:37,110 There are a few other technical reasons, but that's one very important one. 24 00:01:37,110 --> 00:01:40,470 Okay, so that was trick number one. 25 00:01:40,470 --> 00:01:41,660 Take the log. 26 00:01:41,660 --> 00:01:45,990 Trick number two is to introduce indicator functions, just like we showed 27 00:01:45,990 --> 00:01:47,980 in the derivation that we had before. 28 00:01:47,980 --> 00:01:51,530 So, let's take the log likelihood function. 29 00:01:51,530 --> 00:01:56,720 So, this is the sum of the log of the probability of yi given xi and w. 30 00:01:56,720 --> 00:01:59,610 Each term of that sum can be written as the sum of two terms. 31 00:02:00,960 --> 00:02:06,842 The indicator that yi is +1 times the log of the probability of yi=+1, 32 00:02:06,842 --> 00:02:11,310 plus the indicator that yi is -1 times the log of the probability of yi=-1. 33 00:02:11,310 --> 00:02:14,591 So in other words, 34 00:02:14,591 --> 00:02:19,843 if yi=+1 then the first term comes into play and 35 00:02:19,843 --> 00:02:23,316 the second term becomes zero. 36 00:02:23,316 --> 00:02:28,410 But if yi is equal to -1, 37 00:02:28,410 --> 00:02:34,560 then the first term becomes zero and the second term becomes active. 38 00:02:34,560 --> 00:02:39,810 And so we see that because of this, we get exactly the equation above. 39 00:02:39,810 --> 00:02:43,490 But the indicators are going to make our life a lot simpler 40 00:02:43,490 --> 00:02:46,520 throughout the derivatives and all of the operations that we need to do. 41 00:02:47,940 --> 00:02:48,875 Here is an interesting thing. 42 00:02:48,875 --> 00:02:53,749 So far we've only talked about the probability that y = +1, but 43 00:02:53,749 --> 00:02:57,794 in this equation we have the probability that y = -1. 44 00:02:57,794 --> 00:02:59,046 Interesting. 45 00:02:59,046 --> 00:03:03,229 [MUSIC]
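The indicator rewrite can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's own code: the function names, the labels in `ys`, and the probabilities in `p_plus` are hypothetical. The probability that y = -1 is taken to be one minus the probability that y = +1, which holds for any two-class model but has not been derived at this point in the lecture.

```python
import math

def log_likelihood_direct(ys, p_plus):
    """Sum of log P(y_i | x_i, w), picking the probability of the observed label."""
    total = 0.0
    for y, p in zip(ys, p_plus):
        # Assumption: P(y = -1) = 1 - P(y = +1), since there are only two classes.
        prob_of_label = p if y == +1 else 1.0 - p
        total += math.log(prob_of_label)
    return total

def log_likelihood_indicator(ys, p_plus):
    """Same quantity, written with one indicator per class."""
    total = 0.0
    for y, p in zip(ys, p_plus):
        ind_pos = 1.0 if y == +1 else 0.0  # indicator that y_i = +1
        ind_neg = 1.0 if y == -1 else 0.0  # indicator that y_i = -1
        # Exactly one indicator is 1, so exactly one of the two terms is active.
        total += ind_pos * math.log(p) + ind_neg * math.log(1.0 - p)
    return total

# Hypothetical labels and hypothetical values of P(y_i = +1 | x_i, w).
ys = [+1, -1, +1, -1]
p_plus = [0.9, 0.2, 0.6, 0.3]

print(log_likelihood_direct(ys, p_plus), log_likelihood_indicator(ys, p_plus))
```

Both functions return the same value. The indicator form has the advantage that every data point contributes the same two-term expression, so no case split on the label is needed when taking derivatives.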