1 00:00:00,567 --> 00:00:04,351 [MUSIC] 2 00:00:04,351 --> 00:00:07,939 We've now seen how to express the probability of y equals plus 1 and 3 00:00:07,939 --> 00:00:10,354 the probability of y equals minus 1, and 4 00:00:10,354 --> 00:00:14,428 now I'll plug those into the log likelihood before we take its gradient. 5 00:00:14,428 --> 00:00:16,408 This is still part of that very, 6 00:00:16,408 --> 00:00:21,340 very optional derivation that we're doing of the gradient of logistic regression. 7 00:00:22,980 --> 00:00:27,120 Now we have probability y=+1, probability y=-1. 8 00:00:27,120 --> 00:00:32,795 We can go ahead and plug those into our log likelihood function. 9 00:00:32,795 --> 00:00:37,113 And with the indicators and all that good stuff, let's see what happens. 10 00:00:38,530 --> 00:00:42,577 So I'm going to go ahead and plug in this definition of probability y=+1, 11 00:00:42,577 --> 00:00:47,011 y=-1 into the log likelihood. It turns out the log likelihood is going to simplify 12 00:00:47,011 --> 00:00:50,673 to a pretty cool term, and we're going to take the derivative of it and 13 00:00:50,673 --> 00:00:53,274 get the gradient that we're hoping for. 14 00:00:53,274 --> 00:00:54,160 So here we go. 15 00:00:55,610 --> 00:01:02,160 So I'm just going to first plug in the probability y=+1 into the first term. 16 00:01:02,160 --> 00:01:07,399 So we're going to have that the log likelihood function is the indicator 17 00:01:07,399 --> 00:01:12,188 that yi = +1 (we're only going to deal with one data point for 18 00:01:12,188 --> 00:01:17,050 now, and then we'll get the sum over the data points later), 19 00:01:17,050 --> 00:01:22,373 times the log of that ratio, so 1 / (1 + e to the -w transpose 20 00:01:22,373 --> 00:01:28,875 h(xi)), where xi is the particular data point we're dealing with. 21 00:01:28,875 --> 00:01:31,731 So that was easy, that was the first one. 22 00:01:31,731 --> 00:01:39,277 Now for the second term here, we need to plug in the probability y=-1.
23 00:01:39,277 --> 00:01:43,548 But it's a little bit annoying for a derivation that we have an indicator that 24 00:01:43,548 --> 00:01:47,621 yi is +1 and another indicator that yi is -1, so we're going to take this 25 00:01:47,621 --> 00:01:51,258 indicator that yi = -1 and substitute it with something else. 26 00:01:51,258 --> 00:01:56,295 So let's do a change of colors transformation here and 27 00:01:56,295 --> 00:02:01,661 just remind ourselves that the indicator that yi = -1, 28 00:02:01,661 --> 00:02:04,872 which takes value 1 when yi=-1, 29 00:02:04,872 --> 00:02:09,628 can be written as 1 minus 30 00:02:09,628 --> 00:02:15,530 the indicator that yi=+1. 31 00:02:15,530 --> 00:02:18,762 So if you think about it, 32 00:02:18,762 --> 00:02:25,540 when yi=-1, 33 00:02:25,540 --> 00:02:29,540 then the left side here is 1 and the right side is also 1. 34 00:02:29,540 --> 00:02:32,761 If yi = +1, the left side is 0 and the right side is 0. 35 00:02:32,761 --> 00:02:36,894 So let's plug that in, what we just learned. 36 00:02:36,894 --> 00:02:41,745 And so for the second term here, 37 00:02:41,745 --> 00:02:48,039 we get 1 minus the indicator that yi = +1. 38 00:02:48,039 --> 00:02:55,725 And we're going to plug in the definition of the probability that y = -1. 39 00:02:55,725 --> 00:03:02,075 So that's log of e to the -w transpose 40 00:03:02,075 --> 00:03:07,234 h(xi) / (1 + e to 41 00:03:07,234 --> 00:03:12,007 the -w transpose h(xi)). 42 00:03:15,003 --> 00:03:19,664 Great, so now we have our two terms and now we're going to move a couple 43 00:03:19,664 --> 00:03:24,980 of things around and simplify the equations pretty significantly. 44 00:03:24,980 --> 00:03:26,150 So let's go ahead and do that. 45 00:03:28,080 --> 00:03:30,056 Okay, let's go back to our change of colors transformation 46 00:03:30,056 --> 00:03:31,190 and deal with the first term. 47 00:03:31,190 --> 00:03:35,060 The first term is the red term, so let's go back to the red term.
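For readers following along without the slides, the two-term expression at this point in the derivation can be written out as below. This is a reconstruction from the spoken description, not a copy of the board: the name \ell_i for the log likelihood of one data point and the bracket notation for the indicator are assumptions, and the natural log is assumed.

```latex
% Log likelihood for a single data point (x_i, y_i), after replacing
% 1[y_i = -1] with 1 - 1[y_i = +1].
\ell_i(w) =
  \mathbb{1}[y_i = +1] \,
    \ln \frac{1}{1 + e^{-w^\top h(x_i)}}
  + \bigl(1 - \mathbb{1}[y_i = +1]\bigr) \,
    \ln \frac{e^{-w^\top h(x_i)}}{1 + e^{-w^\top h(x_i)}}
```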
48 00:03:35,060 --> 00:03:40,088 And let's see what the log of 1 / (1 49 00:03:40,088 --> 00:03:46,462 + e to the -w transpose h(xi)) looks like. 50 00:03:48,160 --> 00:03:57,430 And the log of 1 over something turns out to be minus the log of something. 51 00:03:57,430 --> 00:04:04,820 And the something here is 1 + e to the -w transpose h(xi). 52 00:04:06,450 --> 00:04:12,424 So plugging that in, we get the indicator 53 00:04:12,424 --> 00:04:17,717 that yi = +1 that multiplies minus 54 00:04:17,717 --> 00:04:23,531 the log of 1 + e to the -w transpose h(xi). 55 00:04:23,531 --> 00:04:28,217 So we're going to write here log of 1 56 00:04:28,217 --> 00:04:32,904 + e to the -w transpose h(xi), 57 00:04:32,904 --> 00:04:37,277 and put the minus sign out here. 58 00:04:39,042 --> 00:04:42,162 That was for the first term, so 59 00:04:42,162 --> 00:04:48,283 now let's look at that famous second term and expand it out. 60 00:04:48,283 --> 00:04:50,465 So let's go back to our blue. 61 00:04:50,465 --> 00:04:52,857 And so the coefficient here stays the same. 62 00:04:52,857 --> 00:05:00,989 It's 1 minus the indicator that yi is positive, 63 00:05:00,989 --> 00:05:07,493 so yi is positive, sorry about that, 64 00:05:07,493 --> 00:05:11,780 times the log of that ratio. 65 00:05:11,780 --> 00:05:14,576 So let's explore what the log of the ratio looks like. 66 00:05:14,576 --> 00:05:19,250 So, the log of e to the -w 67 00:05:19,250 --> 00:05:24,370 transpose h(xi) / (1 + e 68 00:05:24,370 --> 00:05:30,164 to the -w transpose h(xi)). 69 00:05:30,164 --> 00:05:32,904 And so what does that look like? 70 00:05:32,904 --> 00:05:38,019 It's the log of a ratio, so 71 00:05:38,019 --> 00:05:41,998 it's the difference of the logs, so 72 00:05:41,998 --> 00:05:48,392 it's the log of e to the power of -w transpose h(xi) 73 00:05:48,392 --> 00:05:55,659 minus the log of 1 + e to the power of -w transpose h(xi).
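The two log rules used so far, log(1/x) = -log(x) and log(a/b) = log(a) - log(b), are easy to sanity-check numerically. The snippet below is only an illustrative sketch: `score` stands in for w transpose h(xi), its value is made up, and the variable names are not from the lecture.

```python
import math

# Hypothetical value standing in for w^T h(x_i); any real number works.
score = 0.7
denom = 1.0 + math.exp(-score)

# First (red) term: log of 1 over something equals minus the log of something.
first_lhs = math.log(1.0 / denom)
first_rhs = -math.log(denom)

# Second (blue) term: log of a ratio equals the difference of the logs.
second_lhs = math.log(math.exp(-score) / denom)
second_rhs = math.log(math.exp(-score)) - math.log(denom)

print(math.isclose(first_lhs, first_rhs))
print(math.isclose(second_lhs, second_rhs))
```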
74 00:05:55,659 --> 00:06:00,998 Let's note two things. First, the second term is exactly what we had up here, 75 00:06:00,998 --> 00:06:04,134 so things are starting to look very similar. 76 00:06:04,134 --> 00:06:07,315 And second, what is the first log term? 77 00:06:07,315 --> 00:06:09,079 Here's another log trick. 78 00:06:09,079 --> 00:06:14,698 Log of e to the something, say e to the a, is exactly equal to a. 79 00:06:14,698 --> 00:06:18,902 So in this case, the log of e to the -w 80 00:06:18,902 --> 00:06:23,490 transpose h(xi) is just -w transpose h(xi). 81 00:06:25,740 --> 00:06:30,880 So plugging that in, we get a coefficient 82 00:06:30,880 --> 00:06:36,184 that multiplies -w transpose h(xi) 83 00:06:36,184 --> 00:06:41,325 minus the same term as the other side, 84 00:06:41,325 --> 00:06:47,125 log of 1 + e to the -w transpose h(xi). 85 00:06:49,970 --> 00:06:54,890 Okay, going a little slowly, and now you can shake things around, move things, 86 00:06:54,890 --> 00:06:59,760 lots of stuff cancels out. I'm not going to go through that in detail, 87 00:06:59,760 --> 00:07:00,890 but you're welcome to do it. 88 00:07:00,890 --> 00:07:03,220 I'm just going to do a change of color transformation, 89 00:07:03,220 --> 00:07:08,429 go to purple, which is a color I love, and just write out what the answer is. 90 00:07:09,480 --> 00:07:14,156 And the answer here becomes: the log likelihood can be written, for 91 00:07:14,156 --> 00:07:19,590 one data point, as minus the quantity 1 minus 92 00:07:19,590 --> 00:07:24,540 the indicator that it is a positive example, 93 00:07:24,540 --> 00:07:28,636 so yi = +1, 94 00:07:28,636 --> 00:07:34,408 that multiplies w transpose h(xi), 95 00:07:34,408 --> 00:07:40,179 minus the log of that crazy term, 1 + e 96 00:07:40,179 --> 00:07:45,029 to the -w transpose h(xi).
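If you'd rather not shake the terms around by hand, one way to convince yourself is to compare the original two-term form with the simplified form numerically. The sketch below assumes the simplified form is -(1 - 1[yi = +1]) times w transpose h(xi), minus log(1 + e to the -w transpose h(xi)); the function names and the `score` argument (standing in for w transpose h(xi)) are hypothetical, not from the lecture.

```python
import math

def log_likelihood_original(y, score):
    """Two-term form: indicator-weighted logs of P(y=+1) and P(y=-1)."""
    p_pos = 1.0 / (1.0 + math.exp(-score))
    p_neg = math.exp(-score) / (1.0 + math.exp(-score))
    is_pos = 1.0 if y == +1 else 0.0
    return is_pos * math.log(p_pos) + (1.0 - is_pos) * math.log(p_neg)

def log_likelihood_simplified(y, score):
    """Simplified form: -(1 - 1[y=+1]) * score - log(1 + e^(-score))."""
    is_pos = 1.0 if y == +1 else 0.0
    return -(1.0 - is_pos) * score - math.log(1.0 + math.exp(-score))

# Compare both forms on a few made-up scores and both labels.
for score in (-2.0, 0.0, 0.5, 3.0):
    for y in (+1, -1):
        a = log_likelihood_original(y, score)
        b = log_likelihood_simplified(y, score)
        print(y, score, math.isclose(a, b, abs_tol=1e-12))
```

Note that for a positive example the first part vanishes and only the log term is left, while for a negative example the extra -w transpose h(xi) piece stays.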
97 00:07:49,000 --> 00:07:53,594 So, we started from the log likelihood function, we went through 98 00:07:53,594 --> 00:07:58,269 a bunch of derivations and math, which you can explore if you want, and 99 00:07:58,269 --> 00:08:01,073 we ended up with this much simpler form. 100 00:08:01,073 --> 00:08:05,259 And now we're going to take the simpler form and take its derivative. 101 00:08:05,259 --> 00:08:09,569 [MUSIC]