1 00:00:00,626 --> 00:00:04,584 [MUSIC] 2 00:00:04,584 --> 00:00:09,516 Okay, so our first approach is just gonna be to set our gradient = 0 and 3 00:00:09,516 --> 00:00:10,430 solve for W. 4 00:00:11,580 --> 00:00:13,160 But before we get there, 5 00:00:13,160 --> 00:00:18,050 let's just do a little linear algebra review of what identity matrices are. 6 00:00:18,050 --> 00:00:22,910 So the identity matrix is just the matrix analog of the number 1 and 7 00:00:24,120 --> 00:00:29,730 it can be defined in any dimension, so here we show just a scalar, 8 00:00:29,730 --> 00:00:34,600 here we show a 2 by 2 matrix, and what we see is that the identity matrix just 9 00:00:34,600 --> 00:00:38,900 places 1's along the diagonal and 0's on the off-diagonal. 10 00:00:38,900 --> 00:00:43,439 And that's true in any dimension up to having an N by N matrix. 11 00:00:43,439 --> 00:00:48,450 We have N 1's on the diagonal, and every other term in the matrix is 0. 12 00:00:48,450 --> 00:00:53,410 So let's discuss a few fun facts about the identity matrix. 13 00:00:53,410 --> 00:00:57,290 Well, if you take the identity matrix, and you multiply by a vector V 14 00:01:00,380 --> 00:01:04,850 and let's say that this identity matrix is some N by N matrix and 15 00:01:04,850 --> 00:01:09,200 this vector is an N by 1 matrix, you're just gonna get the vector V back. 16 00:01:10,330 --> 00:01:17,548 On the other hand, if you multiply this identity matrix by another matrix A, 17 00:01:17,548 --> 00:01:23,471 you're just gonna get, and so the A matrix is some N by M matrix, 18 00:01:23,471 --> 00:01:27,480 you're just gonna get that A matrix back. 19 00:01:29,320 --> 00:01:31,480 Then we can talk about a matrix inverse. 20 00:01:31,480 --> 00:01:36,333 In this case, we're talking about a square matrix, so 21 00:01:36,333 --> 00:01:39,375 A inverse and A are both N by N matrices. 
22 00:01:39,375 --> 00:01:41,573 And so by definition, so 23 00:01:41,573 --> 00:01:47,542 let me just write that this was a matrix that we are multiplying by, 24 00:01:47,542 --> 00:01:51,960 and here, by definition of the matrix inverse. 25 00:01:59,540 --> 00:02:05,995 [SOUND] If we take A^-1 A, then the result is the identity matrix. 26 00:02:05,995 --> 00:02:10,327 That's just like when we think about dividing scalars, so 27 00:02:10,327 --> 00:02:14,051 this inverse is like the matrix equivalent of division, 28 00:02:14,051 --> 00:02:18,160 so if you think of dividing a scalar a by a, you get the number 1. 29 00:02:18,160 --> 00:02:21,810 And so this is the matrix analog of that. 30 00:02:21,810 --> 00:02:26,650 And then likewise again for some N by N matrices. 31 00:02:28,580 --> 00:02:31,490 If you multiply A by A inverse, you also get the identity. 32 00:02:31,490 --> 00:02:35,910 And you can actually use the last few facts to prove this. 33 00:02:35,910 --> 00:02:40,870 You can simply think about post multiplying both sides by A. 34 00:02:40,870 --> 00:02:46,720 And we have A inverse A, which we know to be the identity matrix, 35 00:02:46,720 --> 00:02:49,940 and then we have A times the identity matrix. 36 00:02:49,940 --> 00:02:56,222 And, actually I should say, both of these results, whether you have V times, sorry, 37 00:02:56,222 --> 00:03:00,770 I should just say it in the matrix case. 38 00:03:00,770 --> 00:03:06,080 A times the identity, you'll likewise get out A. 39 00:03:06,080 --> 00:03:10,920 So here we end up with A equals, you have identity times A, 40 00:03:10,920 --> 00:03:16,100 A = A, which is a proof that this holds here. 41 00:03:16,100 --> 00:03:17,100 Okay. 42 00:03:17,100 --> 00:03:21,630 These are just some fun facts about the identity matrix as well as 43 00:03:21,630 --> 00:03:25,910 inverses that are gonna be useful in this module and 44 00:03:25,910 --> 00:03:28,350 probably in other modules we have later on, as well. 
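The identity and inverse facts reviewed above can be checked numerically on a small example. The following is a minimal sketch in plain Python; the helper names `identity`, `matmul`, and `inverse_2x2`, as well as the particular vector and matrix, are illustrative assumptions and do not come from the lecture.

```python
# Illustrative sketch: plain-Python checks of the identity and inverse facts.
# The helper names (identity, matmul, inverse_2x2) are hypothetical.

def identity(n):
    """Return the n-by-n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply matrix a (n-by-k) by matrix b (k-by-m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(a):
    """Invert a 2-by-2 matrix with the closed-form formula (assumes det != 0)."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

I2 = identity(2)
v = [[3.0], [4.0]]            # a 2-by-1 column vector
A = [[2.0, 1.0], [1.0, 1.0]]  # an invertible 2-by-2 matrix (det = 1)
A_inv = inverse_2x2(A)

assert matmul(I2, v) == v        # I v = v
assert matmul(I2, A) == A        # I A = A
assert matmul(A, I2) == A        # A I = A
assert matmul(A_inv, A) == I2    # A^-1 A = I
assert matmul(A, A_inv) == I2    # A A^-1 = I
```

The matrix here was chosen with determinant 1 so that every product is exact in floating point; for a general matrix the comparisons would need a small tolerance.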
45 00:03:30,160 --> 00:03:32,610 And what we're gonna do now, now that we 46 00:03:32,610 --> 00:03:36,930 understand this identity matrix, is simply rewrite the total cost that we had, or 47 00:03:36,930 --> 00:03:41,990 sorry, the gradient of the total cost, with this identity matrix. 48 00:03:41,990 --> 00:03:43,280 So this is exactly the same. 49 00:03:43,280 --> 00:03:46,580 All we've done is we've replaced W. 50 00:03:46,580 --> 00:03:50,690 This W vector, by the identity times W. 51 00:03:51,950 --> 00:03:54,190 So these are equivalent. 52 00:03:55,620 --> 00:03:59,330 But this is gonna be helpful in our next derivation. 53 00:03:59,330 --> 00:04:03,210 Okay, so now we can take this equivalent form of the gradient of our total cost, 54 00:04:03,210 --> 00:04:04,340 and set it equal to zero. 55 00:04:05,490 --> 00:04:08,870 So the first thing we can do is just divide both sides by 56 00:04:08,870 --> 00:04:10,880 two to get rid of those twos. 57 00:04:10,880 --> 00:04:16,004 And then when we multiply 58 00:04:16,004 --> 00:04:20,396 out we get minus H^T y - 59 00:04:20,396 --> 00:04:25,960 H^T H w + lambda I w = 0. 60 00:04:25,960 --> 00:04:30,470 And when we're setting this equal to zero I'm gonna put the hat on the w, 61 00:04:30,470 --> 00:04:32,190 because that's what we're solving for. 62 00:04:33,510 --> 00:04:37,200 So then I can bring, sorry, there should be a plus sign here. 63 00:04:37,200 --> 00:04:38,790 Didn't do that right. 64 00:04:38,790 --> 00:04:45,217 So then I can bring this to the other side, 65 00:04:45,217 --> 00:04:50,920 I get H^T H w hat + lambda I w hat = H^T y. 66 00:04:50,920 --> 00:04:56,590 And then what I see is I have w hat appearing in both of these terms. 67 00:04:56,590 --> 00:04:58,640 So I can factor it out. 68 00:04:58,640 --> 00:05:07,341 And I get (H^T H + lambda I) times w hat. 69 00:05:08,970 --> 00:05:11,870 So this is the step where having that identity matrix was useful. 
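The spoken steps above can be collected into one written derivation. This is a reconstruction from the narration, not a copy of the slide: it assumes the gradient being set to zero is the usual ridge gradient, with the plus sign on the middle term as corrected in the lecture.

```latex
\begin{align*}
-2H^{T}(y - H\hat{w}) + 2\lambda I\hat{w} &= 0
  && \text{gradient with } w \text{ written as } Iw \\
-H^{T}y + H^{T}H\hat{w} + \lambda I\hat{w} &= 0
  && \text{divide by 2 and multiply out} \\
H^{T}H\hat{w} + \lambda I\hat{w} &= H^{T}y
  && \text{move } H^{T}y \text{ to the other side} \\
(H^{T}H + \lambda I)\,\hat{w} &= H^{T}y
  && \text{factor out } \hat{w}
\end{align*}
```

The last line is where the identity matrix matters: without it, the sum of the matrix H^T H and the scalar lambda would not be defined.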
70 00:05:11,870 --> 00:05:14,410 So I hope it was worth everything on the last slide 71 00:05:14,410 --> 00:05:16,310 to get that one little punch line. 72 00:05:16,310 --> 00:05:22,611 Okay, so this = H^T y, and the end result, 73 00:05:22,611 --> 00:05:26,638 if we use our little inverse 74 00:05:27,864 --> 00:05:31,716 from the previous slide, 75 00:05:31,716 --> 00:05:36,794 if I premultiply both sides by (H^T H 76 00:05:36,794 --> 00:05:42,046 + lambda I) inverse, then I get w hat 77 00:05:42,046 --> 00:05:48,790 is equal to (H^T H + lambda I)^-1 H^T y. 78 00:05:48,790 --> 00:05:49,890 Okay. 79 00:05:49,890 --> 00:05:54,284 And in particular, I'm gonna call this w hat ridge to indicate that 80 00:05:54,284 --> 00:05:58,769 it's the ridge regression solution for a specific value of lambda. 81 00:05:58,769 --> 00:06:03,069 [MUSIC]
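The closed-form solution w hat ridge = (H^T H + lambda I)^-1 H^T y can be sketched in a few lines of code. The version below is a minimal illustration in plain Python so that it runs without dependencies; the function names (`transpose`, `matmul`, `solve`, `ridge_closed_form`) and the tiny data set are assumptions made for this example, not part of the lecture. It solves the linear system (H^T H + lambda I) w = H^T y directly instead of forming the inverse, which gives the same w hat.

```python
# Illustrative sketch of w_hat_ridge = (H^T H + lambda I)^-1 H^T y.
# All function names and the data below are hypothetical.

def transpose(m):
    """Return the transpose of matrix m (a list of rows)."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply matrix a (n-by-k) by matrix b (k-by-m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def solve(m, rhs):
    """Solve m x = rhs by Gauss-Jordan elimination with partial pivoting.

    m is n-by-n, rhs is a list of n numbers. Assumes m is invertible.
    """
    n = len(m)
    # Augmented matrix [m | rhs], converted to floats.
    aug = [list(map(float, m[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def ridge_closed_form(H, y, lam):
    """Return w_hat solving (H^T H + lam * I) w = H^T y."""
    Ht = transpose(H)
    HtH = matmul(Ht, H)
    d = len(HtH)
    # Adding lam * I only changes the diagonal entries of H^T H.
    lhs = [[HtH[i][j] + (lam if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    rhs = [row[0] for row in matmul(Ht, [[yi] for yi in y])]
    return solve(lhs, rhs)

# Tiny example: a constant feature plus one input, with outputs on the line y = x.
H = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]

w_ls = ridge_closed_form(H, y, 0.0)      # lambda = 0: ordinary least squares
w_ridge = ridge_closed_form(H, y, 10.0)  # lambda = 10: coefficients shrink
print("lambda = 0: ", w_ls)
print("lambda = 10:", w_ridge)
```

For any lambda > 0 the matrix H^T H + lambda I is positive definite, so the system always has a unique solution. In practice a numerical library routine such as NumPy's `numpy.linalg.solve` would normally replace the hand-written `solve` used here.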