1 00:00:00,000 --> 00:00:04,444 [MUSIC] 2 00:00:04,444 --> 00:00:08,771 So the first approach that we're gonna talk about is just a closed form solution 3 00:00:08,771 --> 00:00:13,880 where we take our gradient, and simply set it equal to zero, and solve for w. 4 00:00:13,880 --> 00:00:17,210 Just like we did in the simple regression case. 5 00:00:17,210 --> 00:00:21,070 Okay, here's my gradient of our residual sum of squares, and 6 00:00:21,070 --> 00:00:22,750 I've set it equal to zero. 7 00:00:22,750 --> 00:00:24,930 And now let's solve for w. 8 00:00:24,930 --> 00:00:33,250 So, let's do out this multiplication here, where we're gonna get -2 H transpose y. 9 00:00:33,250 --> 00:00:39,000 And then we're gonna get a +2 H transpose 10 00:00:39,000 --> 00:00:42,880 H times w, and we're setting this equal to zero. 11 00:00:44,000 --> 00:00:49,690 And so these matrix multiplies act very similarly to if these had been scalars, 12 00:00:49,690 --> 00:00:51,580 but we have to keep the order. 13 00:00:51,580 --> 00:00:53,020 The order is very important. 14 00:00:53,020 --> 00:01:00,060 We couldn't have switched it around to be y H transpose, okay. 15 00:01:00,060 --> 00:01:03,700 So, now let's solve this equation. 16 00:01:03,700 --> 00:01:08,050 First of all, the two cancels out; we can just divide both sides by two. 17 00:01:08,050 --> 00:01:11,660 And so what I'm gonna end up with is H transpose 18 00:01:12,780 --> 00:01:17,950 H w equals, I bring this to the other side, H transpose y. 19 00:01:17,950 --> 00:01:26,200 Then I'm gonna pre-multiply both sides by H transpose H inverse. 20 00:01:26,200 --> 00:01:29,295 So H transpose H w, and now I'm gonna multiply, 21 00:01:29,295 --> 00:01:32,830 pre-multiply the other side by the same. 22 00:01:32,830 --> 00:01:41,700 H transpose H inverse, H transpose y. And so this is a little aside. 23 00:01:41,700 --> 00:01:46,660 What is a matrix A inverse times the same matrix A?
24 00:01:46,660 --> 00:01:52,040 Well, that's the definition of, 25 00:01:52,040 --> 00:01:56,000 so, by definition of a matrix inverse, that is the identity matrix. 26 00:01:57,180 --> 00:02:02,760 And another aside, so that's aside number one. Another aside: if I take 27 00:02:02,760 --> 00:02:07,850 the identity matrix and multiply by any vector, 28 00:02:07,850 --> 00:02:13,160 so I'll call the vector v, I just get v back. 29 00:02:13,160 --> 00:02:17,620 Or if I take the identity matrix and multiply it by any matrix, big V, 30 00:02:17,620 --> 00:02:20,020 I'm gonna get that matrix back. 31 00:02:21,360 --> 00:02:26,490 Okay, so what we see if we apply these two identities here is that together, 32 00:02:27,840 --> 00:02:33,110 these two terms, H transpose H is like our big A matrix here. 33 00:02:33,110 --> 00:02:37,750 So we have a matrix times its inverse, this is just gonna be the identity. 34 00:02:37,750 --> 00:02:42,240 Then we have the identity matrix times a vector w, 35 00:02:42,240 --> 00:02:44,670 that's just gonna be the w vector. 36 00:02:44,670 --> 00:02:49,800 So we have w, and let me put hats on, because remember 37 00:02:49,800 --> 00:02:56,530 that once we set this equal to zero and solve for w, this is our 38 00:02:56,530 --> 00:03:01,570 estimated set of coefficients. So I'm going to put a hat on w, and 39 00:03:01,570 --> 00:03:08,960 what we see is w hat is simply equal to H transpose H inverse, H transpose y. 40 00:03:08,960 --> 00:03:12,640 So this is one of those aha moments. 41 00:03:13,770 --> 00:03:15,980 Maybe you missed it. 42 00:03:15,980 --> 00:03:21,140 'Cause the aha moment is, we have a whole collection of different parameters, 43 00:03:21,140 --> 00:03:22,500 w0 all the way up to wD. 44 00:03:22,500 --> 00:03:26,300 These are the things multiplying all the features we're using in our multiple 45 00:03:26,300 --> 00:03:27,980 regression model. 46 00:03:27,980 --> 00:03:33,590 And in one line I've written the solution to the fit.
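For reference, the spoken derivation can be written out as a worked equation. This is a sketch reconstructed from the narration: the factored form of the gradient on the first line is inferred from the expansion the speaker describes, and it assumes that H transpose H is invertible.

```latex
\begin{align*}
\nabla \mathrm{RSS}(\mathbf{w}) = -2\,H^{\top}(\mathbf{y} - H\mathbf{w}) &= \mathbf{0} \\
-2\,H^{\top}\mathbf{y} + 2\,H^{\top}H\,\mathbf{w} &= \mathbf{0} \\
H^{\top}H\,\mathbf{w} &= H^{\top}\mathbf{y} \\
(H^{\top}H)^{-1}H^{\top}H\,\mathbf{w} &= (H^{\top}H)^{-1}H^{\top}\mathbf{y} \\
I\,\mathbf{w} &= (H^{\top}H)^{-1}H^{\top}\mathbf{y} \\
\hat{\mathbf{w}} &= (H^{\top}H)^{-1}H^{\top}\mathbf{y}
\end{align*}
```

The fourth and fifth lines use the two asides: a matrix times its inverse is the identity, and the identity times a vector returns that vector.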
47 00:03:33,590 --> 00:03:39,028 I've fit all of w0, w1, all the way up to wD just by doing this matrix multiply. 48 00:03:39,028 --> 00:03:40,770 Okay? 49 00:03:40,770 --> 00:03:43,890 So this 50 00:03:43,890 --> 00:03:48,005 motivates why we went through all this work to write things in this matrix 51 00:03:48,005 --> 00:03:52,054 notation, because it allows me to have this nice closed form solution for 52 00:03:52,054 --> 00:03:54,560 all my parameters written very compactly. 53 00:03:54,560 --> 00:03:58,539 [MUSIC]
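As an illustration of the closed form solution described in the lecture, here is a minimal NumPy sketch. It is not code from the course: the function name `closed_form_fit`, the toy data, and the choice to include a constant feature for w0 are all assumptions made for the example. It also assumes H transpose H is invertible.

```python
import numpy as np


def closed_form_fit(H, y):
    """Return w_hat = (H^T H)^{-1} H^T y, assuming H^T H is invertible.

    H is the N x (D+1) feature matrix and y is the length-N output vector.
    (Hypothetical helper, named here only for illustration.)
    """
    HtH = H.T @ H
    Hty = H.T @ y
    # Solving the linear system HtH w = Hty gives the same w as
    # pre-multiplying both sides by the inverse of HtH.
    return np.linalg.solve(HtH, Hty)


# Hypothetical toy data: N observations, D features plus a constant feature.
rng = np.random.default_rng(0)
N, D = 50, 3
features = rng.normal(size=(N, D))
H = np.column_stack([np.ones(N), features])  # first column multiplies w0 (assumed)
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = H @ w_true  # noise-free outputs, so the fit should recover w_true

w_hat = closed_form_fit(H, y)

# The literal formula from the lecture, with an explicit matrix inverse.
w_inv = np.linalg.inv(H.T @ H) @ H.T @ y

# Gradient of RSS at w_hat; it should be (numerically) zero.
grad = -2 * H.T @ (y - H @ w_hat)
```

All D+1 coefficients come out of the one matrix expression, which is the point the lecture makes. The sketch solves the linear system instead of forming the inverse explicitly, which is a common numerical preference; the explicit-inverse line is kept only to mirror the formula as spoken.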