[MUSIC] So this is gonna be our Approach 1. And this is drawn here on this 3D mesh plot, where that green surface shows the gradient at the minimum, and what we see is that's where the gradient equals 0. And that red dot is the optimal point that we're gonna be looking at.

Okay, so let's go ahead: take this gradient, set it equal to zero, and solve for w0 and w1. Those are gonna be our estimates of the two parameters of our model that define our fitted line. Remember, that's our goal.

Okay, so I'm gonna take the top line and do a little bit of algebra. I'm gonna do it quickly, and I'm gonna assume that you, if you'd like to, can go through and verify that what I did is correct. The top line, when you set it equal to 0, results in

  ŵ0 = (Σ yi)/N − ŵ1 (Σ xi)/N,

where these sums go from i = 1 to N, just as they did here. And the reason I'm putting the hats on now is that these are our solutions; these are our estimated values of these parameters.

And what we see is that our estimate of the intercept of our regression line has a nice form. What is this first term? This is our average house sales price. But we're not simply gonna set ŵ0 equal to the average house sales price; we're gonna subtract off our estimate of the slope of the line, ŵ1, times this term here that multiplies ŵ1. And what is that term? Well, this is the average square feet of the houses in our training data set.

Okay, so there's a nice intuitive structure to our estimate ŵ0. But again, this is in terms of ŵ1, so we have to provide another equation to actually get at a solution. So let's look at the bottom term of this gradient vector. I shouldn't call it a line; I guess I'll call the first one the top term of the gradient, and this is the bottom term of the gradient. If we set it equal to 0, we get

  Σ yi xi − ŵ0 Σ xi − ŵ1 Σ xi² = 0.
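(An aside, not part of the lecture.) To make these two conditions concrete, here is a minimal NumPy sketch of the gradient being set to zero; the function and variable names are my own, not the course's implementation:

```python
import numpy as np

def rss_gradient(w0, w1, x, y):
    # RSS(w0, w1) = sum_i (y_i - (w0 + w1 * x_i))^2
    residuals = y - (w0 + w1 * x)
    # Top term:    dRSS/dw0 = -2 * sum_i (y_i - w0 - w1 * x_i)
    # Bottom term: dRSS/dw1 = -2 * sum_i (y_i - w0 - w1 * x_i) * x_i
    return np.array([-2.0 * residuals.sum(),
                     -2.0 * (residuals * x).sum()])
```

Setting the top component to zero gives the ŵ0 equation above, and setting the bottom component to zero gives the equation we just wrote down.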
And now what I'm gonna do is take my equation for ŵ0 and plug it in. Once I plug ŵ0 in, in terms of ŵ1, and solve for ŵ1, what I end up getting out is

  ŵ1 = (Σ yi xi − (Σ yi)(Σ xi)/N) / (Σ xi² − (Σ xi)(Σ xi)/N).

Okay. Anyway, the point is that it has a closed form that's pretty straightforward to go and compute. And what we see, and wanna note, is that to compute ŵ1, and then plug that in and compute ŵ0, we need to compute just a couple of terms. We need the sum over all of our observed outputs, Σ yi; the sum over all of our inputs, Σ xi; and then two more terms built from products of the inputs and outputs, Σ yi xi and Σ xi². So we need to compute just four different terms, plug them into these equations, and we get out what our ŵ0 and ŵ1 are: the optimal values that minimize our residual sum of squares.

The take-home message here is that one way we can solve this optimization problem of minimizing the residual sum of squares is to take the gradient, set it equal to zero, and this is the result. [MUSIC]
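(Again, not part of the lecture.) Here is a minimal sketch that computes exactly those four sums and plugs them into the two closed-form equations; the function name and the toy house data are made up for illustration:

```python
import numpy as np

def simple_linear_regression(x, y):
    n = len(x)
    # The four sums the closed form needs, each a single pass over the data.
    sum_y = y.sum()
    sum_x = x.sum()
    sum_xy = (x * y).sum()
    sum_x2 = (x * x).sum()
    # Slope: the bottom gradient equation after substituting w0_hat in.
    w1_hat = (sum_xy - sum_y * sum_x / n) / (sum_x2 - sum_x * sum_x / n)
    # Intercept: average output minus slope times average input.
    w0_hat = sum_y / n - w1_hat * sum_x / n
    return w0_hat, w1_hat

# Toy usage: square feet vs. sales price (fabricated numbers).
x = np.array([1000.0, 1500.0, 2000.0, 2500.0])
y = np.array([300000.0, 400000.0, 520000.0, 590000.0])
w0_hat, w1_hat = simple_linear_regression(x, y)
print(w0_hat, w1_hat)
```

Each sum is a single pass over the training data, which is what makes this closed-form solution so cheap for simple regression.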