[MUSIC] So this is gonna be our Approach 1. And this is drawn here on this 3D mesh plot, where that green surface shows the gradient at the minimum, and what we see is that's where the gradient equals 0. And that red dot is the optimal point that we're gonna be looking at.

Okay, so let's go ahead: take this gradient, set it equal to zero, and solve for w0 and w1. Those are gonna be our estimates of the two parameters of our model that define our fitted line. Remember, that's our goal.

Okay, so I'm gonna take the top line and do a little bit of algebra. I'm gonna do it quickly, and I'm gonna assume that you, if you'd like to, can go through and verify that what I did is correct. The top line, when you set it equal to 0, results in

  ŵ0 = (Σ yi)/N − ŵ1 (Σ xi)/N,

where these sums go from i = 1 to N, just as they did here. And the reason I'm putting the hats on now is that these are our solutions; these are our estimated values of these parameters.

And what we see is that our estimate of the intercept of our regression line has a nice form. What is this first term? This is our average house sales price. But we're not simply gonna set ŵ0 equal to the average house sales price; we're gonna subtract off our estimate of the slope of the line, ŵ1, times this term here that multiplies ŵ1. And what is that term? Well, this is the average square feet of the houses in our training data set.

Okay, so there's a nice intuitive structure to our estimate ŵ0. But again, this is in terms of ŵ1, so we have to provide another equation to actually get at a solution. So let's look at the bottom term of this gradient vector. I shouldn't call it a line; I guess I'll call the first one the top term of the gradient, and this is the bottom term of the gradient. If we set it equal to 0, we get

  Σ yi xi − ŵ0 Σ xi − ŵ1 Σ xi² = 0.
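(An aside, not part of the lecture.) To make these two conditions concrete, here is a minimal NumPy sketch of the gradient being set to zero; the function and variable names are my own, not the course's implementation:

```python
import numpy as np

def rss_gradient(w0, w1, x, y):
    # RSS(w0, w1) = sum_i (y_i - (w0 + w1 * x_i))^2
    residuals = y - (w0 + w1 * x)
    # Top term:    dRSS/dw0 = -2 * sum_i (y_i - w0 - w1 * x_i)
    # Bottom term: dRSS/dw1 = -2 * sum_i (y_i - w0 - w1 * x_i) * x_i
    return np.array([-2.0 * residuals.sum(),
                     -2.0 * (residuals * x).sum()])
```

Setting the top component to zero gives the ŵ0 equation above, and setting the bottom component to zero gives the equation we just wrote down.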
And now what I'm gonna do is take my equation for ŵ0 and plug it in. Once I plug ŵ0 in, in terms of ŵ1, and solve for ŵ1, what I end up getting out is

  ŵ1 = (Σ yi xi − (Σ yi)(Σ xi)/N) / (Σ xi² − (Σ xi)(Σ xi)/N).

Okay. Anyway, the point is that it has a closed form that's pretty straightforward to go and compute. And what we see, and wanna note, is that to compute ŵ1, and then plug that in and compute ŵ0, we need to compute just a couple of terms. We need the sum over all of our observed outputs, Σ yi; the sum over all of our inputs, Σ xi; and then two more terms built from products of the inputs and outputs, Σ yi xi and Σ xi². So we need to compute just four different terms, plug them into these equations, and we get out what our ŵ0 and ŵ1 are: the optimal values that minimize our residual sum of squares.

The take-home message here is that one way we can solve this optimization problem of minimizing the residual sum of squares is to take the gradient, set it equal to zero, and this is the result. [MUSIC]
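(Again, not part of the lecture.) Here is a minimal sketch that computes exactly those four sums and plugs them into the two closed-form equations; the function name and the toy house data are made up for illustration:

```python
import numpy as np

def simple_linear_regression(x, y):
    n = len(x)
    # The four sums the closed form needs, each a single pass over the data.
    sum_y = y.sum()
    sum_x = x.sum()
    sum_xy = (x * y).sum()
    sum_x2 = (x * x).sum()
    # Slope: the bottom gradient equation after substituting w0_hat in.
    w1_hat = (sum_xy - sum_y * sum_x / n) / (sum_x2 - sum_x * sum_x / n)
    # Intercept: average output minus slope times average input.
    w0_hat = sum_y / n - w1_hat * sum_x / n
    return w0_hat, w1_hat

# Toy usage: square feet vs. sales price (fabricated numbers).
x = np.array([1000.0, 1500.0, 2000.0, 2500.0])
y = np.array([300000.0, 400000.0, 520000.0, 590000.0])
w0_hat, w1_hat = simple_linear_regression(x, y)
print(w0_hat, w1_hat)
```

Each sum is a single pass over the training data, which is what makes this closed-form solution so cheap for simple regression.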