So here we are, back at our polynomial regression demo. Remember, previously we were just doing least squares estimation. Let's quickly scroll through this. We had data generated from a sine function, and when we fit a degree-2 polynomial, things looked pretty reasonable. Degree-4 started looking a bit wigglier, with larger estimated coefficients, and degree-16 looked really wiggly and had these massive, massive coefficients.

Now let's get to our ridge regression, where we're just going to take our polynomial regression function and modify it. Using GraphLab Create, the ridge regression modification is really simple because, as we mentioned before, there's an l2_penalty input to .linear_regression. Before, when we were doing just least squares, we set that L2 penalty equal to zero. This is the lambda value we've been talking about that trades off between fit and model complexity. Here, though, we're actually going to specify a value for this penalty, and that's the only modification we have to make to implement ridge regression using GraphLab Create. But again, in the assignments for this course you're going to explore implementing these methods yourself.

Okay, so let's define this polynomial ridge regression function (see the code sketch below). Then we're going to go through and explore fitting that really high-order polynomial, the degree-16 polynomial that had a very wiggly fit and crazy coefficients, but now solving the ridge regression objective for different values of lambda.

To start with, let's consider a really, really small lambda value, so a very small penalty on the two-norm of the coefficients. What we'd expect is that the estimated fit would look very similar to the standard least squares case. And if we look at the plot, this figure looks very, very similar (if I scroll up quickly) to the fit we had doing just standard least squares. So that checks out with what we know should happen, and, likewise, the coefficients are still these really, really massive numbers. Okay, but what if we increase the strength of our penalty? Let's consider a very large L2 penalty.
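As a concrete reference, here is a minimal sketch of the kind of polynomial ridge regression helper being described. The one detail confirmed above is the l2_penalty argument to GraphLab Create's linear regression; the column names 'X' and 'Y' and the polynomial_features helper are assumptions standing in for however the demo notebook actually builds its polynomial features.

```python
import graphlab

def polynomial_features(data, deg):
    # Build an SFrame with columns X1, ..., Xdeg (powers of the input) plus the
    # target column Y. The 'X'/'Y' column names are assumptions about the demo data.
    poly = graphlab.SFrame()
    poly['X1'] = data['X']
    for power in range(2, deg + 1):
        poly['X' + str(power)] = data['X'].apply(lambda x: x ** power)
    poly['Y'] = data['Y']
    return poly

def polynomial_ridge_regression(data, deg, l2_penalty):
    # Fit a degree-`deg` polynomial with an L2 penalty on the coefficients.
    # The only change from the plain least squares version is l2_penalty.
    model = graphlab.linear_regression.create(polynomial_features(data, deg),
                                              target='Y',
                                              l2_penalty=l2_penalty,
                                              validation_set=None,
                                              verbose=False)
    return model

# A tiny penalty behaves almost like least squares; a large one shrinks coefficients hard.
model_small_penalty = polynomial_ridge_regression(data, deg=16, l2_penalty=1e-25)
model_large_penalty = polynomial_ridge_regression(data, deg=16, l2_penalty=100)
```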
Here we're considering a value of 100, whereas in the case above we were considering a value of 1e-25, so really, really tiny. Well, in this case we end up with much smaller coefficients; actually, they look really, really small. So let's look at what the fit looks like. We see a really, really smooth curve, and very flat, probably way too simple a description of what's really going on in the data. It doesn't capture the trend of the data, where the values increase and then decrease; instead we get a roughly constant fit followed by a decrease. So this seems to be under-fit. As we expect, when lambda is really small we get something similar to our least squares solution, and when lambda becomes really large all the coefficients start approaching zero.

Okay, so now what we're going to do is look at the fit as a function of a series of different lambda values, going from 1e-25 all the way up to 100, but looking at some intermediate values as well, to see what the fit and coefficients look like as we increase lambda (a sketch of this sweep appears at the end of this passage). We're starting with these crazy, crazy large coefficient values. By the time we're at 1e-10 for lambda, the coefficient values have decreased by a couple of orders of magnitude, so they're on the order of 10^4 now. Then we keep increasing lambda: at 1e-6 we get coefficients on the order of hundreds, so in terms of reasonability I'd say they start looking a little more realistic. And as we keep going, you see the values of the coefficients keep decreasing, and when we get to a lambda of 100 we get these really small coefficients.

But now let's look at what the fits are for these different lambda values. Here's the plot we've been showing before for the really small lambda. Increasing lambda a bit, the fit is smoother, but still pretty wiggly and crazy, especially at these boundary points. Increase lambda more and things start looking better. When we get to 1e-3, this looks pretty good. Especially out here, it's hard to tell whether the function should be going up or down.
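For reference, the sweep just described might look roughly like this, reusing the hypothetical polynomial_ridge_regression and polynomial_features helpers sketched earlier; the coefficient printing and plotting details are assumptions for illustration, not a transcription of the notebook.

```python
import matplotlib.pyplot as plt

# Candidate penalties from essentially least squares (1e-25) up to heavy shrinkage (100).
l2_penalties = [1e-25, 1e-10, 1e-6, 1e-3, 1e2]

for l2 in l2_penalties:
    model = polynomial_ridge_regression(data, deg=16, l2_penalty=l2)
    print('l2_penalty = %g' % l2)
    model.coefficients.print_rows(num_rows=17)  # intercept plus 16 powers
    # Overlay each fit on the raw data to watch the wiggliness shrink as lambda grows
    # (assumes the demo data is sorted by X so the line plot is sensible).
    plt.plot(list(data['X']),
             list(model.predict(polynomial_features(data, 16))),
             label='lambda = %g' % l2)

plt.scatter(list(data['X']), list(data['Y']), color='black')
plt.legend()
plt.show()
```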
I want to emphasize that at the boundaries, where you have few observations, it's very hard to control the fit, so we trust the fit much more in the intermediate regions of our x range where we actually have observations. Okay, but then we get to this really large lambda, and we see that we're clearly over-smoothing the data.

So a natural question is: out of all these possible lambda values we might consider, and all the associated fits, which is the one we should use for forming our predictions? It would be really nice if there were some automatic procedure for selecting this lambda value, instead of me having to go through, specify a large set of lambdas, look at the coefficients, look at the fits, and somehow make a judgment call about which one to use. Well, the good news is that there is a way to automatically choose lambda, and this is something we're going to discuss later in this module. One method we're going to talk about is called leave-one-out cross validation. Minimizing this leave-one-out cross-validation error, which we'll define later, approximates minimizing the average mean squared error of our predictions.

So what we're going to do here is define this leave-one-out cross-validation function and then apply it to our data. You're not going to understand everything that's going on in this function yet, but you will by the end of this module, and you'll be able to implement this method yourself. What it's doing is looking at the prediction error for different lambda values and then choosing the one that minimizes that error. Of course, we're not measuring that error on the training set or the test set; we're using a validation set, but in a very specific way. (A rough sketch of this function appears below.)

Okay, so now that we've applied this leave-one-out function to our data and a specified set of penalty values, we can plot the leave-one-out cross-validation error as a function of the lambda values we considered. In this case, we actually see a curve that's pretty flat in a bunch of regions, which means our fits are not very sensitive to the choice of lambda in those regions.
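Here is a rough sketch of what a leave-one-out cross-validation helper along these lines could look like, again building on the hypothetical helpers above; the SFrame slicing and append calls are assumptions about how the single-point hold-out split is done, not the course's actual implementation.

```python
def leave_one_out(data, deg, l2_penalties):
    # For each candidate penalty, repeatedly hold out one observation, fit on the
    # remaining n - 1 points, and measure the squared error on the held-out point.
    n = len(data)
    avg_errors = []
    for l2 in l2_penalties:
        total_squared_error = 0.0
        for i in range(n):
            train = data[0:i].append(data[i+1:n])   # everything except row i
            held_out = data[i:i+1]                  # just row i
            model = polynomial_ridge_regression(train, deg, l2)
            prediction = model.predict(polynomial_features(held_out, deg))[0]
            total_squared_error += (prediction - held_out['Y'][0]) ** 2
        avg_errors.append(total_squared_error / n)  # average LOO error for this lambda
    return avg_errors
```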
But there is some minimum, and we can figure out what that minimum is. So here we're just selecting the lambda with the lowest cross-validation error, and then fitting our polynomial ridge regression model using that specific lambda value. We print our coefficients, and what you see is that we have very reasonable numbers, things on the order of 1, 0.2, 0.5. Let's look at the associated fit: things look really nice in this case. There's a really nice trend throughout most of the range of x. The only place things look a little bit crazy is out here at the boundary. But again, in this boundary region we don't actually have any data to pin down the function, so even though we're shrinking the coefficients of this degree-16 polynomial, we don't have much information about what the function should do out there. (The final selection step is sketched below.)

What we've seen is that this leave-one-out cross-validation technique really nicely selects a lambda value that provides a good fit and automatically balances bias and variance for us.
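The final selection step might look something like this, where the grid of penalties is a hypothetical choice for illustration and the helpers are the sketches from above.

```python
import numpy as np

# Hypothetical grid of penalties spanning the range discussed in the demo.
l2_penalties = np.logspace(-25, 2, num=28)

# Average leave-one-out error for each candidate penalty.
loo_errors = leave_one_out(data, deg=16, l2_penalties=l2_penalties)

# Keep the penalty with the smallest cross-validation error and refit on all the data.
best_l2 = l2_penalties[np.argmin(loo_errors)]
best_model = polynomial_ridge_regression(data, deg=16, l2_penalty=best_l2)
print(best_model.coefficients)
```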