>> Okay, let's talk about this in the context of the bias-variance trade-off. What we saw is that when we had a very large lambda, we had a solution with very high bias but low variance. One way to see this is to think about cranking lambda all the way up to infinity: in that limit, the coefficients are shrunk to zero, and that is clearly a model with high bias but low variance. It's completely low variance; it doesn't change no matter what data you give me.

On the other hand, when we had a very small lambda, we have a model that is low bias but high variance. To see this, think about setting lambda to zero, in which case we get back just our old solution, our old least squares fit that minimizes the residual sum of squares. And there we see that for higher-complexity models you're clearly going to have low bias but high variance.

So what we see is that this lambda tuning parameter controls our model complexity, and so controls this bias-variance trade-off.

Okay, so let's return to our polynomial regression demo, but now using ridge regression, and see if we can ameliorate the issues of overfitting as we vary the choice of lambda. And so we're going to explore this ridge regression solution for a couple of different choices of this lambda tuning parameter.
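The lecture refers to a polynomial regression demo with ridge regression at several values of lambda. As a rough illustration of the same idea, here is a minimal sketch, not the course's own demo: it assumes a noisy synthetic dataset, a degree-15 polynomial, a few illustrative lambda values, and scikit-learn (where the penalty parameter is called alpha). Larger lambda shrinks the learned coefficients toward zero (high bias, low variance), while a lambda near zero approximates the unregularized least squares fit (low bias, high variance).

```python
# Minimal sketch (illustrative assumptions, not the course's demo):
# polynomial ridge regression for a few choices of the lambda penalty.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(4 * x) + rng.normal(scale=0.3, size=x.shape)  # noisy sinusoid

# A lambda near zero approximates the plain least squares fit;
# a very large lambda shrinks the coefficients toward zero.
for lam in [1e-9, 1e-3, 1e2]:
    # scikit-learn's Ridge calls the penalty "alpha"; it plays the role of lambda.
    model = make_pipeline(
        PolynomialFeatures(degree=15, include_bias=False),
        Ridge(alpha=lam),
    )
    model.fit(x.reshape(-1, 1), y)
    coefs = model.named_steps["ridge"].coef_
    print(f"lambda={lam:g}  max |coefficient| = {np.abs(coefs).max():.3f}")
```

Running this prints the largest coefficient magnitude for each lambda, which should fall as lambda grows, mirroring the bias-variance discussion above.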