Now let's turn to this important question of how to choose the lambda tuning parameter.

What we mentioned last module was that if we have some tuning parameter that controls model complexity, then for every value of that tuning parameter we can fit our model on our training data. Then we can assess the performance of that fitted model on a validation set, tabulate this for all values of lambda that we might consider, choose the specific model complexity according to the error on this validation set, and then assess the performance of the selected model on our test set.

Well, what we've now seen is that ridge regression is a special case of an objective where there's a tuning parameter, lambda, that's controlling model complexity, and we'd like to see how we can select this tuning parameter. So, of course, we can use exactly the procedure that we described last module, and that's assuming we have sufficient data to do this training/validation/test split. But now let's ask the question: what if we don't have enough data to reasonably divide it into these three different sets? What can you do? We're going to talk about this in the context of ridge regression, but again, this holds for any tuning parameter lambda that you might have for selecting between different model complexities, or any other tuning parameter controlling your model.

Okay, so we're assuming that we're starting with a smallish dataset. And as always, we need to break off some test dataset that we're going to hide. We always need to have some test set that's never touched during training or validation of our model. So now we took this smallish dataset, and we have an even smaller dataset left to do our training and model validation.

So how are we going to do this? Well, we want to do this in some way that's a bit smarter than the naive approach of just forming our validation set. So let's just remember this naive approach, where we took whatever data was remaining after we split off our test set and defined some validation set, as sketched below.
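To make that naive procedure concrete, here is a minimal sketch of selecting lambda with a single held-out validation set. It assumes scikit-learn's Ridge and a simple train/validation/test split; the split proportions, variable names, and candidate lambda grid are illustrative choices, not taken from the lecture.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative data: split off a test set first, then carve a validation
# set out of what remains (assumed 60/20/20 proportions).
X, y = np.random.randn(200, 5), np.random.randn(200)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

lambdas = np.logspace(-4, 2, 13)   # candidate values of the tuning parameter
val_errors = {}
for lam in lambdas:
    model = Ridge(alpha=lam).fit(X_train, y_train)                      # fit on training data only
    val_errors[lam] = mean_squared_error(y_val, model.predict(X_val))   # assess on the validation set

best_lam = min(val_errors, key=val_errors.get)                          # lambda with lowest validation error
final_model = Ridge(alpha=best_lam).fit(X_train, y_train)
test_error = mean_squared_error(y_test, final_model.predict(X_test))    # report on the untouched test set
```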
But the question is: in this case, when we have just a small amount of data, so that this validation set will necessarily be just a small number of observations, is this sufficient for comparing between different model complexities and assessing which one is best? Well, clearly the answer is no. We're saying that it's just a small set; it's not going to be representative of the space of things that we might see out there.

Okay, so what can we do better? We're stuck with just this dataset. Well, did we have to use the last set of tabulated observations as the observations defining this validation set? No, I could have used the first few observations, or the next set of observations, or any random subset of observations in this dataset. And the question is: which subset of observations should I use as my validation set?

And the answer, and this is the key insight, is to use all of the data subsets. Because if you're doing that, then you can average your performance across these validation sets, and avoid any sensitivity you might have to one specific choice of validation set, which might give some strange numbers simply because it has only a few observations and so doesn't give a good assessment for comparing between different model complexities.
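That insight is what cross-validation implements: split the remaining data into K subsets, let each one serve as the validation set in turn, and average the validation error for each candidate lambda. Here is a minimal sketch, again assuming scikit-learn's Ridge and KFold; the fold count and lambda grid are illustrative assumptions, not prescribed by the lecture.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

def cross_validated_error(X, y, lam, n_folds=5):
    """Average validation error of ridge regression with penalty lam,
    letting each of the K folds serve once as the validation set."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    errors = []
    for train_idx, val_idx in kf.split(X):
        model = Ridge(alpha=lam).fit(X[train_idx], y[train_idx])      # fit on the other K-1 folds
        errors.append(mean_squared_error(y[val_idx],
                                         model.predict(X[val_idx])))  # validate on the held-out fold
    return np.mean(errors)                                            # average across all folds

# Choose lambda by minimizing the cross-validated error (illustrative data).
X, y = np.random.randn(100, 5), np.random.randn(100)
lambdas = np.logspace(-4, 2, 13)
best_lam = min(lambdas, key=lambda lam: cross_validated_error(X, y, lam))
```

Averaging over the K folds is exactly what removes the sensitivity to any single small validation set described above.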