Now let's turn to this important question of how to choose the lambda tuning parameter.

What we mentioned last module was that if we have some tuning parameter that controls model complexity, then for every value of that tuning parameter we can fit our model on our training data. Then we can assess the performance of that fitted model on a validation set, tabulate this for all values of lambda that we might consider, choose the specific model complexity according to the error on this validation set, and then assess the performance of the selected model on our test set.

Well, what we've now seen is that ridge regression is a special case of an objective where there's a tuning parameter, lambda, that's controlling model complexity, and we'd like to see how we can select this tuning parameter. So, of course, we can use exactly the procedure that we described last module, and that's assuming we have sufficient data to do this training/validation/test split. But now let's ask the question: what if we don't have enough data to reasonably divide it into these three different sets? What can you do? We're going to talk about this in the context of ridge regression, but again, this holds for any tuning parameter lambda that you might have for selecting between different model complexities, or any other tuning parameter controlling your model.

Okay, so we're assuming that we're starting with a smallish dataset. And as always, we need to break off some test dataset that we're going to hide. We always need to have some test set that's never touched during training or validation of our model. So now we took this smallish dataset, and we have an even smaller dataset left to do our training and model validation.

So how are we going to do this? Well, we want to do this in some way that's a bit smarter than the naive approach of just forming our validation set. So let's just remember this naive approach, where we took whatever data was remaining after we split off our test set and defined some validation set, as sketched below.
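To make that naive procedure concrete, here is a minimal sketch of selecting lambda with a single held-out validation set. It assumes scikit-learn's Ridge and a simple train/validation/test split; the split proportions, variable names, and candidate lambda grid are illustrative choices, not taken from the lecture.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative data: split off a test set first, then carve a validation
# set out of what remains (assumed 60/20/20 proportions).
X, y = np.random.randn(200, 5), np.random.randn(200)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

lambdas = np.logspace(-4, 2, 13)   # candidate values of the tuning parameter
val_errors = {}
for lam in lambdas:
    model = Ridge(alpha=lam).fit(X_train, y_train)                      # fit on training data only
    val_errors[lam] = mean_squared_error(y_val, model.predict(X_val))   # assess on the validation set

best_lam = min(val_errors, key=val_errors.get)                          # lambda with lowest validation error
final_model = Ridge(alpha=best_lam).fit(X_train, y_train)
test_error = mean_squared_error(y_test, final_model.predict(X_test))    # report on the untouched test set
```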
But the question is: in this case, when we have just a small amount of data, so that this validation set will necessarily be just a small number of observations, is this sufficient for comparing between different model complexities and assessing which one is best? Well, clearly the answer is no. We're saying that it's just a small set; it's not going to be representative of the space of things that we might see out there.

Okay, so what can we do better? We're stuck with just this dataset. Well, did we have to use the last set of tabulated observations as the observations defining this validation set? No, I could have used the first few observations, or the next set of observations, or any random subset of observations in this dataset. And the question is: which subset of observations should I use as my validation set?

And the answer, and this is the key insight, is to use all of the data subsets. Because if you're doing that, then you can average your performance across these validation sets, and avoid any sensitivity you might have to one specific choice of validation set, which might give some strange numbers simply because it has only a few observations and so doesn't give a good assessment for comparing between different model complexities.
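That insight is what cross-validation implements: split the remaining data into K subsets, let each one serve as the validation set in turn, and average the validation error for each candidate lambda. Here is a minimal sketch, again assuming scikit-learn's Ridge and KFold; the fold count and lambda grid are illustrative assumptions, not prescribed by the lecture.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

def cross_validated_error(X, y, lam, n_folds=5):
    """Average validation error of ridge regression with penalty lam,
    letting each of the K folds serve once as the validation set."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    errors = []
    for train_idx, val_idx in kf.split(X):
        model = Ridge(alpha=lam).fit(X[train_idx], y[train_idx])      # fit on the other K-1 folds
        errors.append(mean_squared_error(y[val_idx],
                                         model.predict(X[val_idx])))  # validate on the held-out fold
    return np.mean(errors)                                            # average across all folds

# Choose lambda by minimizing the cross-validated error (illustrative data).
X, y = np.random.randn(100, 5), np.random.randn(100)
lambdas = np.logspace(-4, 2, 13)
best_lam = min(lambdas, key=lambda lam: cross_validated_error(X, y, lam))
```

Averaging over the K folds is exactly what removes the sensitivity to any single small validation set described above.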