[MUSIC] Okay. Let's wrap up by talking about two really important tasks when you're doing regression. And through this discussion, it's gonna motivate another important concept: the idea of a validation set. So, the two important tasks in regression are, first, we need to choose a specific model complexity. So for example, when we're talking about polynomial regression, what's the degree of that polynomial? And then, for our selected model, we assess its performance. And actually, these two steps aren't specific just to regression. We're gonna see this in all different aspects of machine learning, where we have to specify our model and then we need to assess the performance of that model. So, what we're gonna talk about in this portion of the module generalizes well beyond regression.

And for this first task, where we're choosing the specific model, we're gonna talk about it in terms of some set of tuning parameters, lambda, which control the model complexity. Again, as an example, lambda might specify the degree of the polynomial in polynomial regression. So, let's first talk about how we can think about choosing lambda. And then, for a given model specified by lambda, a given model complexity, let's think about how we're gonna assess the performance of that model.

Well, one really naive approach is to do what we've described before, where you take your data set and split it into a training set and a test set. And then, for the model selection portion, where we're choosing the model complexity lambda, for every possible choice of lambda we're gonna estimate the model parameters associated with that lambda on the training set. And then we're gonna test the performance of that fitted model on the test set, and we're gonna tabulate that for every lambda that we're considering. And we're gonna choose our tuning parameters as the ones that minimize this test error, so the ones that perform best on the test data. And we're gonna call those parameters lambda star.

So, now I have my model, my specific degree of polynomial that I'm gonna use, and I wanna go and assess the performance of this specific model. And the way I'm gonna do this is I'm gonna take my test data again. And I'm gonna say, well, okay, I know that test error is an approximation of generalization error. So, I'm just gonna compute the test error for this lambda star fitted model, and I'm gonna use that as my approximation of the performance of this model.

Well, what's the issue with this? Is this gonna perform well? No, it's really overly optimistic. This issue is just like what we saw when we weren't dealing with this notion of choosing model complexity. There, we just assumed that we had a specific model, like a specific degree polynomial, but we wanted to assess the performance of that model. And the naive approach we took there was saying, well, we fit the model to the training data, and then we're gonna use training error to assess the performance of the model. And we said that was overly optimistic because we were double dipping: we had already used that data to fit our model, so that error was not a good measure of how we're gonna perform on new data. Well, it's exactly the same notion here, and let's walk through why. More specifically, when we were choosing our model complexity, we were using our test data to compare between different lambda values. And we chose the lambda value that minimized the error on that test data, the one that performed best there.
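To make that concrete, here's a minimal sketch of this naive procedure. It's not taken from the lecture: the synthetic data, the scikit-learn-style polynomial regression pipeline, the mean squared error metric, and the range of candidate degrees are all illustrative assumptions.

```python
# Sketch of the NAIVE approach: choose the polynomial degree (our "lambda")
# by minimizing error on the test set, then reuse that same test error as
# the performance estimate.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data for illustration only
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(200, 1))
y = np.sin(4 * x).ravel() + rng.normal(scale=0.3, size=200)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

test_errors = {}
for degree in range(1, 16):                          # candidate model complexities (lambda)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)                      # parameters w hat fit on the training set
    test_errors[degree] = mean_squared_error(y_test, model.predict(x_test))

degree_star = min(test_errors, key=test_errors.get)  # lambda* chosen on the TEST set
print("chosen degree (lambda*):", degree_star)
print("reported 'generalization' error:", test_errors[degree_star])  # overly optimistic
```

Notice that the very same number, the test error of the chosen degree, is used both to pick lambda star and to report its performance, and that reuse is the double dipping the lecture is describing.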
So, you could think of this as having fit lambda, this model complexity tuning parameter, on the test data. And now we're thinking about using test error as a way of approximating how well we'll do on new data. But the issue is, unless our test data represents everything we might see out there in the world, that's gonna be way too optimistic, because lambda was chosen, the model was chosen, to do well on the test data, and so that won't generalize well to new observations.

So, what's our solution? Well, we can just create two test data sets. They won't both be called test sets; we're gonna call one of them a validation set. So, we're gonna take our entire data set, just to be clear, and now we're gonna split it into three data sets. One will be our training data set, one will be what we call our validation set, and the other will be our test set. And then what we're gonna do is, we're going to fit our model parameters always on our training data, for every given model complexity that we're considering. But then we're gonna select our model complexity as the model that performs best on the validation set, the one that has the lowest validation error. And then we're gonna assess the performance of that selected model on the test set, and we're gonna say that that test error is now an approximation of our generalization error. Because that test set was never used in either fitting our parameters, w hat, or selecting our model complexity lambda, that other tuning parameter. So, that data was completely held out, never touched, and it now forms a fair estimate of our generalization error.

So, in summary, we're gonna fit our model parameters, for any given complexity, on our training set. Then, for every fitted model and for every model complexity, we're gonna assess the performance and tabulate this on our validation set, and we're gonna use that to select the optimal set of tuning parameters, lambda star. And then for that resulting model, that w hat sub lambda star, we're gonna assess a notion of the generalization error using our test set.

And so a question is, how can we think about doing the split between our training set, validation set, and test set? There's no hard and fast rule here, no one answer that's the right answer. But typical splits that you see out there are something like an 80-10-10 split: 80% of your data for training, 10% for validation, and 10% for test. Another common split is 50%, 25%, 25%. But again, this is assuming that you have enough data to do this type of split and still get reasonable estimates of your model parameters, and reasonable notions of how different model complexities compare, because you have a large enough validation set, and you still have a large enough test set in order to assess the generalization error of the resulting model. And if this isn't the case, we're gonna talk about other methods that allow us to do these same types of things, but without this type of hard division between training, validation, and test. [MUSIC]
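As a companion to the summary above, here's a rough sketch of the training/validation/test workflow with roughly an 80-10-10 split. Again, the synthetic data, the scikit-learn-style tools, and the candidate degrees are illustrative assumptions, not anything prescribed by the lecture.

```python
# Sketch of the train / validation / test workflow (approx. 80% / 10% / 10%).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=(500, 1))
y = np.sin(4 * x).ravel() + rng.normal(scale=0.3, size=500)

# Carve off the test set first, then split the remainder into train/validation.
x_rest, x_test, y_rest, y_test = train_test_split(x, y, test_size=0.10, random_state=0)
x_train, x_val, y_train, y_val = train_test_split(x_rest, y_rest, test_size=1/9, random_state=0)
# 1/9 of the remaining 90% is 10% of the full data set.

best_degree, best_val_err, best_model = None, np.inf, None
for degree in range(1, 16):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)                       # w hat fit on the training set only
    val_err = mean_squared_error(y_val, model.predict(x_val))
    if val_err < best_val_err:                        # select lambda* on the validation set
        best_degree, best_val_err, best_model = degree, val_err, model

# The test set was never used for fitting parameters or selecting lambda,
# so its error is a fair approximation of generalization error.
test_err = mean_squared_error(y_test, best_model.predict(x_test))
print("selected degree (lambda*):", best_degree)
print("estimated generalization error:", test_err)
```

The key design choice mirrors the lecture: the training set is used only to fit w hat, the validation set is used only to choose lambda star, and the test error is computed exactly once, on the final selected model.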