[MUSIC] Okay. Let's wrap up by talking about two really important tasks when you're doing regression. And through this discussion, it's gonna motivate another important concept: the idea of a validation set. So, the two important tasks in regression are, first, we need to choose a specific model complexity. So for example, when we're talking about polynomial regression, what's the degree of that polynomial? And then, for our selected model, we assess its performance. And actually, these two steps aren't specific just to regression. We're gonna see this in all different aspects of machine learning, where we have to specify our model and then we need to assess the performance of that model. So, what we're gonna talk about in this portion of the module generalizes well beyond regression.

And for this first task, where we're choosing the specific model, we're gonna talk about it in terms of some set of tuning parameters, lambda, which control the model complexity. Again, as an example, lambda might specify the degree of the polynomial in polynomial regression. So, let's first talk about how we can think about choosing lambda. And then, for a given model specified by lambda, a given model complexity, let's think about how we're gonna assess the performance of that model.

Well, one really naive approach is to do what we've described before, where you take your data set and split it into a training set and a test set. And then, for the model selection portion, where we're choosing the model complexity lambda, for every possible choice of lambda we're gonna estimate the model parameters associated with that lambda on the training set. And then we're gonna test the performance of that fitted model on the test set, and we're gonna tabulate that for every lambda that we're considering. And we're gonna choose our tuning parameters as the ones that minimize this test error, so the ones that perform best on the test data. And we're gonna call those parameters lambda star.

So, now I have my model, my specific degree of polynomial that I'm gonna use, and I wanna go and assess the performance of this specific model. And the way I'm gonna do this is I'm gonna take my test data again. And I'm gonna say, well, okay, I know that test error is an approximation of generalization error. So, I'm just gonna compute the test error for this lambda star fitted model, and I'm gonna use that as my approximation of the performance of this model.

Well, what's the issue with this? Is this gonna perform well? No, it's really overly optimistic. This issue is just like what we saw when we weren't dealing with this notion of choosing model complexity. There, we just assumed that we had a specific model, like a specific degree polynomial, but we wanted to assess the performance of that model. And the naive approach we took there was saying, well, we fit the model to the training data, and then we're gonna use training error to assess the performance of the model. And we said that was overly optimistic because we were double dipping: we had already used that data to fit our model, so that error was not a good measure of how we're gonna perform on new data. Well, it's exactly the same notion here, and let's walk through why. More specifically, when we were choosing our model complexity, we were using our test data to compare between different lambda values. And we chose the lambda value that minimized the error on that test data, the one that performed best there.
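To make that concrete, here's a minimal sketch of this naive procedure. It's not taken from the lecture: the synthetic data, the scikit-learn-style polynomial regression pipeline, the mean squared error metric, and the range of candidate degrees are all illustrative assumptions.

```python
# Sketch of the NAIVE approach: choose the polynomial degree (our "lambda")
# by minimizing error on the test set, then reuse that same test error as
# the performance estimate.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data for illustration only
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(200, 1))
y = np.sin(4 * x).ravel() + rng.normal(scale=0.3, size=200)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

test_errors = {}
for degree in range(1, 16):                          # candidate model complexities (lambda)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)                      # parameters w hat fit on the training set
    test_errors[degree] = mean_squared_error(y_test, model.predict(x_test))

degree_star = min(test_errors, key=test_errors.get)  # lambda* chosen on the TEST set
print("chosen degree (lambda*):", degree_star)
print("reported 'generalization' error:", test_errors[degree_star])  # overly optimistic
```

Notice that the very same number, the test error of the chosen degree, is used both to pick lambda star and to report its performance, and that reuse is the double dipping the lecture is describing.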
So, you could think of this as having fit lambda, this model complexity tuning parameter, on the test data. And now we're thinking about using test error as a way of approximating how well we'll do on new data. But the issue is, unless our test data represents everything we might see out there in the world, that's gonna be way too optimistic, because lambda was chosen, the model was chosen, to do well on the test data, and so that won't generalize well to new observations.

So, what's our solution? Well, we can just create two test data sets. They won't both be called test sets; we're gonna call one of them a validation set. So, we're gonna take our entire data set, just to be clear, and now we're gonna split it into three data sets. One will be our training data set, one will be what we call our validation set, and the other will be our test set. And then what we're gonna do is, we're going to fit our model parameters always on our training data, for every given model complexity that we're considering. But then we're gonna select our model complexity as the model that performs best on the validation set, the one that has the lowest validation error. And then we're gonna assess the performance of that selected model on the test set, and we're gonna say that that test error is now an approximation of our generalization error. Because that test set was never used in either fitting our parameters, w hat, or selecting our model complexity lambda, that other tuning parameter. So, that data was completely held out, never touched, and it now forms a fair estimate of our generalization error.

So, in summary, we're gonna fit our model parameters, for any given complexity, on our training set. Then, for every fitted model and for every model complexity, we're gonna assess the performance and tabulate this on our validation set, and we're gonna use that to select the optimal set of tuning parameters, lambda star. And then for that resulting model, that w hat sub lambda star, we're gonna assess a notion of the generalization error using our test set.

And so a question is, how can we think about doing the split between our training set, validation set, and test set? There's no hard and fast rule here, no one answer that's the right answer. But typical splits that you see out there are something like an 80-10-10 split: 80% of your data for training, 10% for validation, and 10% for test. Another common split is 50%, 25%, 25%. But again, this is assuming that you have enough data to do this type of split and still get reasonable estimates of your model parameters, and reasonable notions of how different model complexities compare, because you have a large enough validation set, and you still have a large enough test set in order to assess the generalization error of the resulting model. And if this isn't the case, we're gonna talk about other methods that allow us to do these same types of things, but without this type of hard division between training, validation, and test. [MUSIC]
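As a companion to the summary above, here's a rough sketch of the training/validation/test workflow with roughly an 80-10-10 split. Again, the synthetic data, the scikit-learn-style tools, and the candidate degrees are illustrative assumptions, not anything prescribed by the lecture.

```python
# Sketch of the train / validation / test workflow (approx. 80% / 10% / 10%).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=(500, 1))
y = np.sin(4 * x).ravel() + rng.normal(scale=0.3, size=500)

# Carve off the test set first, then split the remainder into train/validation.
x_rest, x_test, y_rest, y_test = train_test_split(x, y, test_size=0.10, random_state=0)
x_train, x_val, y_train, y_val = train_test_split(x_rest, y_rest, test_size=1/9, random_state=0)
# 1/9 of the remaining 90% is 10% of the full data set.

best_degree, best_val_err, best_model = None, np.inf, None
for degree in range(1, 16):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)                       # w hat fit on the training set only
    val_err = mean_squared_error(y_val, model.predict(x_val))
    if val_err < best_val_err:                        # select lambda* on the validation set
        best_degree, best_val_err, best_model = degree, val_err, model

# The test set was never used for fitting parameters or selecting lambda,
# so its error is a fair approximation of generalization error.
test_err = mean_squared_error(y_test, best_model.predict(x_test))
print("selected degree (lambda*):", best_degree)
print("estimated generalization error:", test_err)
```

The key design choice mirrors the lecture: the training set is used only to fit w hat, the validation set is used only to choose lambda star, and the test error is computed exactly once, on the final selected model.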