[MUSIC] Okay, so we can't compute generalization error, but we want some better measure of our predictive performance than training error gives us. And this takes us to something called test error. What test error is going to allow us to do is approximate generalization error, and the way we're going to do this is by looking at the error on houses that aren't in our training set. To do that, we have to hold out some houses. So instead of including all of these colored houses, our entire recorded data set, in the training set, we're going to shade some of them out, and these shaded gray houses become what's called a test set.

Okay. So now we have houses that are not included in our training set; the training set is the remaining colored houses here. When we go to fit our models, we fit them only on the training data set. But when we go to assess the performance of a model, we look at these test houses, and they're hopefully going to serve as a proxy for everything out there in the world. So hopefully our test data set is a good stand-in for other houses we might see, or at least gives us a sense of how well a given model is performing.

Okay, so test error is going to be our average loss computed over the houses in our test data set. Formally, we write it as one over N test, where N test is the number of houses in our test data set, times the sum of the loss over those test set houses. But I want to emphasize, and this is really, really important, that the estimated parameters W hat were fit on the training data set. So even though this function looks very, very much like training error, the sum is over the test houses, but the function we're evaluating was fit on training data. These parameters in this fitted function never saw the test data.

Just to illustrate this, like in our previous example, we might think of fitting a quadratic function through this data, where we minimize the residual sum of squares on the training points, those blue circles, to get our estimated parameters W hat. Then when we go to compute our test error, where again we're using squared error as an example, we compute this error over the test points, all these gray circles here. So test error is one over N test times the sum, over all houses in the test data set, of the squared difference between the true house sales price and our predicted price. And this is where the difference arises: the function was fit with the blue circles, but when we assess its performance, we look at these gray circles. We'll see a small code sketch of exactly this computation at the end of this section.

Okay, so let's summarize our measures of error as a function of model complexity. What we saw was that our training error decreased with increasing model complexity; so here, this is our training error. In contrast, our generalization error went down for some period of time, but then we started getting to overly complex models that didn't generalize well, and the generalization error started increasing. So here we have generalization error, or true error. And what is our test error? Well, our test error is a noisy approximation of generalization error, because if our test data set included everything we might ever see in the world, in proportion to how likely it was to be seen, then it would be exactly our generalization error.
But of course, our test data set is just some finite data set, and we're using it to approximate generalization error, so it's going to be some noisy version of that curve. So this is our test error. Okay, so test error is the thing we can actually compute, and generalization error is the thing we really want. [MUSIC]
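To make this concrete, here is a minimal sketch in Python with NumPy. It uses made-up house data rather than the course's actual data set or tools, so the numbers, the 80/20 split, and the polynomial degrees are all just illustrative assumptions. It holds out some houses as a test set, fits polynomial models of increasing degree by minimizing the residual sum of squares on the training houses only, and then computes training error and test error as the average squared loss over each set.

import numpy as np

# Made-up data: square footage in thousands (x) and sale price in thousands of dollars (y).
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 4.0, size=100)
y = 50 + 120 * x + rng.normal(0, 40, size=100)

# Hold out 20 houses as a test set; the other 80 houses form the training set.
idx = rng.permutation(len(x))
train, test = idx[:80], idx[80:]

for degree in [1, 2, 5, 10]:
    # Estimate W hat by minimizing residual sum of squares on the TRAINING houses only.
    w_hat = np.polyfit(x[train], y[train], deg=degree)
    predict = np.poly1d(w_hat)
    # Training error and test error: average squared loss over each set,
    # always using the parameters that were fit on the training data.
    train_err = np.mean((y[train] - predict(x[train])) ** 2)
    test_err = np.mean((y[test] - predict(x[test])) ** 2)
    print(f"degree {degree:2d}: training error {train_err:8.1f}, test error {test_err:8.1f}")

On a run like this you would expect the training error to keep shrinking as the degree grows, while the test error, our noisy but computable stand-in for generalization error, eventually starts climbing for the overly complex fits.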