[MUSIC] Okay. Well, now let's turn to this third component, which is variance. And what variance is gonna say is: how different can my specific fits to a given data set be from one another, as I'm looking at different possible data sets? In this case, when we're looking at just this constant model, we showed with that earlier picture, where I drew points that were mainly above the true relationship and points mainly below, that the resulting fits didn't vary very much. And when you look at the space of all possible observations, you see that the fits are fairly similar, fairly stable. So, when you look at the variation in these fits, which I'm drawing with these grey bars here, we see that they don't vary very much. So, for this low-complexity model, we see that there's low variance. To summarize, what variance is measuring is: how much can the fits vary? If they can vary dramatically from one data set to another, then you would have very erratic predictions. Your prediction would be sensitive to which data set you happened to get, and that would be a source of error in your predictions. To see this, we can start looking at high-complexity models. In particular, let's look at this data set again, and now let's fit some high-order polynomial to it. So, that's the fit shown here. Now, let's take this same data set, but choose two points, which I'm gonna highlight as these pink circles, and move them just a little bit. So, out of this whole data set, I've moved only two observations, and not too dramatically, but I get a dramatically different fit. So then, when I think about looking over all possible data sets I might get, I might get some crazy set of curves. There is an average curve, and in this case, the average curve is actually pretty well behaved, because this wild, wiggly curve is, at any point, equally likely to have been wild above or wild below.
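The sensitivity described above can be sketched in a small simulation. This is an illustrative example on hypothetical synthetic data (a sine curve plus noise, not the lecture's house data): we move just two observations slightly and compare how much a constant fit shifts versus a degree-9 polynomial fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: a smooth true curve plus noise.
x = np.linspace(0, 1, 12)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

# Move just two observations a little, like the pink circles in the lecture.
y_moved = y.copy()
y_moved[3] += 0.2
y_moved[8] -= 0.15

# Low-complexity model: the constant fit (the mean) barely changes.
const_shift = abs(np.mean(y_moved) - np.mean(y))

# High-complexity model: a degree-9 polynomial fit changes much more.
grid = np.linspace(0, 1, 200)
fit_orig = np.polyval(np.polyfit(x, y, deg=9), grid)
fit_moved = np.polyval(np.polyfit(x, y_moved, deg=9), grid)
poly_shift = np.max(np.abs(fit_orig - fit_moved))

print(f"constant fit shifted by {const_shift:.4f}")
print(f"degree-9 fit shifted by up to {poly_shift:.4f}")
```

The constant fit moves by only the averaged perturbation, while the near-interpolating polynomial swings by far more: that gap is exactly the variance difference between the two model classes.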
So, on average over all data sets, it's actually a fairly smooth, reasonable curve. But if I look at the variation between these fits, it's really large. So, what we're saying is that high-complexity models have high variance. On the other hand, if I look at the bias of this model, so here again, I'm showing this average fit, which was this fairly well-behaved curve, it matched pretty well to the true relationship between square feet and house value, because my model is really flexible. So, on average, it was able to fit that true relationship pretty precisely. So, these high-complexity models have low bias. Now we can talk about this bias-variance tradeoff. In particular, we're gonna plot bias and variance as a function of model complexity. What we saw in the past slides is that as our model complexity increases, our bias decreases, because we can better and better approximate the true relationship between x and y. So, this curve here is our bias curve. On the other hand, variance increases. Our very simple model had very low variance, and the high-complexity models had high variance. So, this is a picture of our variance. And what we see is that there's this natural tradeoff between bias and variance. One way to summarize this is something called mean squared error. If you watch the optional videos that go into all these concepts more in depth, you'll hear a lot more about mean squared error, its formal definition and the derivation of this. But mean squared error is simply the sum of bias squared plus variance. Okay, I'll write out variance to be very clear. So, this is my little cartoon of bias squared plus variance; this is my mean squared error curve. And machine learning is all about this tradeoff between bias and variance. We're gonna see this again and again in this course, and we're gonna see it throughout the specialization. And the goal is finding this sweet spot.
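The decomposition above can be checked empirically in a simulation, because there (unlike in the real world) we know the true function. This is a hedged sketch on hypothetical synthetic data: for each model complexity, we draw many data sets, fit a polynomial to each, and measure bias squared and variance of the prediction at one input point.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    # Hypothetical known "true" relationship; only available to us
    # because this is a simulation.
    return np.sin(2 * np.pi * x)

x = np.linspace(0, 1, 20)
x0 = 0.25            # input at which we examine the predictions
n_datasets = 500

results = {}
for degree in (0, 3, 9):
    preds = np.empty(n_datasets)
    for i in range(n_datasets):
        # Each draw plays the role of one "possible data set" from the world.
        y = true_f(x) + rng.normal(scale=0.3, size=x.size)
        coef = np.polyfit(x, y, deg=degree)
        preds[i] = np.polyval(coef, x0)
    bias_sq = (preds.mean() - true_f(x0)) ** 2
    variance = preds.var()
    results[degree] = (bias_sq, variance)
    print(f"degree {degree}: bias^2={bias_sq:.4f}  variance={variance:.4f}  "
          f"sum={bias_sq + variance:.4f}")
```

The constant model (degree 0) shows high bias and low variance, the high-order polynomial the reverse, and the sum bias² + variance traces out the U-shaped mean squared error curve from the slide.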
This is the sweet spot where we get our minimum error, the minimum contribution of bias and variance to our prediction errors. So, not "sweet, sweet"; it is sweet, but what I'm trying to write is "sweet spot". And this is what we'd love to get at; that's the model complexity we'd want. But just like with generalization error, and I'm gonna write this down, can we compute this? Think about that while I'm writing. We cannot compute bias and variance, much less mean squared error. And why? Well, the reason is that, just like generalization error, they were defined in terms of the true function. Bias was defined very explicitly in terms of the relationship relative to the true function. And when we think about defining variance, we have to average over all possible data sets, and the same was true for bias too: all possible data sets of size N we could have gotten from the world, and we just don't know what those are. So, we can't compute these things exactly. But throughout the rest of this course, we're gonna look at ways to optimize this tradeoff between bias and variance in a practical way. [MUSIC]