[MUSIC] Okay. Well, now let's turn to this third component, which is variance. And what variance is gonna say is: how different can my specific fits to a given data set be from one another, as I'm looking at different possible data sets? In this case, when we're looking at just this constant model, we showed with that earlier picture, where I drew points that were mainly above the true relationship and points mainly below, that the resulting fits didn't vary very much. And when you look at the space of all possible observations, you see that the fits are fairly similar, fairly stable. So, when you look at the variation in these fits, which I'm drawing with these grey bars here, we see that they don't vary very much. So, for this low-complexity model, we see that there's low variance. To summarize, what variance is measuring is: how much can the fits vary? If they can vary dramatically from one data set to another, then you would have very erratic predictions. Your prediction would be sensitive to which data set you happened to get, and that would be a source of error in your predictions. To see this, we can start looking at high-complexity models. In particular, let's look at this data set again, and now let's fit some high-order polynomial to it. So, that's the fit shown here. Now, let's take this same data set, but choose two points, which I'm gonna highlight as these pink circles, and move them just a little bit. So, out of this whole data set, I've moved only two observations, and not too dramatically, but I get a dramatically different fit. So then, when I think about looking over all possible data sets I might get, I might get some crazy set of curves. There is an average curve, and in this case, the average curve is actually pretty well behaved, because this wild, wiggly curve is, at any point, equally likely to have been wild above or wild below.
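The sensitivity described above can be sketched in a small simulation. This is an illustrative example on hypothetical synthetic data (a sine curve plus noise, not the lecture's house data): we move just two observations slightly and compare how much a constant fit shifts versus a degree-9 polynomial fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: a smooth true curve plus noise.
x = np.linspace(0, 1, 12)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

# Move just two observations a little, like the pink circles in the lecture.
y_moved = y.copy()
y_moved[3] += 0.2
y_moved[8] -= 0.15

# Low-complexity model: the constant fit (the mean) barely changes.
const_shift = abs(np.mean(y_moved) - np.mean(y))

# High-complexity model: a degree-9 polynomial fit changes much more.
grid = np.linspace(0, 1, 200)
fit_orig = np.polyval(np.polyfit(x, y, deg=9), grid)
fit_moved = np.polyval(np.polyfit(x, y_moved, deg=9), grid)
poly_shift = np.max(np.abs(fit_orig - fit_moved))

print(f"constant fit shifted by {const_shift:.4f}")
print(f"degree-9 fit shifted by up to {poly_shift:.4f}")
```

The constant fit moves by only the averaged perturbation, while the near-interpolating polynomial swings by far more: that gap is exactly the variance difference between the two model classes.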
So, on average over all data sets, it's actually a fairly smooth, reasonable curve. But if I look at the variation between these fits, it's really large. So, what we're saying is that high-complexity models have high variance. On the other hand, if I look at the bias of this model, so here again, I'm showing this average fit, which was this fairly well-behaved curve, it matched pretty well to the true relationship between square feet and house value, because my model is really flexible. So, on average, it was able to fit that true relationship pretty precisely. So, these high-complexity models have low bias. Now we can talk about this bias-variance tradeoff. In particular, we're gonna plot bias and variance as a function of model complexity. What we saw in the past slides is that as our model complexity increases, our bias decreases, because we can better and better approximate the true relationship between x and y. So, this curve here is our bias curve. On the other hand, variance increases. Our very simple model had very low variance, and the high-complexity models had high variance. So, this is a picture of our variance. And what we see is that there's this natural tradeoff between bias and variance. One way to summarize this is something called mean squared error. If you watch the optional videos that go into all these concepts more in depth, you'll hear a lot more about mean squared error, its formal definition and the derivation of this. But mean squared error is simply the sum of bias squared plus variance. Okay, I'll write out variance to be very clear. So, this is my little cartoon of bias squared plus variance; this is my mean squared error curve. And machine learning is all about this tradeoff between bias and variance. We're gonna see this again and again in this course, and we're gonna see it throughout the specialization. And the goal is finding this sweet spot.
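The decomposition above can be checked empirically in a simulation, because there (unlike in the real world) we know the true function. This is a hedged sketch on hypothetical synthetic data: for each model complexity, we draw many data sets, fit a polynomial to each, and measure bias squared and variance of the prediction at one input point.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    # Hypothetical known "true" relationship; only available to us
    # because this is a simulation.
    return np.sin(2 * np.pi * x)

x = np.linspace(0, 1, 20)
x0 = 0.25            # input at which we examine the predictions
n_datasets = 500

results = {}
for degree in (0, 3, 9):
    preds = np.empty(n_datasets)
    for i in range(n_datasets):
        # Each draw plays the role of one "possible data set" from the world.
        y = true_f(x) + rng.normal(scale=0.3, size=x.size)
        coef = np.polyfit(x, y, deg=degree)
        preds[i] = np.polyval(coef, x0)
    bias_sq = (preds.mean() - true_f(x0)) ** 2
    variance = preds.var()
    results[degree] = (bias_sq, variance)
    print(f"degree {degree}: bias^2={bias_sq:.4f}  variance={variance:.4f}  "
          f"sum={bias_sq + variance:.4f}")
```

The constant model (degree 0) shows high bias and low variance, the high-order polynomial the reverse, and the sum bias² + variance traces out the U-shaped mean squared error curve from the slide.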
This is the sweet spot where we get our minimum error, the minimum contribution of bias and variance to our prediction errors. So, not "sweet, sweet"; it is sweet, but what I'm trying to write is "sweet spot". And this is what we'd love to get at; that's the model complexity we'd want. But just like with generalization error, and I'm gonna write this down, can we compute this? Think about that while I'm writing. We cannot compute bias and variance, much less mean squared error. And why? Well, the reason is that, just like generalization error, they were defined in terms of the true function. Bias was defined very explicitly in terms of the relationship relative to the true function. And when we think about defining variance, we have to average over all possible data sets, and the same was true for bias too: all possible data sets of size N we could have gotten from the world, and we just don't know what those are. So, we can't compute these things exactly. But throughout the rest of this course, we're gonna look at ways to optimize this tradeoff between bias and variance in a practical way. [MUSIC]