[MUSIC] So having finished the preceding modules, I'm feeling pretty confident: I can specify a model, and I can also specify an algorithm for how to fit that model. In doing that, I get some fitted function, and I know how to use that function to make predictions. So I make predictions about the value of my house, I go to sell my house, and I make money. And I'm happy, right? I did a good job. Well, maybe, maybe not. Maybe my predictions weren't that good, and as a result, the value that I listed my house for was inaccurate. And maybe I end up losing money as a result. So what we can think about is a measure of how much we're losing when we make a certain prediction. For example, in the housing application, if we list the house value too low, then maybe we get low offers, and that's a cost to me relative to having made a better prediction. Or if I list the value too high, maybe people don't come see the house and I don't get any offers. Or maybe people notice that not many people are showing up to look at the house, and they make me a very low offer. So again, I'm in a worse financial state having made a poor prediction of the value of my house. So a question is: how much am I losing compared to having made perfect predictions? Of course, we can never make perfect predictions. The way in which the world works is really complicated, and we can't hope to perfectly model that, as well as the noise that's inherent in the process of any observations we might see. But let's just imagine that we could perfectly predict the value. Then we'd say that, in that case, our loss is 0: we're not losing any money because we did perfectly. So a question is, how do we formalize this notion of how much we're losing? And in machine learning, we do this by defining something called a loss function. 
And what the loss function specifies is the cost incurred when the true observation is y and I make some other prediction. A bit more explicitly, what we're gonna do is estimate our model parameters, and those are w hat. We're gonna use those to form predictions. So this notation here, f sub w hat, is something we've equivalently written as f hat, but for reasons that we'll see later in this module, this notation is very convenient. And what it is, is our predicted value at some input x, and y is the true value. And this loss function, L, is somehow measuring the difference between these two things. There are a couple of ways in which we could define a loss function. Well, there are actually many, many ways, but I'm just gonna go through a couple of examples. And in particular, the examples that I'm gonna go through assume that the cost you incur by making an overestimate, relative to an underestimate, is exactly the same. So there's no difference in listing my house $1,000 too high relative to $1,000 too low. Okay, so we're assuming what's called a symmetric loss function in these examples. And very common choices include something called absolute error, which just looks at the absolute value of the difference between your true value and your predicted value. And another common choice is something called squared error, where, instead of just looking at the absolute value, you look at the square of that difference. And that means that you have a very high cost if that difference is large, relative to just absolute error. So as we're going through this module, it's useful to keep in the back of your mind this quote by George Box: "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." 
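To make the two symmetric losses just described concrete, here is a minimal sketch in Python (the function names are my own, not from the lecture). It shows absolute error, L(y, ŷ) = |y − ŷ|, and squared error, L(y, ŷ) = (y − ŷ)², where y is the true value and y_hat is the prediction f sub w hat of x:

```python
def absolute_error(y, y_hat):
    """Absolute error loss: L(y, y_hat) = |y - y_hat|."""
    return abs(y - y_hat)


def squared_error(y, y_hat):
    """Squared error loss: L(y, y_hat) = (y - y_hat)**2.

    Penalizes large errors much more heavily than absolute error.
    """
    return (y - y_hat) ** 2


# Both losses are symmetric: listing a house $1,000 too high
# costs exactly the same as listing it $1,000 too low.
print(absolute_error(250_000, 249_000))  # 1000
print(absolute_error(250_000, 251_000))  # 1000
print(squared_error(250_000, 249_000))   # 1000000
```

Note how the squared error for a $1,000 miss is a million, while the absolute error is only a thousand; that gap grows quickly with the size of the miss, which is exactly the "very high cost if that difference is large" behavior mentioned above.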
Okay, so we have spent a lot of time defining different models, and now we're gonna have tools to assess the performance of these methods and to think about whether they can be useful to us in practice. [MUSIC]