So, the first measure of error of our predictions that we can look at is something called training error. We discussed this at a high level in the first course of the specialization, but now let's go through it in a little more detail. To define training error, we first have to define training data. Typically you have some dataset, which I've shown here as these blue circles, and we choose our training dataset as some subset of these points. The greyed circles are the ones not included in the training set; the blue circles are the ones we're keeping. Then we take our training data and, as we've discussed in previous modules of this course, use it to fit our model, that is, to estimate our model parameters. As an example, with this dataset maybe we choose to fit some quadratic function, and as we've talked about, to fit that quadratic function we're gonna minimize the residual sum of squares on these training data points.

So now we have our estimated model parameters, w hat, and we want to assess the training error of that estimated model. The way we do that is to first define some loss function, maybe squared error or absolute error, any one of the many possibilities. Training error is then defined simply as the average loss over the training points. Mathematically, that's 1/N, where N is the total number of observations in my training set, times the sum of the loss over each of those training observations. And just to be very clear: the estimated parameters were fit on the training set, by minimizing the residual sum of squares on the very same training points we're now using to define training error.

We can go through this pictorially in the following example, where we're specifically using squared error as our loss function. In this case, our training error is simply 1/N times the sum, over all houses in our training dataset, of the squared difference between the actual house sales price and the predicted house sales price. And what we see is that when we choose squared error as our loss function, the form of training error is exactly 1/N times our residual sum of squares. I want to note here that conventions differ on whether the 1/N appears in the definition of training error or not, so just be aware of that when you're computing training error and reporting these numbers. Here we're defining it as the average loss.

More formally, we can write our training error as follows, and then define something commonly referred to as RMSE, whose full name is root mean square error. RMSE is simply the square root of our average loss on the training houses, that is, the square root of our training error. The reason one might consider looking at root mean square error is that its units, in this case, are just dollars, whereas the units of training error are dollars squared. Remember, we're squaring differences measured in dollars, so the result is dollars squared, which is a little less intuitive as an error metric than an error in terms of dollars themselves.
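To make these definitions concrete, here is a minimal Python sketch (not from the course itself; the price and prediction arrays are made-up example values) computing training error and RMSE under squared-error loss:

```python
import numpy as np

def training_error(y_true, y_pred):
    # Average squared-error loss over the training set: (1/N) * RSS.
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    # Square root of training error, so the units match the target
    # (dollars rather than dollars squared).
    return np.sqrt(training_error(y_true, y_pred))

# Hypothetical sale prices and model predictions, in dollars:
prices = np.array([310_000.0, 450_000.0, 520_000.0, 280_000.0])
predicted = np.array([335_000.0, 428_000.0, 500_000.0, 295_000.0])

print(training_error(prices, predicted))  # dollars squared
print(rmse(prices, predicted))            # dollars
```

Note that dropping the `np.mean` in favor of `np.sum` gives the alternative convention mentioned above, where training error is just the residual sum of squares without the 1/N.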
Now that we've defined training error, we can look at how it behaves as model complexity increases. To start, let's look at the simplest possible model you might fit, which is just a constant model. This is the simplest model we're gonna consider, and you see that it has pretty significant training error; let's just say it has some value here, the training error of the constant model. Then let's say I fit a linear model. Well, these are all linear models we're looking at, it's linear regression, but here I mean literally fitting a line to the data. And you see that my training error has gone down, to some other value that I'm showing with this pink circle. Then I fit a quadratic function, and again training error goes down. And what I see is that as I increase my model complexity to, say, some higher-order polynomial, I have very low training error, just this one small pink bar here.

So training error decreases quite significantly with model complexity, and now that we've gone through these examples, we can look at what the plot of training error versus model complexity tends to look like: training error decreases as you increase your model complexity. And why is that? Well, it's pretty intuitive: the model was fit on the training points, and then I'm asking how well it fits them. As I increase the model complexity, I'm better and better able to fit my training data points, so when I go to assess training error with these high-complexity models, it's very low.

So a natural question is whether training error is a good measure of predictive performance. What we're showing here is one of our high-complexity, high-order polynomial models that had very low training error; it really fit those training data points well. But how is it gonna perform on some new house? In particular, maybe we're looking at a house in this grey region, with this range of square feet. The question is: is there something particularly wrong with having Xt square feet? Because what our fitted function is saying is that I'm predicting houses with roughly Xt square feet to be less valuable than houses with fewer square feet, because there's this dip down in the function. Do we really believe this is a true dip in value, that these houses are just less desirable than houses with fewer or more square feet? Probably not.

So what's going wrong here? The issue is that training error is overly optimistic when we use it to assess predictive performance. That's because the parameters w hat were fit on the training data; they were fit to minimize the residual sum of squares, which is directly related to training error. Using training error to then assess predictive performance is going to be very, very optimistic, as this picture shows. So, in general, having small training error does not imply good predictive performance, unless your training dataset is really representative of everything you might see out there in the world.
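Here is a small illustrative sketch, again with synthetic data rather than the course's dataset, showing how training error shrinks as polynomial degree grows when the fit is done by least squares on the training points themselves:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: square feet (in thousands, which keeps
# polyfit numerically well-conditioned) versus noisy sale price.
sqft = np.sort(rng.uniform(0.5, 3.5, size=15))
price = 100_000 + 150_000 * sqft + rng.normal(0, 40_000, size=15)

# Least-squares polynomial fits of increasing complexity; training
# error (average squared loss on the same points) only goes down.
for degree in [0, 1, 2, 8]:
    coeffs = np.polyfit(sqft, price, deg=degree)
    preds = np.polyval(coeffs, sqft)
    err = np.mean((price - preds) ** 2)
    print(f"degree {degree}: training error = {err:.3e}")
```

Evaluating the degree-8 fit on fresh houses drawn from the same process would tell a very different story; that gap between fit on the training points and performance on new points is exactly the optimism described above.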