[MUSIC] So now let's step back and discuss some important theoretical and practical aspects of k-nearest neighbors and kernel regression. If you remember, the title of this module was Going Nonparametric, and we've yet to mention what that means. What is a nonparametric approach? Well, k-nearest neighbors and kernel regression are examples of nonparametric approaches. And the general goal of a nonparametric approach is to be really flexible in how you're defining f of x, and in general you want to make as few assumptions as possible. And the real key that defines a nonparametric method is that the complexity of the fit can grow as you get more data points. We've definitely seen that with k-nearest neighbors and kernel regression; in particular, the fit is a function of how many observations you have. But these are just two examples of nonparametric methods you might use for regression. There are lots of other choices: things like splines, and trees, which we'll talk about in the classification course, and locally weighted structured versions of the types of regression models we've talked about. So nonparametrics is all about this idea of having the complexity grow with the number of observations. So now let's talk about the limiting behavior of nearest neighbor regression as you get more and more data. And to start with, let's just assume that we get completely noiseless data, so every observation we get lies exactly on the true function. Well, in this case, the mean squared error of one nearest neighbor regression goes to zero as you get more and more data. But let's just remember what mean squared error is. If you remember from a couple modules ago, we talked about the bias-variance tradeoff, and that mean squared error is the sum of bias squared plus variance. So having mean squared error go to zero means that both bias and variance are going to zero. So to motivate this visually, let's just look at a couple of movies.
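To make this claim concrete, here is a minimal sketch of one nearest neighbor regression on noiseless data, showing the mean squared error shrinking as the number of observations grows. The true function (a sinusoid, `np.sin(4 * x)`) and the random seed are my own stand-ins for illustration, not the course's exact curve:

```python
import numpy as np

def one_nn_predict(x_train, y_train, x_query):
    """1-nearest-neighbor regression: copy the response of the closest training point."""
    idx = np.abs(x_train[:, None] - x_query[None, :]).argmin(axis=0)
    return y_train[idx]

# Hypothetical stand-in for the lecture's true sinusoid.
f = lambda x: np.sin(4 * x)

rng = np.random.default_rng(0)
x_query = np.linspace(0, 1, 200)

mses = []
for n in [10, 100, 1000]:
    x_train = rng.uniform(0, 1, n)
    y_train = f(x_train)                      # noiseless: observations lie exactly on f
    fit = one_nn_predict(x_train, y_train, x_query)
    mses.append(np.mean((fit - f(x_query)) ** 2))

print(mses)  # mean squared error shrinks toward zero as n grows
```

Since the data are noiseless, both the bias and the variance of the 1-NN fit vanish as the observations fill in the input space, which is exactly the MSE-to-zero behavior described above.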
Here, in this movie, I'm showing what the one nearest neighbor fit looks like as we're getting more and more data. So remember, the blue line is our true curve. The green line is our current nearest neighbor fit, based on some set of observations that are gonna lie exactly on the true function, that blue curve. Okay, so here's our fit changing as we get more and more data, and what you see is that it's getting closer, and closer, and closer to the true function. And hopefully you can believe that in the limit of getting an infinite number of observations spread over our input space, this nearest neighbor fit is gonna lie exactly on top of the true function. And that's true for all possible data sets with an infinite number of observations that we would get. In contrast, if we look at what happens just doing a standard quadratic fit, just our standard least squares fit we talked about before, no matter how much data we get, there's always gonna be some bias. So we can see this here, where, especially at this point of curvature, we see that this green fit, even as we get lots and lots of observations, is never matching up to that true blue curve. And that's because that true blue curve is actually part of a sinusoid; we've just zoomed in on a certain region of that sinusoid. And so this quadratic fit is never exactly gonna describe what a sinusoid is describing. So this is what we talked about before, about the bias that's inherent in our model, even if we have no noise. So even if we eliminate the noise, we still have the fact that our true error, as we're getting more and more data, is never gonna go exactly to zero. Unless, of course, the data were generated from exactly the model that we're using to fit the data. But in most cases, for example, maybe you have data that looks like the following.
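The inherent bias of the quadratic fit can be sketched the same way: fit a least squares quadratic to noiseless samples of a sinusoid and watch the error plateau at the squared bias rather than go to zero, no matter how large n gets. Again, the sinusoid and the seed are illustrative assumptions:

```python
import numpy as np

f = lambda x: np.sin(4 * x)   # hypothetical stand-in for the lecture's sinusoid
rng = np.random.default_rng(1)

x_test = np.linspace(0, 1, 500)
mses = []
for n in [100, 10000]:
    x = rng.uniform(0, 1, n)
    y = f(x)                                   # noiseless observations
    coeffs = np.polyfit(x, y, deg=2)           # standard least squares quadratic fit
    mses.append(np.mean((np.polyval(coeffs, x_test) - f(x_test)) ** 2))

print(mses)  # error plateaus at the squared bias instead of going to zero
```

A quadratic simply cannot trace a sinusoid, so more data only pins down the best quadratic more precisely; it never removes the bias.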
So this is our noiseless data. And if we constrain our model, remember this was all for fixed model complexity, if we constrain our model to, say, be a quadratic, then maybe this will be our best quadratic fit. And no matter how many observations I give you from this more complicated function, this quadratic is never gonna have zero bias. In contrast, let's switch colors here so that we can draw our one nearest neighbor fit. Our one nearest neighbor, as we get more and more data, is fitting these constants locally to each observation. And as we get more and more data, the fit is gonna look exactly like the true curve. And so when we talk about our true error with an increasing number of observations, this is a plot of true error for one nearest neighbor, and it's going to go to zero for noiseless data. But now let's talk about the noisy case. This is the case that we're typically faced with in most applications. And in this case, what you can say is that the mean squared error of nearest neighbor regression goes to zero if you allow the number of neighbors, the k in our nearest neighbor regression, to increase with the number of observations as well. Because if you think about getting tons and tons of observations, if you keep k fixed, you're just gonna be looking at a very, very local region of your input space, and you're gonna have a lot of noise introduced from that. But if you allow k to grow, it's gonna smooth over the noise that's being introduced. So let's look at a visualization of this. So here what we're showing is the same true function we've shown throughout this module, but we're showing tons of observations, all these dots. But they're noisy observations; they're no longer lying exactly on that blue curve. That's why they're this cloud of blue points. And we see that our one nearest neighbor fit is very, very noisy.
Okay, it has this wild behavior because, like we discussed before, one nearest neighbor is very sensitive to noise in the data. But in contrast, if we look at a large k, so here we're looking at k equals 200, our 200 nearest neighbor fit looks much, much better. So you can imagine that as you get more and more observations, if you're allowing k to grow, you can smooth over the noise being introduced by each one of these observations and have the mean squared error go to zero. But in contrast, again, if we look at just our standard least squares regression, here in the case of a quadratic fit, we're always gonna have bias. So nothing's different having introduced noise; if anything, the noise will just make things worse. [MUSIC]
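The k equals 1 versus k equals 200 comparison on noisy data can be sketched as follows. The true curve, noise level, and seed are illustrative assumptions; the point is just that averaging over many neighbors drives the error way down relative to 1-NN:

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    """k-NN regression: average the responses of the k closest training points."""
    d = np.abs(x_train[:, None] - x_query[None, :])   # (n_train, n_query) distances
    idx = np.argsort(d, axis=0)[:k]                   # indices of the k nearest per query
    return y_train[idx].mean(axis=0)

f = lambda x: np.sin(4 * x)                # hypothetical stand-in for the true curve
rng = np.random.default_rng(2)

n = 5000
x_train = rng.uniform(0, 1, n)
y_train = f(x_train) + rng.normal(0, 0.5, n)   # noisy observations around the curve
x_test = np.linspace(0, 1, 300)

mse = {}
for k in [1, 200]:
    mse[k] = np.mean((knn_predict(x_train, y_train, x_test, k) - f(x_test)) ** 2)

print(mse)  # the k = 200 fit smooths over the noise; the 1-NN fit does not
```

With k fixed at 1, the fit chases each noisy point and its error stays near the noise variance; letting k grow with n averages that noise away, which is the condition for the mean squared error to go to zero in the noisy case.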