[MUSIC] At the beginning of this module, we talked about this idea of fitting globally versus fitting locally. Now that we've seen k nearest neighbors and kernel regression, I want to formalize this idea. So in particular, let's look at what happens when we just fit a constant function to our data. In that case, we're just computing what's called a global average, where we take all of our observations, add them together, and divide by the total number of observations. That's exactly equivalent to summing over a weighted set of our observations, where the weights are exactly the same on each data point, and then dividing by the total sum of these weights.

Now that we've put our global average in this form, things start to look very similar to the kernel regression ideas we've looked at. It's almost like kernel regression, but we're including every observation in our fit and putting exactly the same weight on every observation. That's like using the boxcar kernel, which puts the same weight on all observations, with a really, really massively large bandwidth parameter, so that for every point in our input space all the other observations are included in the fit.

But now let's contrast that with a more standard version of kernel regression, which leads to what we're going to think of as locally constant fits. Because if we look at the kernel regression equation, what we see is that it's exactly what we had for our global average, but now it's weighted by the kernel. In a lot of cases, what that kernel is doing is putting a hard limit, so that observations outside a window around whatever target point we're looking at are left out of the calculation. The simplest case is the boxcar kernel, which puts equal weight on all observations, but just local to our target point x0. So we get a constant fit, but just at that one target point, and then we get a different constant fit at the next target point, and the next one, and the next one.

And I want to be clear that the resulting output isn't a staircase kind of function. It's not a collection of these constant fits over whole intervals. It is a collection of constant fits, but each one evaluated at just a single point: we do a constant fit, take its value at the target point, move to the next target point, do another constant fit, take its value there, and as we do this over all our different inputs, that's what traces out this green curve.

Okay, but if we look at another kernel, like our Epanechnikov kernel, which has weights decaying over a fixed region, well, it's still doing a constant fit, but how is it figuring out what the level of that line should be at our target point? What it's doing is down-weighting observations that are further from our target point and emphasizing more heavily the observations closer to our target point. So this is still a weighted average, but it's no longer global, it's local, because we're only looking at observations within this defined window. We're doing this weighted average locally at each one of our input points and tracing out this green curve.

So this hopefully makes very clear how, in the types of linear regression models we were talking about before, we were doing these global fits, which in the simplest case was just a constant model, the most basic model we could consider, having just the constant feature.
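To make this distinction concrete, here is a minimal sketch in Python with NumPy of the locally weighted average described above. This is not code from the course; the function names (boxcar_kernel, epanechnikov_kernel, local_constant_fit), the synthetic sine data, and the bandwidth values are illustrative assumptions. The two prints show that a boxcar kernel with an enormous bandwidth reproduces the global average, while the loop at the end redoes the weighted average at each target point to trace out a curve like the green one in the slides.

```python
import numpy as np

def boxcar_kernel(d, bandwidth):
    # Equal weight for every observation within the bandwidth, zero outside.
    return (np.abs(d) <= bandwidth).astype(float)

def epanechnikov_kernel(d, bandwidth):
    # Weights decay quadratically to zero at the edge of the window
    # (proportional to the Epanechnikov kernel; the constant cancels below).
    u = d / bandwidth
    return np.maximum(1.0 - u**2, 0.0)

def local_constant_fit(x_query, x_train, y_train, kernel, bandwidth):
    # Locally constant (kernel regression) fit: a weighted average of y_train,
    # with weights given by the kernel applied to distances from x_query.
    weights = kernel(x_train - x_query, bandwidth)
    return np.sum(weights * y_train) / np.sum(weights)

# Toy data (illustrative).
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 1.0, 50))
y_train = np.sin(4.0 * x_train) + rng.normal(0.0, 0.2, 50)

# With a boxcar kernel and an enormous bandwidth, every observation gets the
# same weight, so the "fit" at any target point is just the global average.
print(local_constant_fit(0.5, x_train, y_train, boxcar_kernel, bandwidth=1e6))
print(y_train.mean())

# With a finite bandwidth, we redo the weighted average at every target point,
# and the sequence of single-point predictions traces out the fitted curve.
x_grid = np.linspace(0.0, 1.0, 200)
y_hat = [local_constant_fit(x0, x_train, y_train, epanechnikov_kernel, 0.2)
         for x0 in x_grid]
```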
Now what we're talking about is doing exactly the same thing, but locally, and so locally that it's at every single point in our input space. So this kernel regression method that we've described so far, we've now motivated as fitting a constant function locally at each observation, well, more than at each observation, at each point in our input space. This is referred to as locally weighted averages. But instead of fitting a constant at each point in our input space, we could likewise have fit a line or a polynomial, and what this leads to is something that's called locally weighted linear regression.

We're not going to go through the details of locally weighted linear regression in this module. It's fairly straightforward. It's a similar idea to these local constant fits, but now plugging in a line or polynomial. But I wanted to leave you with a couple of rules of thumb for which fit you might choose among the different polynomial orders you have as options. One thing that fitting a local line instead of a local constant helps you with is those boundary effects we talked about before, the fact that you get these large biases at the boundary. You can show very formally that these local linear fits help with that bias, and local quadratic fits help with the bias you get at points of curvature in the interior of the input space.

So, for example, we see that blue curve we've been trying to fit, and maybe it's worth quickly jumping back to what our fit looks like: towards the boundary we get large biases, and right at the point of curvature we also have a bias, where we're underfitting the true curvature of that blue function. The local quadratic fit helps with fitting that curvature, but what it does is actually lead to a larger variance, so that can be unattractive. So in general, a basic recommendation is to use just a standard local linear regression, fitting lines at every point in the input space. [MUSIC]
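As a rough sketch of the idea, and not the course's implementation, here is one way the local linear fit at a single target point could look, again in Python with NumPy. The local_linear_fit name, the Epanechnikov-style weighting, the bandwidth of 0.2, and the synthetic data are all assumptions made for illustration; the point is that we solve a small weighted least squares problem centered at the target point and read off the fitted line's value there.

```python
import numpy as np

def local_linear_fit(x_query, x_train, y_train, bandwidth):
    # Locally weighted linear regression: fit a line by weighted least squares
    # around x_query, then evaluate that line at x_query.
    u = (x_train - x_query) / bandwidth
    w = np.maximum(1.0 - u**2, 0.0)   # Epanechnikov-style weights (assumed choice)
    # Design matrix with a constant feature and the input centered at x_query.
    X = np.column_stack([np.ones_like(x_train), x_train - x_query])
    # Weighted normal equations: (X^T W X) beta = X^T W y.
    XtW = X.T * w                     # same as X.T @ diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ y_train)
    # Because the inputs are centered at x_query, the intercept is the prediction.
    return beta[0]

# Toy data and a grid of target points (illustrative).
rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(0.0, 1.0, 50))
y_train = np.sin(4.0 * x_train) + rng.normal(0.0, 0.2, 50)
x_grid = np.linspace(0.0, 1.0, 200)
y_hat = [local_linear_fit(x0, x_train, y_train, bandwidth=0.2) for x0 in x_grid]
```

Because the fitted line can tilt to follow the trend near the edge of the data, this kind of local linear fit reduces the boundary bias that the locally constant fit suffers from, which is the rule of thumb mentioned above.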