So here we are, back at our polynomial regression demo, where before we were just doing least squares estimation. Let's quickly scroll through this. Remember, we had data generated from a sine function. When we fit a degree-2 polynomial, things looked pretty reasonable. Degree 4 started looking a bit wigglier, with larger estimated coefficients, and degree 16 looked really wiggly and had these massive, massive coefficients.

Now let's get to our ridge regression, where we're just going to take our polynomial regression function and modify it. Using GraphLab Create, the ridge regression modification is really simple because, as we mentioned before, there's this l2_penalty input to .linear_regression. Before, when we were doing plain least squares, we set that L2 penalty equal to zero. This penalty is the lambda value we talked about when trading off between fit and model complexity. Here, though, we're going to specify a nonzero value for the penalty, and that's the only modification we have to make to implement ridge regression in GraphLab Create. But again, in the assignments for this course you're going to explore implementing these methods yourself.

Okay, so let's define this polynomial ridge regression function. Then we're going to explore fitting that really high-order polynomial, the 16th-order polynomial that had a very wiggly fit and crazy coefficients, but now solving the ridge regression objective for different values of lambda. To start, let's consider a really, really small lambda value, so a very small penalty on the two-norm of the coefficients. What we'd expect is that the estimated fit looks very similar to the standard least squares case. And indeed, if I scroll up quickly, this figure looks very, very similar to the fit we got from standard least squares.
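The lecture does this with GraphLab Create's l2_penalty argument. As a library-agnostic sketch of what that option computes under the hood, here is the closed-form ridge solution in plain numpy. The sine data, noise level, and weak/strong penalty values below are illustrative assumptions mirroring the demo, not the course's exact dataset:

```python
import numpy as np

def polynomial_features(x, degree):
    """Columns [1, x, x^2, ..., x^degree] (intercept included as x^0)."""
    return np.vander(x, degree + 1, increasing=True)

def ridge_fit(X, y, l2_penalty):
    """Closed-form ridge solution: w = (X^T X + lambda * I)^{-1} X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + l2_penalty * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Toy data in the spirit of the demo: noisy samples from a sine function.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 30))
y = np.sin(4 * x) + rng.normal(0.0, 0.1, size=30)

X = polynomial_features(x, 16)
w_small = ridge_fit(X, y, 1e-6)   # weak penalty: behaves like least squares
w_large = ridge_fit(X, y, 100.0)  # strong penalty: heavily shrunk coefficients
```

With the weak penalty the solution chases the data (larger coefficients, lower training error); with the strong penalty the coefficients are driven toward zero, which is exactly the trade-off the video walks through next.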
So that checks out with what we know should happen, and likewise the coefficients are still really, really massive numbers. But what if we increase the strength of our penalty? Let's consider a very large L2 penalty. Here we're using a value of 100, whereas above we were using 1e-25, so really, really tiny. In this case we end up with much smaller coefficients; actually, they look really, really small. So let's look at the fit. We see a really smooth, very flat curve, probably way too simple a description of what's really going on in the data. It doesn't capture the trend of the data, the values increasing and then decreasing; we just get a roughly constant fit followed by a decrease. So this fit appears under-fit. As we expect, when lambda is really, really small we get something similar to our least squares solution, and when lambda becomes really, really large all the coefficients start going to zero.

Okay, so now what we're going to do is look at the fit for a series of different lambda values, going from 1e-25 all the way up to 100, with some intermediate values as well, to see what the fit and coefficients look like as we increase lambda. We start with these crazy, crazy large coefficient values. By the time lambda is 1e-10, the values have decreased by a couple of orders of magnitude, so they're on the order of 10^4 now. Then we keep increasing lambda. At 1e-6, we get coefficients on the order of hundreds, so in terms of reasonability these values start looking a little more realistic. As we keep going, the coefficient values keep decreasing, and when we get to a lambda of 100 we get these really small coefficients.
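The shrinkage pattern described above can be reproduced with a quick sweep. This is a numpy sketch under the same assumed toy setup (not the course's data or exact penalty grid), showing the coefficient norm falling monotonically as the penalty grows:

```python
import numpy as np

def ridge_fit(X, y, l2_penalty):
    # Closed-form ridge: w = (X^T X + lambda * I)^{-1} X^T y
    n_features = X.shape[1]
    A = X.T @ X + l2_penalty * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Assumed toy data: noisy samples from a sine, degree-16 features.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 30))
y = np.sin(4 * x) + rng.normal(0.0, 0.1, size=30)
X = np.vander(x, 17, increasing=True)   # columns x^0 ... x^16

penalties = [1e-6, 1e-3, 1.0, 100.0]
norms = [np.linalg.norm(ridge_fit(X, y, lam)) for lam in penalties]
for lam, nrm in zip(penalties, norms):
    print(f"lambda = {lam:>8g}   ||w|| = {nrm:.4g}")
```

For the exact ridge solution, the norm of the coefficient vector is guaranteed to shrink as lambda increases, which is the same staircase of decreasing magnitudes the lecture scrolls through.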
But now let's look at what the fits are for these different lambda values. Here's the plot we showed before for the really small lambda. Increasing lambda a bit gives a smoother fit, though it's still pretty wiggly and crazy, especially at the boundary points. Increase lambda more and things start looking better. When we get to 1e-3, the fit looks pretty good. Especially out here at the boundary, it's hard to tell whether the function should be going up or down. I want to emphasize that at boundaries, where you have few observations, it's very hard to control the fit, so we trust the fit much more in the intermediate regions of our x range where we have observations. But then we get to this really large lambda, and we see that we're clearly over-smoothing the data.

So a natural question is: out of all the possible lambda values we might consider, and all the associated fits, which one should we use for forming our predictions? It would be really nice if there were some automatic procedure for selecting this lambda value, instead of me having to specify a large set of lambdas, look at the coefficients, look at the fits, and somehow make a judgment call about which one to use. Well, the good news is that there is a way to automatically choose lambda, and it's something we're going to discuss later in this module. One method we're going to talk about is called leave-one-out cross validation. Minimizing this leave-one-out cross-validation error, which we'll define later, approximately minimizes the average mean squared error of our predictions. So what we're going to do here is define this leave-one-out cross-validation function and then apply it to our data. You're not going to understand what's going on in this function yet, but you will by the end of this module.
You'll be able to implement this method yourself. What it's doing is looking at the prediction error for different lambda values and then choosing the one that minimizes that error. But of course, we're not measuring that error on the training set or the test set; we're using a validation set, in a very specific way.

Okay, so now that we've applied this leave-one-out function to our data over a set of specified penalty values, we can plot the leave-one-out cross-validation error as a function of the lambda values we considered. In this case, we actually see a curve that's pretty flat in a bunch of regions, which means our fits are not very sensitive to the choice of lambda in those regions. But there is some minimum, and we can find it: here we're just selecting the lambda with the lowest cross-validation error. Then we fit our polynomial ridge regression model using that specific lambda value. Printing the coefficients, we see very reasonable numbers, things on the order of 1, 0.2, 0.5. And the associated fit looks really nice: there's a nice trend throughout most of the range of x. The only place things look a little crazy is out at the boundary, but again, in that boundary region we don't have any data to really pin down the function. Considering it's a 16th-order polynomial, we're shrinking the coefficients, but we don't have much information about what the function should do out there. What we've seen is that this leave-one-out cross-validation technique selects a lambda value that provides a good fit, automatically balancing bias and variance for us.
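The procedure just described, score each candidate lambda by leave-one-out error and refit at the winner, can be sketched in a few lines of numpy. This is an illustrative version under the same assumed toy setup (the course's actual function, data, and penalty grid differ):

```python
import numpy as np

def ridge_fit(X, y, l2_penalty):
    # Closed-form ridge: w = (X^T X + lambda * I)^{-1} X^T y
    n_features = X.shape[1]
    A = X.T @ X + l2_penalty * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

def loo_cross_validation(X, y, l2_penalty):
    """Average squared error when each observation is predicted
    by a model trained on all the other observations."""
    n = len(y)
    errors = []
    for i in range(n):
        mask = np.arange(n) != i                      # leave point i out
        w = ridge_fit(X[mask], y[mask], l2_penalty)   # train on the rest
        errors.append((X[i] @ w - y[i]) ** 2)         # error on held-out point
    return float(np.mean(errors))

# Assumed toy data mirroring the demo: noisy sine, degree-16 features.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 30))
y = np.sin(4 * x) + rng.normal(0.0, 0.1, size=30)
X = np.vander(x, 17, increasing=True)

penalties = [1e-6, 1e-4, 1e-2, 1.0, 100.0]
cv_errors = [loo_cross_validation(X, y, lam) for lam in penalties]
best_l2 = penalties[int(np.argmin(cv_errors))]
w_best = ridge_fit(X, y, best_l2)   # final fit at the selected lambda
```

The naive loop here refits the model n times per lambda, which is fine at this scale; the module later shows why minimizing this score is a good proxy for minimizing average prediction error.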