[MUSIC] Now let's see why we get sparsity in our lasso solutions. And, to do this, let's interpret the solution geometrically. But first, to set the stage, let's interpret the ridge regression solution geometrically. And, then we'll get to the lasso. Well, since visualizations are easier in 2D, let's just look at an example where we have two features, h0 and h1. So, let me just write this, two features for visualization sake. And what I'm writing in this green box is my ridge objective simplified just for having two features, and in this pink box, I'm showing just the residual sum of squares term. And what we're going to do to start with, we're gonna make a contour plot for our residual sum of squares in these two dimensions, w0 by w1, and let's look at this residual sum of squares term. Where inside this sum over n observations we're gonna get terms that look like y squared plus w0 squared, h0 squared plus w1 squared, h1 squared plus all the cross terms. When I finish completing this square here, and if I think about this. And if I sum over all my observations, these sums will pass in here. So, if I think of this as a function of w0 and w1, well, what's this defining? If this is equal to some constant, which is what a contour plot is doing it's looking at an objective equal to different values. Well this is an equation of an ellipse. Because I have my two parameters, w0 and w1, each squared. They're multiplied by some weighting and then there's some other terms having to do with w0 and w1, no power greater than squared, that are coming in here, setting it equal to constant. That by definition is an ellipse. Okay, so what I see is that, for my residual sum of square's contour plot, I'm gonna get a series of ellipses. So I'm highlighting here one ellipse. And what this ellipse is is this is residual sum of squares of w0 w1 equal to what I'll call constant 1. This next plot is residual sum of squares w0 w1 = sum constant two, which is greater than constant one and so on. That's what all these curves are, increasing residual sum of squares. And when I walk around this curve, it's a level set, it's a set of all things being equal value. So, if I look at sum w0 w1 pair and I look at some other point, w0 prime, w1 prime, while both of these points, here and here, have the same residual sum of squares which is what I called constant one before. So as I'm walking around this circle, every solution w0 w1 has the exactly the same residule sum of squares. Okay so hopefully what this plot is showing is now clear. And now let's talk about, what if I just minimize residual sum of squares? Well if I just minimize residual sum of squares, everytime I jump from one of these curves to the next curve to the next curve, all the way into the smallest curve, just this dot in the middle, this is going to smaller and smaller residual sum of squares. So this x here marks the minimum over all possible w0 w1 of residual sum of squares w0 w1 and what is that? That's our lee square solution so this is w hat lee squares. Okay, I'm gonna, because this is so important, I'm gonna highlight it in red. That's what this point is. I don't want to draw a circle cuz the circle is exactly what all these other ellipses look like here. So I'm gonna put a little box around this, and I'll highlight in red this is w hat lee squares. Okay, so that would be my solution if I were just minimizing residual sum of squares, but when I'm looking at my ridge objective, there's also this L2 penalty. This sum of w0 squared + w1 squared. And what does that look like? So I have w0 squared + w1 squared and when I'm looking at my contour plot I'm looking at setting this equal to some constant. And changing the value of that constant to get these different contours, the different colors that I'm seeing here. Well, what shape is w0 squared plus w1 squared equal to constant? That is exactly the equation of a circle. So we see that the circle is centered about zero and so each one of these curves here, just to be clear, is my two norm of w squared, equal to, let's say constant one. This next circle is the two norm of w squared equal to sum constant two, which is greater than constant one, and so on. And again, if I look at any point w0 w1 and some other point w0 prime w1 prime, these things have the same norm of the w vector squared. And that's true for all points around the circle. So let's say I'm just trying to minimize my two norm. What's the solution to that? Well I'm going to jump down these contours to my minimum. And my minimum, let me do it directly in red this time. My minimum, oops, that didn't switch colors. Directly in red. My minimum is setting w0 and w1 equal to zero. So this is min/w0 w1, my two norm, which I'll write explicitly in this 2D case, is w0 squared + w1 squared, and the solution is 0. Okay, so this would be our ridge regression solution if lambda, the waiting on this 2 norm, were infinity. We talked about that before. [MUSIC]