But now let's go through exactly the same geometric interpretation for our lasso objective. For our lasso objective, we have our residual sum of squares plus lambda times our L1 norm. Okay, so when we look at this first term, which is in this pink, or rather fuchsia, box, we have exactly the same residual sum of squares that we talked about for ridge. So when we visualize the contours associated with residual sum of squares in our lasso objective, it's exactly the same: the residual sum of squares contours for lasso are exactly the same as those for ridge. Okay, so I don't need to explain this plot again; you remember what it is from the ridge visualization we just went through.

But now let's look at the term that's different. We're looking at an L1 penalty instead of an L2 penalty. And here, if we think of looking at the absolute value of w0 plus the absolute value of w1 equal to some constant, defining one of our level sets in our contour plot, well, what does that look like? That defines a diamond. Okay, so as I'm walking along this diamond, every point along this surface, this line that I'm drawing, has exactly the same L1 norm. So here I have my one norm equal to some constant 1. Here I have the one norm equal to some constant 2 greater than constant 1, and so on. And again, just to be very explicit, if I look at some w0, w1 pair, and some w0 prime, w1 prime, these points, or any points along this surface, have the same one norm. Okay, so if I'm just trying to minimize my L1 norm, what's the solution? Well, again, just like in ridge, the solution is to make the magnitude as small as possible, which is zero. So this is the min over w of my one norm.

Okay, so this is a really important visualization for the one norm, and we're going to return to it in a couple slides. But first, what I want to do is show exactly the same type of movie that we showed for the ridge objective, but now for lasso. So again, this is a movie where we're adding the two contour plots, adding RSS plus lambda times the one norm of w. So we're adding ellipses plus some weighting lambda of a set of diamonds. And then we're going to solve for the minimum; that's going to be the x. So x is again our optimal w hat for a specific lambda. And we're going to look at how that solution changes as we increase the value of lambda. And again, if we set lambda equal to zero, we're going to be at our least squares solution, so we're going to start at exactly the same point that we did in our ridge regression movie. But now, as we're increasing lambda, the solution is going to look very different than it did for ridge regression. We know that the final solution is going to go towards zero, but let's look at what the path looks like. Okay, Vanna, you're up, play the movie.

So what we see is that the solution eventually gets to the point where w0 is exactly equal to 0. If we watch this movie again, we see that this x is moving along, shrinking, and then it hits the y axis and moves along that y axis. So the first thing that happens is the coefficients shrink, and at some point the solution hits the point where w0 becomes exactly 0. And then our w1 term, the weighting on this second feature, h1, is going to decrease and decrease and decrease as we continue to increase our penalty term lambda. So it's going to continue to walk down this axis. So let's watch this one more time with this in mind.
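If you'd like to reproduce this path numerically rather than just watch the movie, here is a minimal sketch on hypothetical data (this is not the course's own code). It uses scikit-learn's Lasso, whose objective is (1/(2n)) * RSS + alpha * ||w||_1, so alpha plays the role of our lambda up to that 1/(2n) scaling of the RSS:

```python
# Trace the lasso solution path for a two-feature problem as lambda grows.
# Hypothetical synthetic data; features stand in for h0 and h1.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))
y = 1.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Near lambda = 0 we are close to the least squares solution; as lambda
# increases, w0_hat hits exactly 0 first, then w1_hat keeps shrinking.
for lam in [0.01, 0.5, 1.0, 2.0, 3.0, 4.0]:
    w = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    print(f"lambda={lam:5.2f}  w0_hat={w[0]: .3f}  w1_hat={w[1]: .3f}")
```

Printing the coefficients for this grid of lambda values shows exactly the behavior in the movie: the weight on the first feature snaps to exactly zero while the second weight is still nonzero and shrinking.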
Our solution hits that zero point, that sparse solution where w0 hat is equal to zero, and then it continues to shrink the coefficients to zero. And you see that our contours become more and more like the diamonds defined by that L1 norm as the weighting on that norm increases.

Now, let's go ahead and visualize what the lasso solution looks like. And this is where we're going to get our geometric intuition, beyond what was just shown in the movie, for why lasso solutions are sparse. So we already saw in the movie that for certain values of lambda, we're going to get coefficients exactly equal to zero. But now let's look at just one value of lambda. And here is our solution, and what you see is that because of this diamond shape of the L1 penalty that we're adding, we have some probability of hitting the corners of this diamond. And at those corners, we get sparse solutions. So, like Carlos likes to say, it's like a ninja star that's stabbing our RSS contours. Maybe that's a little brutal of a description, but maybe you'll remember it. So this is why lasso leads to sparse solutions.

And another thing I want to mention is that this visualization is just in two dimensions, but as we get to higher dimensions, instead of diamonds they're called rhomboids, and they're very pointy objects. So in high dimensions, we're very likely to hit one of those corners of this L1 penalty for any value of lambda.
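To see this corner effect concretely, here is a short sketch on hypothetical data (again, not the course's own code) contrasting lasso and ridge on the same problem: the pointy corners of the L1 ball give coefficients that are exactly zero, while ridge's round L2 ball only makes them small.

```python
# Compare the number of exactly-zero coefficients under lasso vs. ridge.
# Synthetic data: only the first two of ten features actually matter.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso_w = Lasso(alpha=0.3, fit_intercept=False).fit(X, y).coef_
ridge_w = Ridge(alpha=0.3, fit_intercept=False).fit(X, y).coef_

print("exact zeros, lasso:", int(np.sum(lasso_w == 0)))  # several, the irrelevant features
print("exact zeros, ridge:", int(np.sum(ridge_w == 0)))  # typically none, just small values
```

Running this, the lasso fit zeros out the irrelevant features exactly, while the ridge fit keeps every coefficient nonzero, just shrunk, which is the two-dimensional diamond-versus-circle picture playing out in ten dimensions.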