But now let's go through exactly the same geometric interpretation for our lasso objective. For our lasso objective, we have our residual sum of squares plus lambda times our L1 norm. Okay, so when we look at this first term, which is in this pink, or rather fuchsia, box, we have exactly the same residual sum of squares that we talked about for ridge. So when we visualize the contours associated with residual sum of squares in our lasso objective, it's exactly the same: the residual sum of squares contours for lasso are exactly the same as those for ridge. Okay, so I don't need to explain this plot again; you remember what it is from the ridge visualization we just went through.

But now let's look at the term that's different. We're looking at an L1 penalty instead of an L2 penalty. And here, if we think of looking at the absolute value of w0 plus the absolute value of w1 equal to some constant, defining one of our level sets in our contour plot, well, what does that look like? That defines a diamond. Okay, so as I'm walking along this diamond, every point along this surface, this line that I'm drawing, has exactly the same L1 norm. So here I have my one norm equal to some constant 1. Here I have the one norm equal to some constant 2 greater than constant 1, and so on. And again, just to be very explicit, if I look at some w0, w1 pair, and some w0 prime, w1 prime, these points, or any points along this surface, have the same one norm. Okay, so if I'm just trying to minimize my L1 norm, what's the solution? Well, again, just like in ridge, the solution is to make the magnitude as small as possible, which is zero. So this is the min over w of my one norm.

Okay, so this is a really important visualization for the one norm, and we're going to return to it in a couple slides. But first, what I want to do is show exactly the same type of movie that we showed for the ridge objective, but now for lasso. So again, this is a movie where we're adding the two contour plots, adding RSS plus lambda times the one norm of w. So we're adding ellipses plus some weighting lambda of a set of diamonds. And then we're going to solve for the minimum; that's going to be the x. So x is again our optimal w hat for a specific lambda. And we're going to look at how that solution changes as we increase the value of lambda. And again, if we set lambda equal to zero, we're going to be at our least squares solution, so we're going to start at exactly the same point that we did in our ridge regression movie. But now, as we're increasing lambda, the solution is going to look very different than it did for ridge regression. We know that the final solution is going to go towards zero, but let's look at what the path looks like. Okay, Vanna, you're up, play the movie.

So what we see is that the solution eventually gets to the point where w0 is exactly equal to 0. If we watch this movie again, we see that this x is moving along, shrinking, and then it hits the y axis and moves along that y axis. So the first thing that happens is the coefficients shrink, and at some point the solution hits the point where w0 becomes exactly 0. And then our w1 term, the weighting on this second feature, h1, is going to decrease and decrease and decrease as we continue to increase our penalty term lambda. So it's going to continue to walk down this axis. So let's watch this one more time with this in mind.
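If you'd like to reproduce this path numerically rather than just watch the movie, here is a minimal sketch on hypothetical data (this is not the course's own code). It uses scikit-learn's Lasso, whose objective is (1/(2n)) * RSS + alpha * ||w||_1, so alpha plays the role of our lambda up to that 1/(2n) scaling of the RSS:

```python
# Trace the lasso solution path for a two-feature problem as lambda grows.
# Hypothetical synthetic data; features stand in for h0 and h1.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))
y = 1.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Near lambda = 0 we are close to the least squares solution; as lambda
# increases, w0_hat hits exactly 0 first, then w1_hat keeps shrinking.
for lam in [0.01, 0.5, 1.0, 2.0, 3.0, 4.0]:
    w = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    print(f"lambda={lam:5.2f}  w0_hat={w[0]: .3f}  w1_hat={w[1]: .3f}")
```

Printing the coefficients for this grid of lambda values shows exactly the behavior in the movie: the weight on the first feature snaps to exactly zero while the second weight is still nonzero and shrinking.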
Our solution hits that zero point, that sparse solution where w0 hat is equal to zero, and then it continues to shrink the coefficients to zero. And you see that our contours become more and more like the diamonds defined by that L1 norm as the weighting on that norm increases.

Now, let's go ahead and visualize what the lasso solution looks like. And this is where we're going to get our geometric intuition, beyond what was just shown in the movie, for why lasso solutions are sparse. So we already saw in the movie that for certain values of lambda, we're going to get coefficients exactly equal to zero. But now let's look at just one value of lambda. And here is our solution, and what you see is that because of this diamond shape of the L1 penalty that we're adding, we have some probability of hitting the corners of this diamond. And at those corners, we get sparse solutions. So, like Carlos likes to say, it's like a ninja star that's stabbing our RSS contours. Maybe that's a little brutal of a description, but maybe you'll remember it. So this is why lasso leads to sparse solutions.

And another thing I want to mention is that this visualization is just in two dimensions, but as we get to higher dimensions, instead of diamonds they're called rhomboids, and they're very pointy objects. So in high dimensions, we're very likely to hit one of those corners of this L1 penalty for any value of lambda.
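To see this corner effect concretely, here is a short sketch on hypothetical data (again, not the course's own code) contrasting lasso and ridge on the same problem: the pointy corners of the L1 ball give coefficients that are exactly zero, while ridge's round L2 ball only makes them small.

```python
# Compare the number of exactly-zero coefficients under lasso vs. ridge.
# Synthetic data: only the first two of ten features actually matter.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso_w = Lasso(alpha=0.3, fit_intercept=False).fit(X, y).coef_
ridge_w = Ridge(alpha=0.3, fit_intercept=False).fit(X, y).coef_

print("exact zeros, lasso:", int(np.sum(lasso_w == 0)))  # several, the irrelevant features
print("exact zeros, ridge:", int(np.sum(ridge_w == 0)))  # typically none, just small values
```

Running this, the lasso fit zeros out the irrelevant features exactly, while the ridge fit keeps every coefficient nonzero, just shrunk, which is the two-dimensional diamond-versus-circle picture playing out in ten dimensions.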