1 00:00:01,500 --> 00:00:05,540 But now let's go 2 00:00:05,540 --> 00:00:09,870 through exactly the same geometric interpretation for our lasso objective. 3 00:00:11,340 --> 00:00:13,090 And for our lasso objective, 4 00:00:13,090 --> 00:00:18,130 we have our residual sum of squares plus lambda times our L1 norm. 5 00:00:19,205 --> 00:00:24,460 Okay, so when we look at this first term, which is in this pink, or 6 00:00:24,460 --> 00:00:26,900 rather fuchsia, box, 7 00:00:26,900 --> 00:00:30,490 we have exactly the same residual sum of squares that we talked about for 8 00:00:30,490 --> 00:00:35,000 ridge. So when we visualize the contours associated with 9 00:00:35,000 --> 00:00:40,090 residual sum of squares in our lasso objective, it's exactly the same. So 10 00:00:40,090 --> 00:00:46,166 residual sum of squares contours for 11 00:00:46,166 --> 00:00:53,284 lasso are exactly 12 00:00:53,284 --> 00:01:01,352 the same as those for 13 00:01:02,780 --> 00:01:03,280 ridge. 14 00:01:05,110 --> 00:01:07,250 Okay, so I don't need to explain this plot again. 15 00:01:07,250 --> 00:01:12,100 You remember what it is from our ridge visualization we just went through. 16 00:01:12,100 --> 00:01:14,540 But now let's look at the term that's different. 17 00:01:14,540 --> 00:01:19,180 There, we're looking at an L1 penalty instead of an L2 penalty, and 18 00:01:19,180 --> 00:01:24,520 here, if we think of looking at the absolute value 19 00:01:24,520 --> 00:01:31,430 of W0 plus the absolute value of W1, equal to some constant, defining 20 00:01:31,430 --> 00:01:37,510 one of our level sets in our contour plot, well, what does that look like? 21 00:01:37,510 --> 00:01:38,576 That defines a diamond. 22 00:01:42,013 --> 00:01:46,348 Okay. So, as I'm walking along this diamond, 23 00:01:46,348 --> 00:01:52,260 every point along this surface, 24 00:01:52,260 --> 00:01:57,610 here, this line that I'm drawing, has exactly the same L1 norm. 25 00:01:57,610 --> 00:02:02,120 So, here I have my one norm
26 00:02:02,120 --> 00:02:05,540 is equal to some constant 1. 27 00:02:05,540 --> 00:02:10,540 Here I have the one norm equal to some constant 28 00:02:10,540 --> 00:02:15,470 2, greater than constant 1, and so 29 00:02:15,470 --> 00:02:20,260 on. And again, just to be very explicit, if I look at some 30 00:02:20,260 --> 00:02:26,010 W0, W1 31 00:02:26,010 --> 00:02:30,620 pair, I can look at any two points, 32 00:02:30,620 --> 00:02:34,010 and some W0 prime, W1 prime. 33 00:02:34,010 --> 00:02:40,750 These points, or any points along this surface, have the same 34 00:02:40,750 --> 00:02:42,420 two, sorry, one norm. 35 00:02:43,460 --> 00:02:43,960 That'd be one. 36 00:02:45,810 --> 00:02:53,890 Okay, so if I'm just trying to minimize my L1 norm, what's the solution? 37 00:02:55,110 --> 00:02:57,070 Well, again, just like in ridge, 38 00:02:57,070 --> 00:03:01,630 the solution is to make the magnitude as small as possible, which is zero. So 39 00:03:01,630 --> 00:03:06,737 this is min over w of my one norm. 40 00:03:06,737 --> 00:03:09,259 Okay. 41 00:03:09,259 --> 00:03:16,260 So, this is a really important visualization for 42 00:03:16,260 --> 00:03:20,740 the one norm, and we're going to return to it in a couple slides. 43 00:03:20,740 --> 00:03:26,039 But first what I wanna do is show exactly the same type of movie that we showed for 44 00:03:26,039 --> 00:03:28,450 our ridge objective, but now for lasso. So 45 00:03:28,450 --> 00:03:33,293 again, this is a movie where we're adding the two contour plots, so adding 46 00:03:37,192 --> 00:03:43,979 RSS + lambda ||w||_1 in this case, so we're adding ellipses 47 00:03:46,077 --> 00:03:49,281 plus some weighting lambda of a set of diamonds. 48 00:03:51,764 --> 00:03:55,940 And then we're gonna solve for the minimum, that's gonna be x. 49 00:03:55,940 --> 00:04:02,030 So, x is again our optimal W hat for 50 00:04:02,030 --> 00:04:05,600 a specific lambda.
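As a rough sketch of the two terms described above (this is not code from the lecture), the lasso objective can be written as the residual sum of squares plus lambda times the L1 norm. The function names `rss`, `l1_norm` and `lasso_objective`, and the toy data `H` and `y`, are assumptions made up for illustration. The sketch also checks that several points on one diamond share the same L1 norm.

```python
def rss(w, H, y):
    """Residual sum of squares for weights w, feature rows H and targets y."""
    return sum((yi - sum(wj * hj for wj, hj in zip(w, row))) ** 2
               for row, yi in zip(H, y))

def l1_norm(w):
    """Sum of absolute values of the weights: |w0| + |w1| + ..."""
    return sum(abs(wj) for wj in w)

def lasso_objective(w, H, y, lam):
    """RSS plus lambda times the L1 norm of the weights."""
    return rss(w, H, y) + lam * l1_norm(w)

# Hypothetical toy data: two features h0 and h1, two observations.
H = [[1.0, 1.0], [1.0, -1.0]]
y = [4.0, -2.0]

# Four points on the diamond |w0| + |w1| = 1 (one level set of the penalty).
diamond_points = [(1.0, 0.0), (0.5, -0.5), (0.0, 1.0), (-0.25, 0.75)]
norms = [l1_norm(p) for p in diamond_points]
print(norms)
```

With lambda set to zero, `lasso_objective` reduces to `rss`, which is why the movie starts at the least squares solution.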
51 00:04:05,600 --> 00:04:10,410 And we're gonna 52 00:04:10,410 --> 00:04:15,340 look at how that solution changes as we increase the value of lambda. 53 00:04:15,340 --> 00:04:18,970 And again, if we set lambda equal to zero, 54 00:04:18,970 --> 00:04:21,930 we're gonna be at our least squares solution, so we're gonna start at exactly 55 00:04:21,930 --> 00:04:25,400 the same point that we did in our ridge regression movie. 56 00:04:25,400 --> 00:04:27,300 But now as we're increasing lambda, 57 00:04:27,300 --> 00:04:31,010 the solution's gonna look very different than what it did for ridge regression. 58 00:04:31,010 --> 00:04:34,870 We know that the final solution is gonna go towards zero, but 59 00:04:34,870 --> 00:04:36,500 let's look at what the path looks like. 60 00:04:37,800 --> 00:04:42,720 Okay, Vanna, you're up. Play the movie. 61 00:04:46,220 --> 00:04:49,590 So, what we see is that the solution 62 00:04:49,590 --> 00:04:54,090 eventually gets to the point where W0 is exactly equal to 0. 63 00:04:54,090 --> 00:04:59,910 So, if we watch this movie again, 64 00:04:59,910 --> 00:05:02,370 we see that this X is moving along, shrinking, and 65 00:05:02,370 --> 00:05:07,410 then it hits the Y axis and it moves along that Y axis. 66 00:05:07,410 --> 00:05:13,170 So, the first thing that happens is the coefficients 67 00:05:13,170 --> 00:05:19,120 shrink, and at some point it hits the point where W0 becomes exactly 0, and then 68 00:05:20,630 --> 00:05:25,860 our W1 term, the weighting on this second feature, H1, is going to decrease and 69 00:05:25,860 --> 00:05:30,040 decrease and decrease as we continue to increase our penalty term lambda. 70 00:05:30,040 --> 00:05:34,290 So, it's going to continue to walk down this axis. 71 00:05:34,290 --> 00:05:37,576 So, let's watch this one more time with this in mind.
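The path traced out in the movie can be imitated numerically. The sketch below is not the method used to make the movie: it swaps in coordinate descent with soft thresholding, which is one standard way to minimize RSS + lambda ||w||_1. The toy data `H` and `y` and the function names are hypothetical, and the code assumes no feature column is all zeros.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t, returning exactly 0.0 when |rho| <= t."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def lasso_coordinate_descent(H, y, lam, sweeps=100):
    """Minimize RSS(w) + lam * ||w||_1 by updating one weight at a time."""
    n_features = len(H[0])
    w = [0.0] * n_features
    for _ in range(sweeps):
        for j in range(n_features):
            rho = 0.0  # feature j dotted with the residual that ignores feature j
            z = 0.0    # squared length of feature j (assumed non-zero)
            for row, yi in zip(H, y):
                pred_without_j = sum(w[k] * row[k]
                                     for k in range(n_features) if k != j)
                rho += row[j] * (yi - pred_without_j)
                z += row[j] ** 2
            # The threshold is lam / 2 because the RSS term has no 1/2 factor.
            w[j] = soft_threshold(rho, lam / 2.0) / z
    return w

# Hypothetical toy data whose least squares solution is w = (1, 3).
H = [[1.0, 1.0], [1.0, -1.0]]
y = [4.0, -2.0]

path = {lam: lasso_coordinate_descent(H, y, lam)
        for lam in (0.0, 2.0, 4.0, 8.0, 12.0)}
for lam, w in path.items():
    print(lam, w)
```

On this toy data the path starts at the least squares solution for lambda = 0, W0 reaches exactly zero first, and W1 then keeps shrinking until it is also zero, which mirrors the X moving onto the axis and walking down it.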
72 00:05:37,576 --> 00:05:43,420 Our solution hits that zero point, 73 00:05:43,420 --> 00:05:49,040 that sparse solution where W0 hat is equal to zero, and 74 00:05:49,040 --> 00:05:52,120 then it continues to shrink the coefficients to zero. 75 00:05:52,120 --> 00:05:54,100 And you see that our contours become more and 76 00:05:54,100 --> 00:05:57,640 more like the diamonds that are defined by that L1 norm 77 00:05:57,640 --> 00:06:00,290 as the weighting on that norm increases. 78 00:06:01,455 --> 00:06:04,600 Now, let's go ahead and visualize what the lasso solution looks like. 79 00:06:05,600 --> 00:06:08,510 And this is where we're gonna get our geometric intuition, 80 00:06:08,510 --> 00:06:12,530 beyond what was just shown in the movie, for why lasso solutions are sparse. 81 00:06:12,530 --> 00:06:14,810 So, we already saw in the movie that for 82 00:06:14,810 --> 00:06:18,990 certain values of lambda, we're gonna get coefficients exactly equal to zero. 83 00:06:18,990 --> 00:06:21,059 But now let's look at just one value of lambda. 84 00:06:32,060 --> 00:06:36,910 And here is our solution, and what you see is that because 85 00:06:36,910 --> 00:06:41,575 of this diamond, so let me write this as our solution, 86 00:06:44,240 --> 00:06:48,992 because of this diamond shape of our L1 objective, or 87 00:06:48,992 --> 00:06:53,852 the penalty that we're adding, we're gonna have some 88 00:06:53,852 --> 00:06:59,590 probability of hitting those corners of this diamond. 89 00:06:59,590 --> 00:07:03,210 And at those corners we're gonna get sparse solutions. 90 00:07:03,210 --> 00:07:07,020 So, like Carlos likes to say, it's like a ninja star 91 00:07:07,020 --> 00:07:11,790 that's stabbing our RSS contours. 92 00:07:11,790 --> 00:07:20,290 So, maybe that's a little brutal of a description, but maybe you'll remember it. 93 00:07:20,290 --> 00:07:24,890 So, this is why lasso leads to sparse solutions.
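To make the single-lambda picture concrete, here is a small brute-force check, offered only as a sketch. It evaluates RSS + lambda ||w||_1 on a grid of (W0, W1) values for one lambda and reports where the smallest value falls. The toy data, the grid spacing and the choice lambda = 4 are all assumptions picked so the arithmetic is exact; they do not come from the lecture.

```python
def lasso_objective(w0, w1, H, y, lam):
    """RSS plus lambda times (|w0| + |w1|) for a two-feature model."""
    rss = sum((yi - (w0 * h0 + w1 * h1)) ** 2 for (h0, h1), yi in zip(H, y))
    return rss + lam * (abs(w0) + abs(w1))

# Hypothetical toy data whose least squares solution is (w0, w1) = (1, 3).
H = [(1.0, 1.0), (1.0, -1.0)]
y = [4.0, -2.0]
lam = 4.0

# Grid from -1.0 to 4.0 in steps of 0.25 along both axes.
grid = [i * 0.25 for i in range(-4, 17)]

# Pick the grid point with the smallest lasso objective.
best = min(((w0, w1) for w0 in grid for w1 in grid),
           key=lambda w: lasso_objective(w[0], w[1], H, y, lam))
print(best)
```

For this data the best grid point has W0 exactly 0 and W1 equal to 2, so the minimizer sits on an axis, which is where the corners of the diamonds lie.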
94 00:07:27,050 --> 00:07:29,650 And another thing I want to mention is this visualization 95 00:07:29,650 --> 00:07:33,670 is just in two dimensions, but as we get to higher dimensions, instead of 96 00:07:33,670 --> 00:07:37,530 diamonds they're called rhomboids, and they're very pointy objects. 97 00:07:37,530 --> 00:07:41,914 So, in high dimensions we're very likely to hit one of those 98 00:07:41,914 --> 00:07:45,768 corners of this L1 penalty for any value of lambda. 99 00:07:45,768 --> 00:07:51,059 [MUSIC]