1 00:00:00,000 --> 00:00:04,700 [MUSIC] 2 00:00:04,700 --> 00:00:08,180 Now let's see why we get sparsity in our lasso solutions. 3 00:00:08,180 --> 00:00:11,360 And, to do this, let's interpret the solution geometrically. 4 00:00:11,360 --> 00:00:12,970 But first, to set the stage, 5 00:00:12,970 --> 00:00:16,250 let's interpret the ridge regression solution geometrically. 6 00:00:16,250 --> 00:00:17,984 And, then we'll get to the lasso. 7 00:00:17,984 --> 00:00:21,757 Well, since visualizations are easier in 2D, 8 00:00:21,757 --> 00:00:27,284 let's just look at an example where we have two features, h0 and h1. 9 00:00:27,284 --> 00:00:33,249 So, let me just write this, 10 00:00:33,249 --> 00:00:36,730 two features for 11 00:00:36,730 --> 00:00:41,470 visualization sake. 12 00:00:41,470 --> 00:00:44,300 And what I'm writing in this green box is my 13 00:00:45,380 --> 00:00:50,280 ridge objective simplified just for having two features, and 14 00:00:50,280 --> 00:00:53,890 in this pink box, I'm showing just the residual sum of squares term. 15 00:00:53,890 --> 00:00:58,790 And what we're going to do to start with, we're gonna make a contour plot for 16 00:00:58,790 --> 00:01:02,720 our residual sum of squares in these two dimensions, 17 00:01:02,720 --> 00:01:07,950 w0 by w1, and let's look at this residual sum of squares term. 18 00:01:07,950 --> 00:01:13,346 Where inside this sum over n observations we're 19 00:01:13,346 --> 00:01:18,213 gonna get terms that look like y squared plus 20 00:01:18,213 --> 00:01:23,213 w0 squared, h0 squared plus w1 squared, 21 00:01:23,213 --> 00:01:27,440 h1 squared plus all the cross terms. 22 00:01:29,350 --> 00:01:36,630 When I finish completing this square here, and if I think about this. 23 00:01:38,300 --> 00:01:44,160 And if I sum over all my observations, these sums will pass in here. 24 00:01:44,160 --> 00:01:51,660 So, if I think of this as a function of w0 and w1, well, what's this defining? 
25 00:01:51,660 --> 00:01:56,748 If this is equal to some constant, which is what a contour 26 00:01:56,748 --> 00:02:03,540 plot is doing: it's looking at an objective equal to different values. 27 00:02:03,540 --> 00:02:05,780 Well, this is an equation of an ellipse. 28 00:02:07,598 --> 00:02:15,200 Because I have my two parameters, w0 and w1, each squared. 29 00:02:15,200 --> 00:02:20,140 They're multiplied by some weighting and then there are some other terms having to do 30 00:02:20,140 --> 00:02:24,900 with w0 and w1, no power greater than squared, that are coming in here, 31 00:02:24,900 --> 00:02:26,330 setting it equal to a constant. 32 00:02:26,330 --> 00:02:29,090 That by definition is an ellipse. 33 00:02:29,090 --> 00:02:32,360 Okay, so what I see is that, for 34 00:02:32,360 --> 00:02:38,450 my residual sum of squares contour plot, I'm gonna get a series of ellipses. 35 00:02:38,450 --> 00:02:42,550 So I'm highlighting here one ellipse. 36 00:02:42,550 --> 00:02:49,310 And what this ellipse is, is residual sum of squares of w0, w1 37 00:02:50,850 --> 00:02:55,280 equal to what I'll call constant 1. 38 00:02:55,280 --> 00:02:59,971 This next plot is residual sum of 39 00:02:59,971 --> 00:03:05,357 squares of w0, w1 = some constant two, 40 00:03:05,357 --> 00:03:12,160 which is greater than constant one, and so on. 41 00:03:12,160 --> 00:03:15,830 That's what all these curves are, increasing residual sum of squares. 42 00:03:15,830 --> 00:03:19,010 And when I walk around this curve, it's a level set, 43 00:03:19,010 --> 00:03:23,960 it's the set of all points with equal value. 44 00:03:23,960 --> 00:03:28,630 So, if I look at some w0, w1 pair and 45 00:03:28,630 --> 00:03:36,090 I look at some other point, w0 prime, w1 prime, 46 00:03:36,090 --> 00:03:42,070 well, both of these points, here and here, have the same 47 00:03:43,810 --> 00:03:47,940 residual sum of squares, which is what I called constant one before.
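A minimal sketch of the expansion described above, on toy data invented purely for illustration (the values of `h0`, `h1`, and `y`, and the coefficient names `A` through `F`, are assumptions and do not come from the lecture). Expanding the square inside the sum gives a quadratic in w0 and w1, and when the two features are not collinear the level sets of that quadratic are ellipses.

```python
# Toy data, invented for illustration only (not from the lecture).
h0 = [1.0, 1.0, 1.0, 1.0]
h1 = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 8.0]


def rss(w0, w1):
    """Residual sum of squares, computed directly from its definition."""
    return sum((yi - w0 * a - w1 * b) ** 2 for yi, a, b in zip(y, h0, h1))


# Coefficients obtained by expanding sum_i (y_i - w0*h0_i - w1*h1_i)^2.
A = sum(a * a for a in h0)               # multiplies w0^2
B = sum(b * b for b in h1)               # multiplies w1^2
C = sum(a * b for a, b in zip(h0, h1))   # cross term w0*w1 (appears twice)
D = sum(yi * a for yi, a in zip(y, h0))  # linear term in w0
E = sum(yi * b for yi, b in zip(y, h1))  # linear term in w1
F = sum(yi * yi for yi in y)             # constant term


def rss_quadratic(w0, w1):
    """The same quantity written as an explicit quadratic in (w0, w1)."""
    return A * w0**2 + B * w1**2 + 2 * C * w0 * w1 - 2 * D * w0 - 2 * E * w1 + F


# A quadratic of this form has elliptical level sets when A*B - C^2 > 0,
# which holds whenever the two feature columns are not collinear.
is_ellipse = A * B - C * C > 0

if __name__ == "__main__":
    print(rss(0.5, -1.2), rss_quadratic(0.5, -1.2), is_ellipse)
```

Both functions should agree at any (w0, w1), which is a quick way to check the expansion by hand.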
48 00:03:47,940 --> 00:03:50,380 So as I'm walking around this ellipse, 49 00:03:50,380 --> 00:03:55,440 every solution w0, w1 has exactly the same residual sum of squares. 50 00:03:55,440 --> 00:03:58,300 Okay, so hopefully what this plot is showing is now clear. 51 00:03:58,300 --> 00:04:03,070 And now let's talk about, what if I just minimize residual sum of squares? 52 00:04:04,700 --> 00:04:07,390 Well, if I just minimize residual sum of squares, 53 00:04:07,390 --> 00:04:11,910 every time I jump from one of these curves to the next curve to the next curve, 54 00:04:11,910 --> 00:04:16,530 all the way into the smallest curve, just this dot in the middle, 55 00:04:18,050 --> 00:04:20,790 this is going to smaller and smaller residual sum of squares. 56 00:04:20,790 --> 00:04:25,260 So this x here marks the minimum 57 00:04:25,260 --> 00:04:30,860 over all possible w0, w1 of residual sum of squares 58 00:04:30,860 --> 00:04:35,960 of w0, w1, and what is that? 59 00:04:35,960 --> 00:04:41,080 That's our least squares solution, so this is w hat least squares. 60 00:04:42,310 --> 00:04:46,770 Okay, I'm gonna, because this is so important, I'm gonna highlight it in red. 61 00:04:46,770 --> 00:04:48,840 That's what this point is. 62 00:04:48,840 --> 00:04:52,173 I don't want to draw a circle cuz the circle is 63 00:04:52,173 --> 00:04:56,230 exactly what all these other ellipses look like here. 64 00:04:56,230 --> 00:04:59,946 So I'm gonna put a little box around this, and 65 00:04:59,946 --> 00:05:03,865 I'll highlight in red this is w hat least squares. 66 00:05:03,865 --> 00:05:09,152 Okay, so that would be my solution if I were just minimizing 67 00:05:09,152 --> 00:05:13,805 residual sum of squares, but when I'm looking at my 68 00:05:13,805 --> 00:05:18,950 ridge objective, there's also this L2 penalty. 69 00:05:18,950 --> 00:05:22,215 This sum of w0 squared + w1 squared. 70 00:05:22,215 --> 00:05:24,650 And what does that look like?
71 00:05:24,650 --> 00:05:29,558 So I have w0 squared + w1 squared and when I'm looking at my 72 00:05:29,558 --> 00:05:34,960 contour plot I'm looking at setting this equal to some constant. 73 00:05:36,930 --> 00:05:41,750 And changing the value of that constant to get these different contours, 74 00:05:41,750 --> 00:05:43,320 the different colors that I'm seeing here. 75 00:05:44,350 --> 00:05:49,770 Well, what shape is w0 squared plus w1 squared equal to a constant? 76 00:05:49,770 --> 00:05:52,030 That is exactly the equation of a circle. 77 00:05:55,650 --> 00:05:59,640 So we see that the circle is centered about zero and 78 00:06:01,560 --> 00:06:05,040 so each one of these curves here, just to be clear, 79 00:06:09,610 --> 00:06:15,260 is my two norm of w squared, 80 00:06:15,260 --> 00:06:17,890 equal to, let's say, constant one. 81 00:06:17,890 --> 00:06:25,813 This next circle is the two norm of w squared equal to some constant two, 82 00:06:25,813 --> 00:06:31,280 which is greater than constant one, and so on. 83 00:06:31,280 --> 00:06:37,630 And again, if I look at any point w0, w1 and 84 00:06:37,630 --> 00:06:43,230 some other point w0 prime, w1 prime, 85 00:06:43,230 --> 00:06:52,550 these things have the same norm of the w vector squared. 86 00:06:52,550 --> 00:06:54,820 And that's true for all points around the circle. 87 00:06:55,830 --> 00:06:59,940 So let's say I'm just trying to minimize my two norm. 88 00:07:01,180 --> 00:07:02,440 What's the solution to that? 89 00:07:04,550 --> 00:07:08,900 Well, I'm going to jump down these contours to my minimum. 90 00:07:08,900 --> 00:07:12,620 And my minimum, let me do it directly in red this time. 91 00:07:12,620 --> 00:07:15,130 My minimum, oops, that didn't switch colors. 92 00:07:16,390 --> 00:07:17,870 Directly in red. 93 00:07:17,870 --> 00:07:21,583 My minimum is setting w0 and w1 equal to zero.
94 00:07:21,583 --> 00:07:27,390 So this is the min over w0, w1 of my two norm, 95 00:07:27,390 --> 00:07:34,648 which I'll write explicitly in this 2D case, 96 00:07:34,648 --> 00:07:39,365 is w0 squared + w1 squared, 97 00:07:39,365 --> 00:07:43,000 and the solution is 0. 98 00:07:43,000 --> 00:07:48,100 Okay, so this would be our ridge regression 99 00:07:48,100 --> 00:07:53,090 solution if lambda, the weighting on this 2 norm, were infinity. 100 00:07:53,090 --> 00:07:55,738 We talked about that before. 101 00:07:55,738 --> 00:07:59,609 [MUSIC]
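A minimal sketch of the two extremes discussed in this segment, again on invented toy data (the values of `h0`, `h1`, and `y` and the helper name `ridge_2d` are assumptions for illustration, not from the lecture). With lambda equal to 0 the ridge objective reduces to residual sum of squares and we recover the least squares solution; as lambda grows, the L2 penalty dominates and the solution is pulled toward w0 = w1 = 0. The sketch assumes both coefficients are penalized, as in the two-feature objective described here.

```python
import math

# Toy data, invented for illustration only (not from the lecture).
h0 = [1.0, 1.0, 1.0, 1.0]
h1 = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 8.0]


def ridge_2d(lam):
    """Minimize RSS(w0, w1) + lam * (w0**2 + w1**2) for two features.

    Setting the gradient to zero gives the 2x2 linear system
    (H^T H + lam * I) w = H^T y, solved here with Cramer's rule.
    """
    a = sum(u * u for u in h0) + lam
    b = sum(v * v for v in h1) + lam
    c = sum(u * v for u, v in zip(h0, h1))
    d = sum(t * u for t, u in zip(y, h0))
    e = sum(t * v for t, v in zip(y, h1))
    det = a * b - c * c
    return ((d * b - c * e) / det, (a * e - c * d) / det)


def norm(w):
    """Euclidean (two) norm of a coefficient pair."""
    return math.hypot(w[0], w[1])


if __name__ == "__main__":
    # lam = 0 is plain least squares; larger lam shrinks the solution.
    for lam in (0.0, 1.0, 100.0, 1e6):
        w = ridge_2d(lam)
        print(lam, w, norm(w))
```

Running it for increasing values of `lam` should show the norm of the solution shrinking steadily toward zero, which matches the limiting case of lambda going to infinity.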