[MUSIC] For any specific value of lambda, we get some balance between this residual sum of squares and this two norm. And so what I'm gonna do in this movie is add these two contour plots together. So let me write this down: add contour plots together, where I'm getting the residual sum of squares of w plus lambda times the two norm of w. Here the residual sum of squares were these ellipses, centered about my least squares solution, and the two norm were these circles, centered about zero. And lambda is some weighting on how much I'm including that two norm penalty in the cost.

And what I'm going to do is show a movie as a function of lambda. So: a movie, as a function of increasing lambda, where I have my ellipses, and I'm weighting more and more heavily these contours that are coming from the circles. The circle terms come from this two norm penalty.

Okay, so this is the movie right here, and my lovely assistant Carlos will click the mouse to play the movie. [LAUGH] Since I don't know how to control it from the tablet, unfortunately. [LAUGH] Thank you, Vanna. That reference was probably lost on most people.

And in doing so, you didn't get me describing the movie, so let's watch it again. But what we see, and let me be clear about this, is that the x is going to mark the optimal solution for a specific lambda, and we're varying lambda, so this x is gonna move. Okay, where's the x gonna start? Well, when lambda's equal to zero, we're starting at our least squares solution, and as lambda increases, we know that as lambda goes to infinity the coefficients are gonna shrink to zero. But let's visualize the path that it takes as we increase lambda.

Okay, so let's play this movie again. It's gonna start at the least squares solution, and we see that the magnitudes of our coefficients w0 and w1 are shrinking smaller and smaller towards zero. So maybe we'll play that once more just to visualize this. And what we see, again this was just the tail end of the movie, is this shrinking magnitude of the coefficients.
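(For readers who want to reproduce the picture described here, below is a minimal sketch, assuming a small synthetic two-feature dataset; the data, grid ranges, and lambda values are illustrative choices and not the ones behind the lecture's plot. It draws the combined contours of RSS(w) + lambda * ||w||^2 for one lambda and traces how the ridge solution moves from the least squares estimate toward zero as lambda grows.)

```python
# Sketch: combined contours of RSS(w) + lambda * ||w||^2 for a two-coefficient
# problem, plus the path the minimizer traces as lambda increases.
# The synthetic data (X, y), grid ranges, and lambda values are illustrative.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([4.0, -3.0]) + rng.normal(scale=0.5, size=50)

# Grid of candidate coefficient pairs (w0, w1)
w0, w1 = np.meshgrid(np.linspace(-6, 8, 200), np.linspace(-8, 6, 200))
W = np.stack([w0.ravel(), w1.ravel()], axis=1)
rss = ((y[:, None] - X @ W.T) ** 2).sum(axis=0).reshape(w0.shape)
penalty = w0 ** 2 + w1 ** 2

lam = 10.0  # one "frame" of the movie; sweep this value to animate
plt.contour(w0, w1, rss + lam * penalty, levels=20)

# Ridge solution for each lambda: w_hat = (X'X + lam*I)^{-1} X'y
lambdas = np.logspace(-2, 4, 60)
path = np.array([np.linalg.solve(X.T @ X + l * np.eye(2), X.T @ y)
                 for l in lambdas])
plt.plot(path[:, 0], path[:, 1], "x-", label="ridge solution path")
plt.scatter(*np.linalg.lstsq(X, y, rcond=None)[0], marker="*", s=100,
            label="least squares (lambda = 0)")
plt.legend(); plt.xlabel("w0"); plt.ylabel("w1"); plt.show()
```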
Carlos is very excited about this movie, so we're gonna watch it one more time. It's pretty cool. We've never actually seen somebody do this visualization; we think it's really intuitive. So again, as that lambda penalty is increasing, the magnitudes of the coefficients are getting shrunk.

Okay, well, now let's talk about what the solution looks like for a given value of lambda. Oops, sorry, let me turn my pen on. So for a specific lambda value, we have some balance between the residual sum of squares and the magnitude of our coefficients. Lambda is automatically doing some trade-off between the two. So: some balance between RSS and our two norm.

And specifically, in this plot, this is our solution. It has some RSS, which happens to be 5,215 (that's what the number on this contour is indicating), and it has some two norm, which has value 4.75. So this lambda has chosen this specific trade-off, and we see that our solution is somewhere here, which has shrunk from where our least squares solution was. Let's remember, our least squares solution was somewhere around here, and the optimum for lambda equals infinity was at zero. So it's somewhere in between these two values.

And if we had chosen a different value of lambda, let's say a larger value of lambda, we would have had a different solution. And when I'm drawing all these contours, what I'm saying is (let me just go back to the original one before this drawing), every other point along this ellipse has exactly the same residual sum of squares but a larger l2 norm of w, and everywhere along this circle has exactly the same l2 norm of w but a larger residual sum of squares. So that's why this is the optimal trade-off for this lambda.

Then, like I drew here, if I chose a larger lambda, I would get a solution that preferred a smaller two norm and a larger residual sum of squares. So this would be the solution for a larger lambda value.

Okay, so this is just a little visualization of what a ridge regression solution looks like. [MUSIC]
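(Here is a small hedged sketch of that trade-off in code, using the same kind of illustrative synthetic data as above: for a few lambda values it solves the ridge problem in closed form and prints the RSS and squared two norm of each solution, so you can see a larger lambda buy a smaller norm at the price of a larger RSS. The printed numbers will not match the 5,215 and 4.75 from the lecture's plot, since the data here are made up.)

```python
# Sketch: the trade-off a single lambda picks. For each lambda, solve the ridge
# problem in closed form and report the two pieces of the cost. Larger lambda
# gives a smaller two norm and a larger RSS. Data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([4.0, -3.0]) + rng.normal(scale=0.5, size=50)

for lam in [0.0, 10.0, 100.0]:
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
    rss = float(((y - X @ w_hat) ** 2).sum())
    l2_sq = float(w_hat @ w_hat)
    print(f"lambda={lam:7.1f}  RSS={rss:9.2f}  ||w||^2={l2_sq:6.2f}  w={w_hat}")
```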