1 00:00:00,007 --> 00:00:04,913 [MUSIC] 2 00:00:04,913 --> 00:00:10,167 Well, we've motivated analytically how the coefficients that we get when solving 3 00:00:10,167 --> 00:00:15,349 this ridge regression problem are gonna change for different settings of lambda. 4 00:00:15,349 --> 00:00:20,822 Specifically, we saw that when lambda is 0, we get our least squares solution. 5 00:00:20,822 --> 00:00:26,386 When lambda goes to infinity, we get very, very small coefficients approaching 0. 6 00:00:26,386 --> 00:00:29,410 And in between, we get some other set of coefficients, and 7 00:00:29,410 --> 00:00:33,386 then we explored this experimentally in the polynomial regression demo. 8 00:00:33,386 --> 00:00:37,204 But one thing that's interesting to draw is what's 9 00:00:37,204 --> 00:00:41,420 called the coefficient path for ridge regression, 10 00:00:41,420 --> 00:00:45,920 which shows, as you vary lambda all the way from 0 up 11 00:00:45,920 --> 00:00:50,537 towards infinity, how the coefficients change. 12 00:00:50,537 --> 00:00:54,500 So how does my solution change as a function of lambda? 13 00:00:54,500 --> 00:00:57,701 And what we're doing in this plot here is we're drawing this for 14 00:00:57,701 --> 00:01:00,918 our housing example, where we have eight different features: 15 00:01:00,918 --> 00:01:05,522 number of bedrooms, bathrooms, square feet of the living space, 16 00:01:05,522 --> 00:01:08,125 number of square feet of the lot size, 17 00:01:08,125 --> 00:01:12,262 number of floors, the year the house was built, the year the house was renovated, 18 00:01:12,262 --> 00:01:14,640 and whether or not the property is waterfront.
19 00:01:16,520 --> 00:01:21,297 And for each one of these different inputs to our model, and 20 00:01:21,297 --> 00:01:24,688 these we're just gonna use as different features, 21 00:01:24,688 --> 00:01:28,483 we're drawing the coefficient value, so this would be the 22 00:01:32,302 --> 00:01:38,801 coefficient value for 23 00:01:38,801 --> 00:01:44,970 square feet living, 24 00:01:44,970 --> 00:01:50,430 for some specific choice of lambda, and how that coefficient varies as I increase 25 00:01:50,430 --> 00:01:54,640 lambda. And I'm showing this for each one of the eight different coefficients. 26 00:01:54,640 --> 00:02:00,110 And I just want to briefly mention that in this figure, we've rescaled the features, 27 00:02:00,110 --> 00:02:04,080 each one of these different inputs, so that they all have unit norm. 28 00:02:04,080 --> 00:02:08,095 That's why all of these coefficients are roughly on the same scale. 29 00:02:08,095 --> 00:02:10,079 They're roughly the same order of magnitude. 30 00:02:12,270 --> 00:02:17,864 Okay, and so what we see in this plot is, as lambda goes towards 0, 31 00:02:17,864 --> 00:02:22,394 or when it's specifically at 0, our solution here, 32 00:02:25,341 --> 00:02:29,978 the value of each of these coefficients, so each of these circles 33 00:02:29,978 --> 00:02:34,970 touching this line, this is gonna be my w hat least squares solution. 34 00:02:34,970 --> 00:02:40,471 And as I increase lambda out towards infinity, 35 00:02:40,471 --> 00:02:45,980 I see that my solution, w hat, approaches 0. 36 00:02:45,980 --> 00:02:49,820 This vector of coefficients is going to 0. 37 00:02:49,820 --> 00:02:55,620 And we haven't made lambda large enough in this plot to see them actually get really, 38 00:02:55,620 --> 00:03:00,220 really, really, really close to 0, but you see the trend happening here.
39 00:03:00,220 --> 00:03:04,000 And then there's some sweet spot in this model, sorry, not in this model, 40 00:03:04,000 --> 00:03:05,030 in this plot, 41 00:03:05,030 --> 00:03:08,270 which we're gonna talk about later in this module. 42 00:03:11,050 --> 00:03:14,840 Whoops, I should draw it actually hitting some of these circles, 43 00:03:14,840 --> 00:03:16,722 one of these considered points. 44 00:03:24,743 --> 00:03:30,633 So this is gonna represent, erase this, 45 00:03:30,633 --> 00:03:36,705 this is gonna represent some lambda star, 46 00:03:36,705 --> 00:03:41,865 which will be the value of lambda that we wanna use when we're selecting 47 00:03:41,865 --> 00:03:46,605 our specific regularized model to use for forming predictions. 48 00:03:46,605 --> 00:03:52,621 And we're gonna discuss how we choose which lambda to use later in the module. 49 00:03:52,621 --> 00:03:56,622 But for now, the main point of this plot is to realize that for 50 00:03:56,622 --> 00:03:59,917 every value of lambda, every slice of this plot, 51 00:03:59,917 --> 00:04:03,622 we get a different solution, a different w hat vector. 52 00:04:03,622 --> 00:04:07,899 [MUSIC]
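
The coefficient path described in this segment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the lecture's plot: it uses the standard closed-form ridge solution w hat = (X^T X + lambda I)^(-1) X^T y with no intercept term, the data is synthetic rather than the housing data, and the names `ridge_coefficients` and `coefficient_path` are hypothetical helpers made up for this example.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution (assumed form): (X^T X + lam * I)^(-1) X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def coefficient_path(X, y, lambdas):
    """Return one w-hat vector per lambda; row i corresponds to lambdas[i]."""
    return np.array([ridge_coefficients(X, y, lam) for lam in lambdas])


# Synthetic stand-in data with eight features (NOT the lecture's housing data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.1 * rng.normal(size=200)

# Rescale each feature column to unit norm, as the lecture does for its figure,
# so the coefficients end up on comparable scales.
X = X / np.linalg.norm(X, axis=0)

# lambda = 0 first (plain least squares), then a log-spaced grid of larger values.
lambdas = np.concatenate(([0.0], np.logspace(-3, 3, 25)))
path = coefficient_path(X, y, lambdas)

w_least_squares = path[0]                   # the slice of the path at lambda = 0
path_norms = np.linalg.norm(path, axis=1)   # size of w-hat at each lambda
```

Each row of `path` is one vertical slice of the plot, so plotting each column of `path` against `lambdas` on a log-scaled x-axis would give one curve per feature. The first row matches the least squares solution, and `path_norms` shrinks as lambda grows, which is the trend toward 0 pointed out in the figure.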