So let's generate some data, fit polynomials of increasing degrees, and see what happens to the estimated coefficients. To start, let's import some libraries that are going to be useful. Then we're going to create 30 different x values, so in the end we'll have a data set with 30 observations. Next, we're going to evaluate the sine function at these 30 x values. But of course, when we do our analysis we're going to assume we have noisy data, so we add noise to these true sine values to get our actual observations. So here we're just adding noise, and then we put all of this into an SFrame.

Here's what our data looks like: a set of x values and the corresponding y values. But of course it's easier to just visualize this data set, so let's make a plot of x versus y. Here you can see the underlying trend, the true trend like we talked about, which is this sine function: it goes up and comes back down, and the black dots are our observed values.

Okay, now let's get to our polynomial regression task. To start, we're going to define our polynomial features. What this function, polynomial_features, does is take our SFrame, make a copy of it, and, for whatever degree of polynomial we're considering, add extra columns to the SFrame containing powers of x up to that degree. That's what this function does. Then the very important function is polynomial_regression, which implements our multiple regression model using the features produced by polynomial_features. Again, for simplicity we're just using GraphLab Create, and we're going to use its linear_regression.create function, where the features we specify are just the powers determined by the degree of the polynomial we're looking at. Our target is our observation y, and then there are these two terms, an l2_penalty and an l1_penalty, that we set equal to zero.
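Since the video refers to notebook code without showing every line, here is a minimal sketch of how this setup and the two functions might look. It assumes GraphLab Create's SFrame/SArray and linear_regression.create API, which the demo uses; the column names X1 through X&lt;deg&gt;, the sin(4x) trend, the random seed, and the noise scale are illustrative choices, not values confirmed in the video.

```python
import math
import random
import graphlab  # GraphLab Create, as used in the demo

# Generate 30 x values and noisy observations of a sine trend.
# Seed, sin(4x), and noise scale are illustrative assumptions.
random.seed(98103)
n = 30
x = graphlab.SArray(sorted(random.random() for _ in range(n)))
y_true = x.apply(lambda v: math.sin(4 * v))               # true sine trend
noise = graphlab.SArray([random.gauss(0, 1.0 / 3.0) for _ in range(n)])
data = graphlab.SFrame({'X1': x, 'Y': y_true + noise})    # observed data

def polynomial_features(data, deg):
    """Copy the SFrame and add columns X2..X<deg> holding powers of X1."""
    data_copy = data.copy()
    for i in range(1, deg):
        data_copy['X' + str(i + 1)] = data_copy['X' + str(i)] * data_copy['X1']
    return data_copy

def polynomial_regression(data, deg):
    """Fit a degree-`deg` polynomial as a multiple regression on the power
    features; with l2_penalty and l1_penalty both zero this is plain
    least squares."""
    return graphlab.linear_regression.create(
        polynomial_features(data, deg),
        target='Y',
        features=['X' + str(i) for i in range(1, deg + 1)],
        l2_penalty=0., l1_penalty=0.,
        validation_set=None, verbose=False)
```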
This module on ridge regression is going to be all about that l2 penalty, and we'll get to it. The next module is going to be all about the l1 penalty. For now, just understand that if we set both of these values to zero, we return to our standard least squares regression. Okay, so that's what our polynomial_regression function is doing.

The next function we're going to define lets us plot our fit. And finally, we're going to define a function that prints the coefficients of our polynomial regression in a very nice way. For this we use the NumPy library, because it allows for really pretty printing of our polynomial.

Okay, now we're going to use all of these functions again and again as we explore polynomials of different degrees fit to this data. To start, let's consider fitting a very low-order, degree-2 polynomial. First we do our polynomial regression fit, taking our SFrame, which we call data, and specifying that the degree is two. Then let's look at the coefficients we've estimated. Here's that really nice printing of the coefficients using NumPy: we have some coefficient on x squared, a coefficient on x, and our intercept term. And these values are, I don't know whether to call them reasonable or not, but they're relatively small numbers, numbers we can kind of appreciate, like five, four, and something close to zero. Now let's plot our estimated fit. This looks pretty good: it's a nice smooth curve, it passes near the observations, and in between them you could imagine believing what this fit predicts.

But now let's go to a slightly higher-degree polynomial, just an order-4 polynomial. Here we're doing all the steps at once: fitting our model, printing the coefficients, and plotting the fit. If we look at the estimated coefficients of our fourth-order polynomial, we see that they've increased in magnitude; we have numbers like 23, 53, and 35. And the fit is looking a bit wigglier.
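Here are hedged sketches of the two helpers just described, continuing from the setup above. The model.coefficients table with a 'value' column is GraphLab Create's coefficient output, and numpy.poly1d handles the pretty printing; the dense plotting grid is an assumption on my part.

```python
import numpy
import matplotlib.pyplot as plt
import graphlab

def print_coefficients(model):
    """Pretty-print the learned polynomial with numpy.poly1d."""
    w = list(model.coefficients['value'])   # [intercept, w1, ..., w_deg]
    w.reverse()                             # poly1d wants highest power first
    print('Learned polynomial for degree ' + str(len(w) - 1) + ':')
    print(numpy.poly1d(w))

def plot_poly_predictions(data, model):
    """Plot the observations and the fitted curve on a dense x grid."""
    deg = len(model.coefficients['value']) - 1
    x_grid = graphlab.SFrame({'X1': [i / 200.0 for i in range(201)]})
    y_hat = model.predict(polynomial_features(x_grid, deg))
    plt.plot(data['X1'], data['Y'], 'k.')   # observed values
    plt.plot(x_grid['X1'], y_hat, 'g-')     # estimated fit
    plt.xlabel('x'); plt.ylabel('y')
    plt.show()

# The degree-2 fit from this part of the demo:
model = polynomial_regression(data, deg=2)
print_coefficients(model)
plot_poly_predictions(data, model)
```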
It still actually looks pretty reasonable, but now let's get to our degree-16 polynomial. Remember, we only have 30 observations, and we're trying to fit a 16th-order polynomial. So what happens here? In this case, we see that the coefficients have become really, really massive. Here we have 2.583 times 10 to the 6th, and here 1.295 times 10 to the 7th. These are really, really large numbers. And let's look at the fit. As expected, it's also really wiggly and crazy, and we probably don't believe that this is what's actually going on in this data. So this is a pictorial example of an overfit function. And the take-home message from this demo is that when we're in these situations of being very overfit, we end up with very, very large estimated coefficients in our model. So yeah, whoa, these coefficients are crazy.

What ridge regression is going to do is quantify overfitting through exactly this measure: the magnitude of the coefficients.
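To make that take-home message concrete, here is a small, hypothetical follow-up you could run with the sketches above: it prints the largest coefficient magnitude for each of the three fits from the demo, which is exactly the quantity ridge regression will penalize.

```python
# Continuing from the sketches above: tabulate how the largest
# coefficient magnitude grows with the polynomial degree.
for deg in [2, 4, 16]:
    model = polynomial_regression(data, deg)
    w = model.coefficients['value']
    print('degree %2d: max |w_j| = %.3g' % (deg, max(abs(v) for v in w)))

# Ridge regression will penalize this blow-up through the sum of
# squared coefficients (the l2_penalty term we set to zero above).
```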