Okay, so here's a summary of the large set of topics that we've covered in this course.

We talked about a bunch of models, including different models of linear regression, from simple regression to multiple regression. We talked about ridge regression and lasso, and then nearest neighbor and kernel regression. We also talked about some very important optimization algorithms, like gradient descent and coordinate descent, and really just this notion of what optimization is and how you go about doing it.

Then we talked about concepts that generalize well beyond regression. These include things like loss functions, the very important concept of the bias-variance tradeoff, cross-validation, sparsity, overfitting, feature selection, and model selection. And these are ideas that we're going to see in most of the courses in this specialization.

So, we spent a lot of time teaching the methods of this module, and now I've spent a lot of time summarizing what we learned, but I want to take a minute to talk about what we didn't cover in this course. There are actually a few important topics that, unfortunately, we didn't have time to go through, and I want to highlight them here.

One is the fact that in this course, we focused on having just a univariate output, which, for example, was the value of a house, the sales price of a house. But of course you could have a multivariate output. In cases where the dimensions of that multivariate output are correlated, you need to do slightly more complicated things. But in contrast, if you assume that each of these outputs is independent of the others, then you can just apply the methods we described independently for each dimension, as in the sketch below.
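Here's a minimal sketch of that point; the synthetic data, dimensions, and use of NumPy are illustrative assumptions, not course code. With independent output dimensions, fitting each dimension's regression separately gives the same answer as a joint least-squares solve.

```python
# A minimal sketch (illustration, not course material): with a multivariate
# output whose dimensions are modeled as independent, fitting one regression
# per output dimension is all you need.
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 100, 3, 2          # observations, input features, output dimensions
X = rng.normal(size=(N, D))
W_true = rng.normal(size=(D, K))
Y = X @ W_true + 0.1 * rng.normal(size=(N, K))   # noisy 2-dimensional output

# Fit each output dimension with its own least-squares regression...
W_per_dim = np.column_stack(
    [np.linalg.lstsq(X, Y[:, k], rcond=None)[0] for k in range(K)]
)
# ...which matches solving all dimensions in one call: least squares with
# independent outputs decouples column by column.
W_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(W_per_dim, W_joint)
```

The design point is exactly the decoupling described above: nothing ties the columns together, so the per-dimension fits and the joint solve coincide.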
The other thing that we haven't covered yet is the idea of what's called maximum likelihood estimation. We're going to go through that in the classification course, but I want to mention it in the context of regression: if you've heard of maximum likelihood estimation, it results in exactly the same objective we had when minimizing our residual sum of squares, assuming that your model has what are called normal, or Gaussian, errors. That's the epsilon term we've talked about; remember, y = wx + ε. Well, if we assume that epsilon is normally distributed, or as people sometimes say, Gaussian distributed, then maximum likelihood estimation is exactly equivalent to what we've talked about in this course; there's a quick numerical check of this below. And like I said, we'll learn more about maximum likelihood estimation in the classification course.

But one really, really important thing that we didn't talk about in this course, which truthfully pains me as a statistician, is statistical inference. We focused only on what are called point estimates: we just returned a w-hat value, our estimated coefficients, but we didn't talk about any measure of uncertainty about those estimated coefficients or our predictions. There's noise inherent to the data, so we can think of attaching measures of uncertainty to our predictions or our estimated coefficients. This is referred to as inference, and it's a really important topic that we did not go through here.

Another cool set of methods are what are called generalized linear models, and we're actually going to see an example of a generalized linear model in the classification course, so you will get to see this, but I want to bring it up here. What generalized linear models allow you to do is form regressions when you have certain restrictions on your output: the output is always positive, or bounded, or positive and bounded, or it's a discrete value, like the yes-or-no response we're going to talk about in the classification course. In this course, we assumed our errors are Gaussian, just like in the maximum likelihood discussion: zero mean, so observations were equally likely to be above or below the true function, and unbounded in how far above or below they could fall. The regression models we've talked about so far are therefore inappropriate for forming predictions when the predicted values have these types of constraints or specific structures to them. Generalized linear models allow us to cope with certain types of these structures very efficiently; a small example follows as well.
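To make the maximum likelihood point above concrete, here's a small numerical check, again a sketch with made-up data rather than course material: maximizing the Gaussian log-likelihood in w is, up to constants, minimizing the residual sum of squares, so both routes should land on the same coefficients.

```python
# A numerical check (illustration, not course code) that maximum likelihood
# under Gaussian errors recovers the least-squares solution.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, D = 200, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.3, size=N)   # y = Xw + eps, eps Gaussian

# Negative Gaussian log-likelihood in w (noise variance held fixed); up to
# additive and multiplicative constants this is the residual sum of squares.
def neg_log_lik(w):
    resid = y - X @ w
    return 0.5 * resid @ resid

w_mle = minimize(neg_log_lik, x0=np.zeros(D)).x
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_mle, w_ls, atol=1e-4))       # True: same estimate
```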
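And for the generalized linear models just mentioned, here's a hedged sketch of one instance: Poisson regression for a count-valued, nonnegative output, which the Gaussian-error models above handle poorly. The use of scikit-learn's PoissonRegressor and the synthetic data are my choices for illustration, not something prescribed in the course.

```python
# A sketch of a generalized linear model: Poisson regression for counts.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(2)
N, D = 500, 2
X = rng.normal(size=(N, D))
# Counts generated with a log link: rate = exp(Xw), y ~ Poisson(rate).
rate = np.exp(X @ np.array([0.8, -0.4]))
y = rng.poisson(rate)

model = PoissonRegressor(alpha=0.0).fit(X, y)    # alpha=0: no regularization
print(model.coef_)               # roughly recovers [0.8, -0.4]
print(model.predict(X[:5]))      # predictions are always nonnegative
```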
Another really powerful tool that we didn't describe in this course is something called the regression tree, and that's because we're going to cover it in the classification course. Actually, more generally, these methods are referred to as CART, which stands for Classification And Regression Trees, because what you do is form a tree, and that structure is the same whether we're looking at classification or regression. We're going to focus on describing these structures in the context of classification, because they're a lot simpler to understand in that context, but I want to emphasize that the same tools we're going to learn in the next course can be used in regression as well; there's a small sketch at the end of this summary.

And of course, there are lots and lots of other methods that we haven't described in this course. Regression has an extremely long history in statistics, so there are lots of things that are potentially of interest. But in this course, we really tried to focus in on the main concepts that are useful in modern machine learning applications of regression.
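Finally, the regression tree sketch promised above, again an illustration with synthetic data and scikit-learn rather than course code: the same CART machinery taught in the classification course, applied to a continuous target.

```python
# A minimal regression-tree sketch (illustration, not course material).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
x = rng.uniform(0, 6, size=(200, 1))
y = np.sin(x).ravel() + 0.1 * rng.normal(size=200)   # noisy nonlinear target

# A depth-limited tree fits a piecewise-constant function to the data.
tree = DecisionTreeRegressor(max_depth=3).fit(x, y)
print(tree.predict([[1.0], [4.0]]))
```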