[MUSIC] Okay, so here's a summary of the large set of topics that we've covered in this course. We talked about a number of models, including different forms of linear regression, from simple regression to multiple regression. We talked about ridge regression and the lasso, and then nearest neighbors and kernel regression. We also talked about some very important optimization algorithms, like gradient descent and coordinate descent, and really just this notion of what optimization is and how you go about doing it. And then we talked about concepts that generalize well beyond regression. These include things like loss functions, the very important concept of the bias-variance trade-off, cross-validation, sparsity, overfitting, feature selection, and model selection. These are ideas that we're going to see in most of the courses in this specialization.

So, we spent a lot of time teaching the methods of this module, and I've now spent some time summarizing what we learned, but I want to take a minute to talk about what we didn't cover in this course. There are actually a few important topics that, unfortunately, we didn't have time to go through, and I want to highlight them here.

One is the fact that in this course we focused on having just a univariate output, which, for example, was the value of a house or the sales price of a house. But of course you could have a multivariate output. In cases where the dimensions of that multivariate output are correlated, you need to do slightly more complicated things. In contrast, if you assume that each of these output dimensions is independent of the others, then you can simply apply the methods we described independently to each dimension.

The other thing that we haven't covered yet is this idea of what's called maximum likelihood estimation. We're going to go through that in the classification course, but I want to mention that, in the context of regression, if you've heard of maximum likelihood estimation, it results in exactly the same objective we had when minimizing our residual sum of squares, assuming that your model has what are called normal, or Gaussian, errors. Remember the model y equals w times x plus epsilon? Well, if we assume that epsilon term is normally distributed, or, as people sometimes say, Gaussian distributed, then maximum likelihood estimation is exactly equivalent to what we've talked about in this course; a short sketch of that equivalence is included at the end of this segment. And like I said, we'll learn more about maximum likelihood estimation in the classification course.

But one really, really important thing that we didn't talk about in this course, which truthfully pains me, being a statistician, is statistical inference. We focused only on what's called point estimation: we just returned a w-hat value, our estimated coefficients, but we didn't talk about any measure of uncertainty in those estimated coefficients or in our predictions. Again, there's noise inherent to the data, so we can think about having measures of uncertainty for our predictions or our estimated coefficients. This is referred to as inference, and it's a really important topic that we did not go through here.
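As a quick reference, here is the short sketch of the equivalence mentioned above between maximum likelihood estimation under Gaussian errors and minimizing the residual sum of squares. This derivation was not shown in the lectures; it simply writes out the argument, using y_i = w^T x_i + epsilon_i as shorthand for the regression model.

```latex
% Sketch (not from the lectures): Gaussian maximum likelihood = least squares.
% Model: y_i = w^T x_i + eps_i, with eps_i ~ N(0, sigma^2), independent.
\begin{align}
  \log L(w)
    &= \sum_{i=1}^{N} \log\!\left[ \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\!\left( -\frac{(y_i - w^\top x_i)^2}{2\sigma^2} \right) \right] \\
    &= -\frac{N}{2}\log\!\big(2\pi\sigma^2\big)
       - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \big(y_i - w^\top x_i\big)^2 \\
    &= \text{constant} \;-\; \frac{1}{2\sigma^2}\,\mathrm{RSS}(w).
\end{align}
```

Maximizing the log likelihood over w is therefore exactly the same as minimizing RSS(w), so the maximum likelihood coefficients coincide with the least squares coefficients.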
Another cool set of methods are what are called generalized linear models, and we're actually going to see an example of a generalized linear model in the classification course, so you will get to see this, but I want to bring it up here. What generalized linear models allow you to do is form regressions when you have certain restrictions on your output: for example, the output is always positive, or bounded, or positive and bounded, or it's a discrete value, like the yes-or-no response we're going to talk about in the classification course. Well, whether we assume our errors are Gaussian, as in the maximum likelihood discussion above, or, as in this course, simply that they have zero mean, with observations equally likely to be above or below the true function and unbounded in how far above or below it they can be, the regression models we've talked about so far are inappropriate for forming predictions when the predicted values have these types of constraints or specific structures. Generalized linear models allow us to cope with certain types of these structures very efficiently.

Another really powerful tool that we didn't describe in this course is something called the regression tree, and that's because we're going to cover it in the classification course. Actually, more generally, these methods are referred to as CART, which stands for Classification And Regression Trees, because what you do is form a tree, and that structure is the same whether we're looking at classification or regression. We're going to focus on describing these structures in the context of classification, because they're a lot simpler to understand in that context, but I want to emphasize that those same tools we're going to learn in the next course can be used in regression as well; a small sketch of a regression tree appears at the end of this segment.

And of course, there are lots and lots of other methods that we haven't described in this course. Regression has an extremely long history in statistics, so there are lots of things that are potentially of interest. But in this course, we really tried to focus in on the main concepts that are useful in modern machine learning applications of regression.
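To close out, here is the small regression tree sketch promised above. This is not course code: it assumes scikit-learn is installed, and the toy house features and prices are made up purely for illustration.

```python
# Minimal regression tree sketch (not course code; assumes scikit-learn).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up house data: columns are [square feet, number of bedrooms].
X = np.array([[1000, 2], [1500, 3], [1800, 3], [2400, 4], [3000, 4]])
y = np.array([250000, 320000, 360000, 480000, 560000])  # sale prices

# max_depth controls model complexity, echoing the bias-variance trade-off:
# deeper trees fit the training data more closely but overfit more easily.
tree = DecisionTreeRegressor(max_depth=2)
tree.fit(X, y)

print(tree.predict([[2000, 3]]))  # predicted price for a new house
```

The fit-then-predict pattern mirrors the regression workflow from this course; only the model class changes.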