[MUSIC] Okay, so how are we going to go about this feature selection task? Well, one option is the obvious choice: search over every possible combination of features we might want to include in our model and look at the performance of each of those models. That's exactly what the all subsets algorithm does, and we're going to describe it now.

The all subsets algorithm starts by considering a model with absolutely no features in it. So we remove all the features we might have for our house and ask: what's the performance of that model? Just to be clear, we start with no features, and there's still a model for no features. The model with no features, remember, says that our observation is simply noise. We can assess the performance of this model on our training data, and there is some training error associated with the no-feature model, so we plot that point.

The next thing we do is search over every possible model with just one feature. Say we start with a model whose only feature is number of bedrooms, and we plot the training error of the model fit just with number of bedrooms. Then we ask: what's the training error of a model fit just with number of bathrooms, square feet, square feet of the lot, and so on, cycling through each of our possible features. At the end of this, we can say: out of all models with just one feature, which one fit the training data best? In this case it happened to be the model with square feet living, and we've seen before that that's a very relevant feature. So we highlight that this is the best-fitting model with only one feature, keep track of it, and discard all the other ones.

Then we search over all models with two features, that is, over all combinations of two features, figure out which one has the lowest training error, and keep track of that model. That happens to be a model with number of bedrooms and number of bathrooms. And that might make sense, because when searching for a property, someone will often say, I want a three-bedroom house with two bathrooms. So that might be a reasonable choice for the best model with two features.

I want to emphasize that the best model with two features doesn't have to contain any of the features in the best model with one feature. Here, our best model with two features has number of bedrooms and number of bathrooms, whereas our best model with just one feature has square feet living. So these models aren't necessarily nested. Let me write this explicitly: the best model of size k need not contain the features of the best model of size k - 1. Hopefully that's clear.

We continue the procedure, searching over all models with three features, all models with four features, five features, and so on, until at some point we get to a model with capital D features. That's the model including all of the features, and there is only one such model, so it's just one point on the plot. Then we can draw a line connecting the points that represent the best possible model for each number of features.
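To make the search concrete, here is a minimal sketch of all subsets selection in Python. It assumes the data is a NumPy feature matrix X (one column per feature) and a target vector y, and it uses scikit-learn's LinearRegression and training RSS as the error measure; the function name and these representation choices are illustrative, not from the lecture.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression

def all_subsets(X, y):
    """For each size k = 0..D, return (best column subset, training RSS)."""
    n, D = X.shape
    best_per_size = {}

    # k = 0: the no-feature model predicts a constant (the mean of y),
    # so its training error is the total sum of squares around the mean.
    best_per_size[0] = ((), np.sum((y - y.mean()) ** 2))

    for k in range(1, D + 1):
        best_rss, best_subset = np.inf, None
        # Exhaustively fit a model for every combination of k features
        # and keep the one with the lowest training error (RSS).
        for subset in combinations(range(D), k):
            cols = list(subset)
            model = LinearRegression().fit(X[:, cols], y)
            rss = np.sum((y - model.predict(X[:, cols])) ** 2)
            if rss < best_rss:
                best_rss, best_subset = rss, subset
        best_per_size[k] = (best_subset, best_rss)
    return best_per_size
```

Note that this loop fits 2^D models in total, one for every subset of the D features, which is why exhaustive all subsets search is only practical when D is small.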
Then the question is: which of these best models with k features do we want to use for our predictions? Well, hopefully it's clear from this course, as well as from this slide, that we don't just want to choose the model with the lowest training error, because, as we know at this point, as we increase model complexity, our training error goes down, and that's what we're seeing in this plot. So instead, we face the same type of choices we've had previously in this course for choosing between models of various complexity. One choice, if you have enough data, is to assess performance on a validation set that's separate from your training and test sets. We also talked about doing cross validation. And in this case there are many other metrics for penalizing model complexity, such as BIC (the Bayesian information criterion) and a long list of other methods people have for choosing among these different models. But we're not going to go through the details of those; for our course, we're going to focus on this notion of error on the validation set. [MUSIC]
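To make that selection step concrete, here is a minimal sketch continuing the all_subsets function above. It assumes the data has already been split into separate training and validation sets, and it picks the subset size whose best-of-size-k model has the lowest validation error; the function name and the use of mean squared error are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def select_by_validation(X_train, y_train, X_valid, y_valid, best_per_size):
    """Pick k whose best-of-size-k model minimizes validation error.

    best_per_size maps k -> (column subset, training RSS), as returned
    by the all_subsets sketch above.
    """
    best_k, best_err = None, np.inf
    for k, (subset, _) in best_per_size.items():
        if k == 0:
            # The no-feature model predicts the training mean everywhere.
            preds = np.full(len(y_valid), y_train.mean())
        else:
            cols = list(subset)
            # Refit on training data, then score on held-out data.
            model = LinearRegression().fit(X_train[:, cols], y_train)
            preds = model.predict(X_valid[:, cols])
        err = np.mean((y_valid - preds) ** 2)  # validation MSE
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err
```

Because each candidate is chosen using data the models never trained on, validation error does not automatically decrease as k grows the way training error does, which is exactly what makes it a usable selection criterion here.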