So now let's talk about the complexity of this forward stepwise algorithm, and contrast it with the all subsets procedure. How many models did we evaluate here? Well, in the first step, assuming that we have a total of D features, we searched over D models. Then, in the second step, we had D-1 models to search over, one for each remaining feature we might think about adding. In the third step, we had D-2 remaining features, and so on.

And a question is, how many steps did we take? How many of these "and so ons" did we have? Well, it depends. It depends on when we chose to stop, but at most we're going to have D steps, which gets us to the full model. So the complexity of this is O(D^2): at most D steps, each on the order of searching over D models. And that's going to be much, much less than 2^D, which is what we had for all subsets, when D is reasonably large. Okay, so this procedure is going to be significantly more computationally efficient, or feasible, than doing all subsets.

Well, this was just one of many possible choices you have for greedy algorithms for doing feature selection. As an example, instead of always starting from an empty model and growing and growing and growing, adding more features, you could do the opposite: you could start from a full model and then choose, at each iteration, which feature to eliminate. But you could also think about combining these steps. So you could do a forward procedure, where you search over which feature to add, but then every so often you could go back and think about deleting features. Because, like we said, in the forward process you might add a feature that is no longer relevant once you've added another feature later on. As an example of this (and it's probably not completely true in this application), maybe once you've added number of bedrooms and number of baths, square feet is no longer so relevant, since these other features can act as a proxy for square feet, or something like that.
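To make this concrete, here is a minimal sketch of the forward procedure in Python. The use of scikit-learn's LinearRegression and of validation-set RSS as the selection metric are illustrative assumptions, not something the lecture prescribes; the point is just that the nested loops fit at most D + (D-1) + ... + 1 models, which is O(D^2), rather than the 2^D models of all subsets.

```python
# A minimal sketch of forward stepwise feature selection.
# Assumptions for illustration: ordinary least squares via scikit-learn,
# validation RSS as the selection metric, and a fixed cap on the steps.
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_stepwise(X_train, y_train, X_valid, y_valid, max_features=None):
    """Greedily add the feature that most reduces validation RSS."""
    D = X_train.shape[1]
    if max_features is None:
        max_features = D                  # at most D steps: the full model
    selected, remaining = [], list(range(D))

    for _ in range(max_features):
        best_rss, best_j = np.inf, None
        # Step k searches over the D - k features not yet in the model.
        for j in remaining:
            cols = selected + [j]
            model = LinearRegression().fit(X_train[:, cols], y_train)
            resid = y_valid - model.predict(X_valid[:, cols])
            rss = np.sum(resid ** 2)
            if rss < best_rss:
                best_rss, best_j = rss, j
        selected.append(best_j)           # commit the greedy choice
        remaining.remove(best_j)
    return selected
```

For example, forward_stepwise(X_train, y_train, X_valid, y_valid, max_features=3) would return the indices of the first three features chosen, in the order they were added.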
And there are lots and lots of other variants of algorithms people have proposed, with different metrics for deciding when to add or remove features, and different ways to search over the features, because this problem of feature selection is really important. It's not a new problem, and so a lot of different procedures have been proposed to solve it.
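For contrast, here is the mirror-image backward procedure sketched under the same illustrative assumptions: start from the full model and, at each iteration, delete the feature whose removal hurts validation RSS the least.

```python
# A mirror-image sketch of backward elimination: begin with every feature
# and greedily delete the least useful one. As above, validation RSS and
# the min_features stopping rule are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

def backward_stepwise(X_train, y_train, X_valid, y_valid, min_features=1):
    """Greedily drop the feature whose deletion hurts validation RSS least."""
    selected = list(range(X_train.shape[1]))    # start from the full model
    while len(selected) > min_features:
        best_rss, best_j = np.inf, None
        # Try removing each currently selected feature in turn.
        for j in selected:
            cols = [c for c in selected if c != j]
            model = LinearRegression().fit(X_train[:, cols], y_train)
            resid = y_valid - model.predict(X_valid[:, cols])
            rss = np.sum(resid ** 2)
            if rss < best_rss:
                best_rss, best_j = rss, j
        selected.remove(best_j)                  # commit the greedy deletion
    return selected
```

A combined forward-backward search, as described above, would interleave these two moves, occasionally revisiting earlier additions once newer features have made them redundant.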