1
00:00:00,000 --> 00:00:04,500
[MUSIC]

2
00:00:04,500 --> 00:00:07,748
Well, this all subsets algorithm might seem great, or

3
00:00:07,748 --> 00:00:10,923
at least pretty straightforward to implement, but

4
00:00:10,923 --> 00:00:14,780
a question is, what's the complexity of running all subsets?

5
00:00:14,780 --> 00:00:17,270
How many models did we have to evaluate?

6
00:00:17,270 --> 00:00:22,250
Well, clearly we evaluated all models, but let's quantify what that is.

7
00:00:22,250 --> 00:00:25,050
So we looked at the model that was just noise.

8
00:00:25,050 --> 00:00:30,140
We looked at the model with just the first feature, second feature, all the way up to

9
00:00:30,140 --> 00:00:34,080
the model with just the first two features, the second two features, and

10
00:00:34,080 --> 00:00:39,420
every possible model up to the full model of all D features.

11
00:00:39,420 --> 00:00:43,300
And what we can do is index each one of these models that we searched over

12
00:00:44,330 --> 00:00:46,440
by a feature vector.

13
00:00:46,440 --> 00:00:51,140
And this feature vector is going to say, for

14
00:00:51,140 --> 00:00:57,079
feature one, feature two, all the way up to feature D,

15
00:01:00,340 --> 00:01:05,680
what we're gonna enter is zero if no,

16
00:01:05,680 --> 00:01:11,340
that feature is not in the model, and one if yes, that feature is in the model.

17
00:01:11,340 --> 00:01:15,390
So it's just gonna be a binary vector indicating which features are present.

18
00:01:15,390 --> 00:01:17,390
So, in the case of just noise, or

19
00:01:17,390 --> 00:01:19,810
no features, we have zeros along this whole vector.

20
00:01:19,810 --> 00:01:22,500
In the case of the first model,

21
00:01:22,500 --> 00:01:25,570
where just the first feature is included in the model, we're just gonna have a one

22
00:01:25,570 --> 00:01:30,310
in that first feature location, and zeros everywhere else.
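The binary-vector indexing described above can be sketched in a few lines of Python (a hypothetical illustration, not code from the lecture): each candidate model in all-subsets search corresponds to one 0/1 vector over the features.

```python
from itertools import product

def all_subset_models(D):
    """Enumerate all 2**D feature subsets of D features as binary vectors.

    Entry j of each vector is 1 if feature j is in the model, 0 otherwise.
    Returns (binary_vector, list_of_included_feature_indices) pairs.
    """
    models = []
    for bits in product([0, 1], repeat=D):
        included = [j for j, b in enumerate(bits) if b == 1]
        models.append((bits, included))
    return models

# With D = 3 features there are 2**3 = 8 candidate models, from the
# all-zeros vector (the noise-only model) to all-ones (the full model).
models = all_subset_models(3)
print(len(models))     # 8
print(models[0][0])    # (0, 0, 0) -> no features, just noise
print(models[-1][0])   # (1, 1, 1) -> full model
```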
23
00:01:30,310 --> 00:01:37,840
I guess for consistency, let me index this as feature zero, feature one,

24
00:01:37,840 --> 00:01:39,640
all the way up to feature D.

25
00:01:39,640 --> 00:01:46,675
Okay, and we're gonna go through this entire set of possible feature vectors.

26
00:01:46,675 --> 00:01:51,649
And how many choices are there for the first entry? Two.

27
00:01:51,649 --> 00:01:55,920
How many for the second? Two. Two choices for every entry.

28
00:01:55,920 --> 00:01:57,640
And how many entries are there?

29
00:01:57,640 --> 00:02:02,780
Well, with my new indexing, instead of D there's really D plus one.

30
00:02:02,780 --> 00:02:05,670
That's just a little notational choice.

31
00:02:08,460 --> 00:02:12,600
And I did a little back-of-the-envelope calculation for a couple choices of D.

32
00:02:13,740 --> 00:02:18,720
So for example, if we had a total of eight different features we were looking over,

33
00:02:20,050 --> 00:02:22,660
then we would have to search over 256 models.

34
00:02:22,660 --> 00:02:26,270
That actually might be okay.

35
00:02:26,270 --> 00:02:31,370
But if we had 30 features, all of a sudden we have to search over 1

36
00:02:31,370 --> 00:02:36,440
billion some number of different models.

37
00:02:36,440 --> 00:02:37,980
And if we have 1,000 features,

38
00:02:37,980 --> 00:02:42,470
which really is not that many in applications we look at these days,

39
00:02:42,470 --> 00:02:45,260
all of a sudden we have to search over 1.07 times 10 to the 301 models.

40
00:02:45,260 --> 00:02:50,430
And for the example I gave with 100 billion features,

41
00:02:50,430 --> 00:02:51,620
I don't even know what that number is.

42
00:02:51,620 --> 00:02:53,020
Well, I'm sure I could go and compute it, but

43
00:02:53,020 --> 00:02:56,590
I didn't bother, and it's clearly just huge.

44
00:02:56,590 --> 00:03:00,250
So what we can see is that typically,

45
00:03:00,250 --> 00:03:02,430
in most situations we're faced with these days,
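The back-of-the-envelope counts above can be reproduced directly. This is an illustrative sketch: the lecture counts 2 to the D plus one vectors when feature zero is included, while the concrete figures it quotes (256, a billion some, 1.07 times 10 to the 301) correspond to 2 to the D for D features.

```python
# Number of models all-subsets search must evaluate: one per binary
# feature vector. (With the lecture's zero-indexed vector of D+1
# entries the count is 2**(D+1); the quoted figures match 2**D.)
for D in (8, 30, 1000):
    n_models = 2 ** D
    print(f"D = {D:5d}: about {n_models:.3g} models")
```

Running this prints 256 models for D = 8, about 1.07e+09 for D = 30, and about 1.07e+301 for D = 1000, which is why exhaustive search becomes hopeless so quickly.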
46
00:03:02,430 --> 00:03:06,452
it's just computationally prohibitive to do this all subsets search.

47
00:03:06,452 --> 00:03:11,239
[MUSIC]