1
00:00:00,000 --> 00:00:04,500
[MUSIC]

2
00:00:04,500 --> 00:00:07,748
Well, this all subsets algorithm might seem great, or

3
00:00:07,748 --> 00:00:10,923
at least pretty straightforward to implement, but

4
00:00:10,923 --> 00:00:14,780
a question is, what's the complexity of running all subsets?

5
00:00:14,780 --> 00:00:17,270
How many models did we have to evaluate?

6
00:00:17,270 --> 00:00:22,250
Well, clearly we evaluated all models, but let's quantify what that is.

7
00:00:22,250 --> 00:00:25,050
So we looked at the model that was just noise.

8
00:00:25,050 --> 00:00:30,140
We looked at the model with just the first feature, second feature, all the way up to

9
00:00:30,140 --> 00:00:34,080
the model with just the first two features, the second two features, and

10
00:00:34,080 --> 00:00:39,420
every possible model up to the full model of all D features.

11
00:00:39,420 --> 00:00:43,300
And what we can do is index each one of these models that we searched over

12
00:00:44,330 --> 00:00:46,440
by a feature vector.

13
00:00:46,440 --> 00:00:51,140
And this feature vector is going to say, for

14
00:00:51,140 --> 00:00:57,079
feature one, feature two, all the way up to feature D,

15
00:01:00,340 --> 00:01:05,680
what we're gonna enter is zero if no,

16
00:01:05,680 --> 00:01:11,340
that feature is not in the model, and one if yes, that feature is in the model.

17
00:01:11,340 --> 00:01:15,390
So it's just gonna be a binary vector indicating which features are present.

18
00:01:15,390 --> 00:01:17,390
So, in the case of just noise, or

19
00:01:17,390 --> 00:01:19,810
no features, we have zeros along this whole vector.

20
00:01:19,810 --> 00:01:22,500
In the case of the first model,

21
00:01:22,500 --> 00:01:25,570
where just the first feature is included in the model, we're just gonna have a one

22
00:01:25,570 --> 00:01:30,310
in that first feature location, and zeros everywhere else.
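The binary-vector indexing described above can be sketched in a few lines of Python (a hypothetical illustration, not code from the lecture): each candidate model in all-subsets search corresponds to one 0/1 vector over the features.

```python
from itertools import product

def all_subset_models(D):
    """Enumerate all 2**D feature subsets of D features as binary vectors.

    Entry j of each vector is 1 if feature j is in the model, 0 otherwise.
    Returns (binary_vector, list_of_included_feature_indices) pairs.
    """
    models = []
    for bits in product([0, 1], repeat=D):
        included = [j for j, b in enumerate(bits) if b == 1]
        models.append((bits, included))
    return models

# With D = 3 features there are 2**3 = 8 candidate models, from the
# all-zeros vector (the noise-only model) to all-ones (the full model).
models = all_subset_models(3)
print(len(models))     # 8
print(models[0][0])    # (0, 0, 0) -> no features, just noise
print(models[-1][0])   # (1, 1, 1) -> full model
```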
23
00:01:30,310 --> 00:01:37,840
I guess for consistency, let me index this as feature zero, feature one,

24
00:01:37,840 --> 00:01:39,640
all the way up to feature D.

25
00:01:39,640 --> 00:01:46,675
Okay, and we're gonna go through this entire set of possible feature vectors.

26
00:01:46,675 --> 00:01:51,649
And how many choices are there for the first entry? Two.

27
00:01:51,649 --> 00:01:55,920
How many for the second? Two. Two choices for every entry.

28
00:01:55,920 --> 00:01:57,640
And how many entries are there?

29
00:01:57,640 --> 00:02:02,780
Well, with my new indexing, instead of D there's really D plus one.

30
00:02:02,780 --> 00:02:05,670
That's just a little notational choice.

31
00:02:08,460 --> 00:02:12,600
And I did a little back-of-the-envelope calculation for a couple choices of D.

32
00:02:13,740 --> 00:02:18,720
So for example, if we had a total of eight different features we were looking over,

33
00:02:20,050 --> 00:02:22,660
then we would have to search over 256 models.

34
00:02:22,660 --> 00:02:26,270
That actually might be okay.

35
00:02:26,270 --> 00:02:31,370
But if we had 30 features, all of a sudden we have to search over 1

36
00:02:31,370 --> 00:02:36,440
billion some number of different models.

37
00:02:36,440 --> 00:02:37,980
And if we have 1,000 features,

38
00:02:37,980 --> 00:02:42,470
which really is not that many in applications we look at these days,

39
00:02:42,470 --> 00:02:45,260
all of a sudden we have to search over 1.07 times 10 to the 301 models.

40
00:02:45,260 --> 00:02:50,430
And for the example I gave with 100 billion features,

41
00:02:50,430 --> 00:02:51,620
I don't even know what that number is.

42
00:02:51,620 --> 00:02:53,020
Well, I'm sure I could go and compute it, but

43
00:02:53,020 --> 00:02:56,590
I didn't bother, and it's clearly just huge.

44
00:02:56,590 --> 00:03:00,250
So what we can see is that typically,

45
00:03:00,250 --> 00:03:02,430
in most situations we're faced with these days,
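The back-of-the-envelope counts above can be reproduced directly. This is an illustrative sketch: the lecture counts 2 to the D plus one vectors when feature zero is included, while the concrete figures it quotes (256, a billion some, 1.07 times 10 to the 301) correspond to 2 to the D for D features.

```python
# Number of models all-subsets search must evaluate: one per binary
# feature vector. (With the lecture's zero-indexed vector of D+1
# entries the count is 2**(D+1); the quoted figures match 2**D.)
for D in (8, 30, 1000):
    n_models = 2 ** D
    print(f"D = {D:5d}: about {n_models:.3g} models")
```

Running this prints 256 models for D = 8, about 1.07e+09 for D = 30, and about 1.07e+301 for D = 1000, which is why exhaustive search becomes hopeless so quickly.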
46
00:03:02,430 --> 00:03:06,452
it's just computationally prohibitive to do this all subsets search.

47
00:03:06,452 --> 00:03:11,239
[MUSIC]