So now let's talk about the complexity of this forward stepwise algorithm, and contrast it with the all subsets procedure. How many models did we evaluate here? Well, in the first step, assuming that we have a total of D features, we searched over D models. Then, in the second step, we had D-1 models to search over, one for each remaining feature we might think about adding. In the third step, we had D-2 remaining features, and so on.

And a question is, how many steps did we take? How many of these "and so ons" did we have? Well, it depends. It depends on when we chose to stop, but at most we're going to have D steps, which gets us to the full model. So the complexity of this is O(D^2): at most D steps, each on the order of searching over D models. And that's going to be much, much less than 2^D, which is what we had for all subsets, when D is reasonably large. Okay, so this procedure is going to be significantly more computationally efficient, or feasible, than doing all subsets.

Well, this was just one of many possible choices you have for greedy algorithms for doing feature selection. As an example, instead of always starting from an empty model and growing and growing and growing, adding more features, you could do the opposite: you could start from a full model and then choose, at each iteration, which feature to eliminate. But you could also think about combining these steps. So you could do a forward procedure, where you search over which feature to add, but then every so often you could go back and think about deleting features. Because, like we said, in the forward process you might add a feature that is no longer relevant once you've added another feature later on. As an example of this (and it's probably not completely true in this application), maybe once you've added number of bedrooms and number of baths, square feet is no longer so relevant, since these other features can act as a proxy for square feet, or something like that.
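To make this concrete, here is a minimal sketch of the forward procedure in Python. The use of scikit-learn's LinearRegression and of validation-set RSS as the selection metric are illustrative assumptions, not something the lecture prescribes; the point is just that the nested loops fit at most D + (D-1) + ... + 1 models, which is O(D^2), rather than the 2^D models of all subsets.

```python
# A minimal sketch of forward stepwise feature selection.
# Assumptions for illustration: ordinary least squares via scikit-learn,
# validation RSS as the selection metric, and a fixed cap on the steps.
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_stepwise(X_train, y_train, X_valid, y_valid, max_features=None):
    """Greedily add the feature that most reduces validation RSS."""
    D = X_train.shape[1]
    if max_features is None:
        max_features = D                  # at most D steps: the full model
    selected, remaining = [], list(range(D))

    for _ in range(max_features):
        best_rss, best_j = np.inf, None
        # Step k searches over the D - k features not yet in the model.
        for j in remaining:
            cols = selected + [j]
            model = LinearRegression().fit(X_train[:, cols], y_train)
            resid = y_valid - model.predict(X_valid[:, cols])
            rss = np.sum(resid ** 2)
            if rss < best_rss:
                best_rss, best_j = rss, j
        selected.append(best_j)           # commit the greedy choice
        remaining.remove(best_j)
    return selected
```

For example, forward_stepwise(X_train, y_train, X_valid, y_valid, max_features=3) would return the indices of the first three features chosen, in the order they were added.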
And there are lots and lots of other variants of algorithms people have proposed, with different metrics for deciding when to add or remove features, and different ways to search over the features, because this problem of feature selection is really important. It's not a new problem, and so a lot of different procedures have been proposed to solve it.
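For contrast, here is the mirror-image backward procedure sketched under the same illustrative assumptions: start from the full model and, at each iteration, delete the feature whose removal hurts validation RSS the least.

```python
# A mirror-image sketch of backward elimination: begin with every feature
# and greedily delete the least useful one. As above, validation RSS and
# the min_features stopping rule are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

def backward_stepwise(X_train, y_train, X_valid, y_valid, min_features=1):
    """Greedily drop the feature whose deletion hurts validation RSS least."""
    selected = list(range(X_train.shape[1]))    # start from the full model
    while len(selected) > min_features:
        best_rss, best_j = np.inf, None
        # Try removing each currently selected feature in turn.
        for j in selected:
            cols = [c for c in selected if c != j]
            model = LinearRegression().fit(X_train[:, cols], y_train)
            resid = y_valid - model.predict(X_valid[:, cols])
            rss = np.sum(resid ** 2)
            if rss < best_rss:
                best_rss, best_j = rss, j
        selected.remove(best_j)                  # commit the greedy deletion
    return selected
```

A combined forward-backward search, as described above, would interleave these two moves, occasionally revisiting earlier additions once newer features have made them redundant.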