Okay, so let's think a little bit about the form of this closed-form solution. What we see is that we have this H-transpose-H inverse, so let's talk about that a little bit more. Remember, H was that big green matrix: the matrix of all the features for each one of our observations. So each row is a different observation, and we have that matrix. And we're pre-multiplying by the transpose, where we take it and set it on its side.

So this inner part here is the green matrix on its side times the regular green matrix, and what's the result of that multiplication? Well, remember, how many rows are there in this matrix? There are however many observations we have in our dataset, which is N; that's how many rows there are. And how many columns? Well, it's however many features we're using. And what's our notation for that? That's just capital D.

Okay. So if we multiply these two matrices: in contrast, when I take the transpose, I have N columns and D rows. And the result of multiplying a D-by-N matrix by an N-by-D matrix is just a D-by-D matrix. So it's a square matrix that's D rows by D columns. Let me be a little bit more explicit: it's number of features by number of features.
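As a quick sanity check on these dimensions, here is a minimal NumPy sketch (the array sizes are made up purely for illustration) confirming that H-transpose times H comes out D by D:

```python
import numpy as np

N, D = 100, 5                 # 100 observations, 5 features (illustrative sizes)
H = np.random.rand(N, D)      # feature matrix: one row per observation

gram = H.T @ H                # (D x N) times (N x D)
print(gram.shape)             # (5, 5): number of features by number of features
```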
And then we need to take the inverse of this matrix. So, is this resulting matrix going to be invertible? In general, I'll say in most cases, it is, if the number of observations we have is larger than the number of features. That means that this matrix is full rank, and then we can take its inverse. If you don't know what full rank is, that's perfectly fine for this course, but if you do, that's what we're referring to here.

And when I say "in most cases," it's because there's a little caveat: really, what we need is not just that the number of observations we have is greater than the number of features. We need to make sure that the number of linearly independent observations is greater than the number of features. So, instead of capital N, it's really the number of linearly independent observations that needs to be greater than the number of features. And again, if that didn't make sense to you, that's actually fine; just keep in mind the fact, which we'll talk about a lot in later modules of this course, that this matrix might not be invertible.

Okay, so what's the complexity of the inverse, though? Let's assume that we can actually invert this matrix. Well, the complexity is often noted with this big-O notation.
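To illustrate that caveat, here is a small sketch (with made-up numbers of my own choosing) where we have N = 3 observations and D = 3 features, but one observation is a duplicate of another, so only 2 rows are linearly independent and H-transpose-H ends up singular:

```python
import numpy as np

# D = 3 features, N = 3 observations, but the third observation
# is a copy of the first, so only 2 rows are linearly independent.
H = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [1.0, 2.0, 3.0]])

gram = H.T @ H
rank = np.linalg.matrix_rank(gram)
print(rank)    # 2, which is less than D = 3: gram is not invertible
```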
So I'm writing a big O, just the letter O, of the number of features cubed. What that means is that the number of operations we have to do to invert this matrix scales cubically with the number of features in our model.

Okay, so if you have lots and lots and lots of features, this can be really, really computationally intensive to do. So computationally intensive that it might actually be computationally impossible to do. So, especially if we're looking at applications with lots and lots of features, and again assuming we still have more observations than the number of features, we're going to want to use some other solution than forming this big matrix and taking its inverse. Even though there are actually some really fancy ways of doing this matrix inverse, and so know that those fancy ways exist, but still, there are some very simple alternatives to this closed-form solution.
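As one concrete illustration of sidestepping the explicit inverse, here is a hedged sketch (with synthetic, noiseless data and variable names of my own) that computes the closed-form weights both ways: by forming the inverse, and by solving the normal equations directly with `np.linalg.solve`, which avoids materializing the inverse and is more numerically stable:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 4
H = rng.standard_normal((N, D))           # feature matrix: N observations, D features
w_true = np.array([1.0, -2.0, 0.5, 3.0])  # "true" weights for the synthetic data
y = H @ w_true                            # noiseless targets, for simplicity

# Explicit inverse: forms (H^T H)^{-1} and multiplies it out.
w_inv = np.linalg.inv(H.T @ H) @ (H.T @ y)

# Solving the linear system (H^T H) w = H^T y directly,
# without ever forming the inverse.
w_solve = np.linalg.solve(H.T @ H, H.T @ y)

print(np.allclose(w_inv, w_solve))    # both routes agree
print(np.allclose(w_solve, w_true))   # and recover the true weights
```

Note that solving the D-by-D system is still cubic in D; the genuinely simple alternatives the lecture is pointing toward, covered in later modules, avoid forming H-transpose-H altogether.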