1
00:00:00,000 --> 00:00:03,162
In this video, I want to talk about the normal equation
2
00:00:03,162 --> 00:00:05,212
and non-invertibility.
3
00:00:05,212 --> 00:00:07,877
This is a somewhat more advanced concept,
4
00:00:07,877 --> 00:00:10,289
but it is something that I've often been asked about.
5
00:00:10,289 --> 00:00:12,711
And so I wanted to talk about it here.
6
00:00:12,711 --> 00:00:14,752
But this is a somewhat more advanced concept,
7
00:00:14,752 --> 00:00:17,982
so feel free to consider this optional material.
8
00:00:17,982 --> 00:00:22,413
There's a phenomenon that you may run into
9
00:00:22,413 --> 00:00:24,416
that may be useful for some of you to understand.
10
00:00:24,416 --> 00:00:26,619
But even if you don't understand it,
11
00:00:26,619 --> 00:00:28,450
you should still be able to get the normal equation and linear regression
12
00:00:28,450 --> 00:00:30,539
to work okay.
13
00:00:30,539 --> 00:00:33,195
Here's the issue:
14
00:00:33,195 --> 00:00:35,691
For those of you that are maybe somewhat
15
00:00:35,691 --> 00:00:37,876
more familiar with linear algebra,
16
00:00:37,876 --> 00:00:39,884
what some students have asked me is,
17
00:00:39,884 --> 00:00:42,542
when computing this
18
00:00:42,542 --> 00:00:45,130
theta equals (Xtranspose X) inverse Xtranspose y,
19
00:00:45,130 --> 00:00:49,476
what if the matrix Xtranspose X is non-invertible?
20
00:00:49,476 --> 00:00:52,336
So, for those of you that know a bit more linear algebra
21
00:00:52,336 --> 00:00:55,171
you may know that only some matrices
22
00:00:55,171 --> 00:00:58,598
are invertible, and some matrices do not have an inverse;
23
00:00:58,598 --> 00:01:00,540
we call those non-invertible matrices,
24
00:01:00,540 --> 00:01:04,737
singular or degenerate matrices.
25
00:01:04,737 --> 00:01:08,893
The issue or the problem of Xtranspose X being non-invertible
26
00:01:08,893 --> 00:01:11,287
should happen pretty rarely.
27
00:01:11,287 --> 00:01:16,749
And in Octave, if you implement this to compute theta,
28
00:01:16,749 --> 00:01:20,636
it turns out that this will actually do the right thing.
29
00:01:20,636 --> 00:01:24,629
I'm getting a little bit technical now and I don't want to go into details,
30
00:01:24,629 --> 00:01:28,207
but Octave has two functions for inverting matrices:
31
00:01:28,207 --> 00:01:32,146
One is called pinv(), and the other is called inv().
32
00:01:32,146 --> 00:01:36,089
The differences between these two are somewhat technical.
33
00:01:36,089 --> 00:01:38,107
One's called the pseudo-inverse, one's called the inverse.
34
00:01:38,107 --> 00:01:42,658
You can show mathematically that as long as you use the pinv() function,
35
00:01:42,658 --> 00:01:47,145
then this will actually compute the value of theta that you want,
36
00:01:47,145 --> 00:01:51,227
even if Xtranspose X is non-invertible.
37
00:01:51,227 --> 00:01:54,095
The specific details of the difference between
38
00:01:54,095 --> 00:01:55,959
pinv() and inv()
39
00:01:55,959 --> 00:01:58,562
are somewhat advanced numerical computing concepts
40
00:01:58,562 --> 00:02:00,907
that I don't really want to get into.
41
00:02:00,907 --> 00:02:02,993
But I thought in this optional
42
00:02:02,993 --> 00:02:04,672
video I would try to give you a little bit of intuition
43
00:02:04,672 --> 00:02:08,823
about what it means for Xtranspose X to be non-invertible,
44
00:02:08,823 --> 00:02:12,108
for those of you that know a bit more linear algebra
45
00:02:12,108 --> 00:02:13,556
and might be interested.
46
00:02:13,556 --> 00:02:15,948
I'm not going to prove this mathematically,
47
00:02:15,948 --> 00:02:18,684
but if Xtranspose X is non-invertible,
48
00:02:18,684 --> 00:02:22,596
there are usually two common causes:
49
00:02:22,596 --> 00:02:26,238
The first cause is if somehow, in your learning problem,
50
00:02:26,238 --> 00:02:28,461
you have redundant features.
51
00:02:28,461 --> 00:02:30,844
Concretely, if you try to predict housing prices
52
00:02:30,844 --> 00:02:34,877
and if x1 is the size of a house in square feet,
53
00:02:34,877 --> 00:02:37,792
and x2 is the size of the house in square meters,
54
00:02:37,792 --> 00:02:46,071
then, you know, 1 meter is equal to 3.28 feet, rounded to two decimals,
55
00:02:46,071 --> 00:02:48,947
and so your two features will always satisfy the constraint
56
00:02:48,947 --> 00:02:55,378
that x1 equals (3.28)^2 times x2.
57
00:02:55,378 --> 00:02:59,107
And you can show, for those of you - this is somewhat advanced linear algebra now,
58
00:02:59,107 --> 00:03:01,169
but if you're an expert in linear algebra,
59
00:03:01,169 --> 00:03:05,275
you can actually show that if your two features are related via a linear equation like this,
60
00:03:05,275 --> 00:03:09,095
then the matrix Xtranspose X will be non-invertible.
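As a rough sketch of that claim (not something shown in the lecture), the Python below builds a small design matrix from made-up house sizes, where x1 is exactly (3.28)^2 times x2, and checks the rank of Xtranspose X using exact fractions. The helper names gram and rank and the house sizes are assumptions made purely for illustration.

```python
from fractions import Fraction

def gram(X):
    """Return X^T X for a matrix X given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def rank(M):
    """Rank of M via Gaussian elimination on exact fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            if A[i][c] == 0:
                continue
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

# Conversion factor from the lecture: 1 m is about 3.28 ft, so 1 m^2 is (3.28)^2 ft^2.
FT2_PER_M2 = Fraction(328, 100) ** 2

# Made-up house sizes in square meters (illustrative only).
sizes_m2 = [Fraction(50), Fraction(80), Fraction(120), Fraction(200)]

# Columns: intercept term, x1 (square feet), x2 (square meters).
X = [[Fraction(1), FT2_PER_M2 * s, s] for s in sizes_m2]
G = gram(X)
rank_with_both = rank(G)

# Drop the redundant square-feet column and check again.
X_reduced = [[row[0], row[2]] for row in X]
G_reduced = gram(X_reduced)
rank_reduced = rank(G_reduced)

print(len(G), rank_with_both)        # matrix size vs. rank, both size features kept
print(len(G_reduced), rank_reduced)  # matrix size vs. rank, one size feature dropped
```

With both size features, the 3-by-3 matrix Xtranspose X has rank 2, so it has no inverse; with one of them dropped, the 2-by-2 matrix has full rank.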
61
00:03:09,095 --> 00:03:13,320
The second thing that can cause Xtranspose X to be non-invertible
62
00:03:13,320 --> 00:03:17,043
is if you're trying to run a learning algorithm
63
00:03:17,043 --> 00:03:18,850
with a lot of features.
64
00:03:18,850 --> 00:03:23,035
Concretely, if m is less than or equal to n.
65
00:03:23,035 --> 00:03:27,723
For example, if you imagine that you have m equals 10 training examples
66
00:03:27,723 --> 00:03:31,192
and that you have n equals 100 features, then you're trying
67
00:03:31,192 --> 00:03:36,829
to fit a parameter vector theta, which is (n+1)-dimensional,
68
00:03:36,829 --> 00:03:39,308
so it's 101-dimensional, and
69
00:03:39,308 --> 00:03:43,602
you're trying to fit 101 parameters from just 10 training examples.
70
00:03:43,602 --> 00:03:46,899
And this turns out to sometimes work,
71
00:03:46,899 --> 00:03:49,078
but to not always be a good idea.
72
00:03:49,078 --> 00:03:52,212
Because, as we see later, you might not have enough data
73
00:03:52,212 --> 00:03:58,432
if you only have 10 examples to fit 100 or 101 parameters.
74
00:03:58,432 --> 00:04:01,924
We'll see later in this course, why this might be too little data
75
00:04:01,924 --> 00:04:04,418
to fit this many parameters.
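One way to see why m less than or equal to n causes non-invertibility: X has only m rows, so the rank of Xtranspose X can never exceed m, while the matrix itself is (n+1)-by-(n+1). The sketch below checks this with the numbers from the example, m = 10 and n = 100. The feature values are arbitrary made-up integers, and the helper names gram and rank are assumptions for illustration.

```python
from fractions import Fraction

def gram(X):
    """Return X^T X for a matrix X given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def rank(M):
    """Rank of M via Gaussian elimination on exact fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            if A[i][c] == 0:
                continue
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

m, n = 10, 100  # 10 training examples, 100 features

# Each row: intercept term 1 followed by n arbitrary integer feature values.
X = [[1] + [((i + 1) * j) % 11 for j in range(1, n + 1)] for i in range(m)]

G = gram(X)      # (n+1)-by-(n+1) matrix X^T X
rank_G = rank(G)

print(len(G), rank_G)  # matrix size vs. its rank
```

Because the rank is at most 10 while the matrix is 101-by-101, Xtranspose X cannot be inverted here, whatever the feature values are.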
76
00:04:04,418 --> 00:04:07,544
But commonly, what we do then if m is less than n,
77
00:04:07,544 --> 00:04:12,513
is to see if we can either delete some features or use a technique
78
00:04:12,513 --> 00:04:14,689
called regularization,
79
00:04:14,689 --> 00:04:17,477
which is something that we will talk about a bit later in this course as well,
80
00:04:17,477 --> 00:04:21,905
that will kind of let you fit a lot of parameters using a lot of features
81
00:04:21,905 --> 00:04:24,117
even if you have a relatively small training set.
82
00:04:24,117 --> 00:04:27,698
But this regularization will be a later topic in this course.
83
00:04:27,698 --> 00:04:32,628
But to summarize, if ever you find that Xtranspose X is singular
84
00:04:32,628 --> 00:04:35,877
or, alternatively, find that it is non-invertible,
85
00:04:35,877 --> 00:04:38,380
what I would recommend you do is
86
00:04:38,380 --> 00:04:42,016
first: look at your features and see if you have redundant features
87
00:04:42,016 --> 00:04:45,304
like these x1 and x2 being linearly dependent,
88
00:04:45,304 --> 00:04:48,017
or being a linear function of each other, like so.
89
00:04:48,017 --> 00:04:49,841
And if you do have redundant features,
90
00:04:49,841 --> 00:04:51,493
you can just delete one of these features.
91
00:04:51,493 --> 00:04:53,724
You really don't need both of these features,
92
00:04:53,724 --> 00:04:55,601
so if you just delete one of these features,
93
00:04:55,601 --> 00:04:58,586
that will solve your non-invertibility problem.
94
00:04:58,586 --> 00:05:02,655
So first, think through your features and check if any are redundant,
95
00:05:02,655 --> 00:05:05,481
and if so, then, you know, keep deleting the redundant features
96
00:05:05,481 --> 00:05:07,659
until they are no longer redundant.
97
00:05:07,659 --> 00:05:09,799
And if your features are non-redundant,
98
00:05:09,799 --> 00:05:11,939
I would check if I might have too many features,
99
00:05:11,939 --> 00:05:13,638
and if that's the case I would either
100
00:05:13,638 --> 00:05:16,140
delete some features if I can bear to use fewer features,
101
00:05:16,140 --> 00:05:20,708
or else I would consider using regularization,
102
00:05:20,708 --> 00:05:22,821
which is this topic that we will talk about later.
103
00:05:22,821 --> 00:05:27,877
So, that's it for the normal equation and what it means
104
00:05:27,877 --> 00:05:31,885
if the matrix Xtranspose X is non-invertible.
105
00:05:31,885 --> 00:05:35,710
But this is a problem that hopefully you run into pretty rarely.
106
00:05:35,710 --> 00:05:40,554
And if you just implement it in Octave using the pinv() function,
107
00:05:40,554 --> 00:05:42,853
which is called the pseudo-inverse function
108
00:05:42,853 --> 00:05:46,700
(if you use a different linear algebra library, look for the function called pseudo-inverse),
109
00:05:46,700 --> 00:05:50,071
then that implementation should just do the right thing
110
00:05:50,071 --> 00:05:52,582
even if Xtranspose X is non-invertible,
111
00:05:52,582 --> 00:05:55,198
which should happen pretty rarely anyway.
112
00:05:55,198 --> 99:59:59,000
So this should not be a problem for most implementations of linear regression.