1 00:00:00,000 --> 00:00:03,162 In this video, I want to talk about the normal equation 2 00:00:03,162 --> 00:00:05,212 and non-invertibility. 3 00:00:05,212 --> 00:00:07,877 This is a somewhat more advanced concept, 4 00:00:07,877 --> 00:00:10,289 but it is something that I've often been asked about. 5 00:00:10,289 --> 00:00:12,711 And so I wanted to talk about it here. 6 00:00:12,711 --> 00:00:14,752 But this is a somewhat more advanced concept, 7 00:00:14,752 --> 00:00:17,982 so feel free to consider this optional material. 8 00:00:17,982 --> 00:00:22,413 There's a phenomenon that you may run into 9 00:00:22,413 --> 00:00:24,416 that's maybe for some of you useful to understand. 10 00:00:24,416 --> 00:00:26,619 But even if you don't understand it, 11 00:00:26,619 --> 00:00:28,450 the normal equation and linear regression, 12 00:00:28,450 --> 00:00:30,539 you should really get that to work okay. 13 00:00:30,539 --> 00:00:33,195 Here's the issue: 14 00:00:33,195 --> 00:00:35,691 For those of you that are maybe somewhat 15 00:00:35,691 --> 00:00:37,876 more familiar with linear algebra, 16 00:00:37,876 --> 00:00:39,884 what some students have asked me is, 17 00:00:39,884 --> 00:00:42,542 when computing this 18 00:00:42,542 --> 00:00:45,130 theta equals (Xtranspose X) inverse Xtranspose y 19 00:00:45,130 --> 00:00:49,476 what if the matrix Xtranspose X is non-invertible? 20 00:00:49,476 --> 00:00:52,336 So, for those of you that know a bit more linear algebra 21 00:00:52,336 --> 00:00:55,171 you may know that only some matrices 22 00:00:55,171 --> 00:00:58,598 are invertible and some matrices do not have an inverse; 23 00:00:58,598 --> 00:01:00,540 we call those non-invertible matrices, 24 00:01:00,540 --> 00:01:04,737 singular or degenerate matrices. 25 00:01:04,737 --> 00:01:08,893 The issue or the problem of Xtranspose X being non-invertible 26 00:01:08,893 --> 00:01:11,287 should happen pretty rarely. 
27 00:01:11,287 --> 00:01:16,749 And in Octave, if you implement this to compute theta, 28 00:01:16,749 --> 00:01:20,636 it turns out that this will actually do the right thing. 29 00:01:20,636 --> 00:01:24,629 I'm getting a little bit technical now and I don't want to go into details, 30 00:01:24,629 --> 00:01:28,207 but Octave has two functions for inverting matrices: 31 00:01:28,207 --> 00:01:32,146 One is called pinv(), and the other is called inv(). 32 00:01:32,146 --> 00:01:36,089 The differences between these two are somewhat technical. 33 00:01:36,089 --> 00:01:38,107 One's called the pseudo-inverse, one's called the inverse. 34 00:01:38,107 --> 00:01:42,658 You can show mathematically that as long as you use the pinv() function, 35 00:01:42,658 --> 00:01:47,145 then this will actually compute the value of theta that you want, 36 00:01:47,145 --> 00:01:51,227 even if Xtranspose X is non-invertible. 37 00:01:51,227 --> 00:01:54,095 The specific details of what is the difference between 38 00:01:54,095 --> 00:01:55,959 pinv() and what is inv(), 39 00:01:55,959 --> 00:01:58,562 those are somewhat advanced numerical computing concepts, 40 00:01:58,562 --> 00:02:00,907 that I don't really want to get into. 41 00:02:00,907 --> 00:02:02,993 But I thought in this optional 42 00:02:02,993 --> 00:02:04,672 video I'd try to give you a little bit of intuition 43 00:02:04,672 --> 00:02:08,823 about what it means for Xtranspose X to be non-invertible. 44 00:02:08,823 --> 00:02:12,108 For those of you that know a bit more linear algebra 45 00:02:12,108 --> 00:02:13,556 and might be interested. 
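[Editor's example] The pseudo-inverse form of the normal equation mentioned above can be sketched as follows. This is a NumPy stand-in for the Octave expression, not the lecture's own code. The toy data is made up, the redundant column is chosen so that Xtranspose X is singular, and the explicit `rcond` cutoff is an assumption of this sketch.

```python
import numpy as np

# Made-up toy data: an intercept column, a feature x1, and a redundant
# feature x2 = 2 * x1, so two columns of X are proportional.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x1), x1, 2.0 * x1])
y = 1.0 + 3.0 * x1  # targets that a linear model can fit exactly

XtX = X.T @ X  # singular here, because of the proportional columns

# Pseudo-inverse form of the normal equation. Singular values below
# rcond * (largest singular value) are treated as zero; the value 1e-12
# is an assumption of this sketch, not a recommendation.
theta = np.linalg.pinv(XtX, rcond=1e-12) @ X.T @ y

print(theta)      # one valid parameter vector (the minimum-norm one)
print(X @ theta)  # predictions, which should reproduce y
```

With a singular Xtranspose X there are many parameter vectors that fit equally well; the pseudo-inverse picks the one with the smallest norm, which is why the result is still usable.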
46 00:02:13,556 --> 00:02:15,948 I'm not going to prove this mathematically, 47 00:02:15,948 --> 00:02:18,684 but if Xtranspose X is non-invertible, 48 00:02:18,684 --> 00:02:22,596 there are usually two most common causes: 49 00:02:22,596 --> 00:02:26,238 The first cause is if somehow, in your learning problem, 50 00:02:26,238 --> 00:02:28,461 you have redundant features, 51 00:02:28,461 --> 00:02:30,844 concretely, if you try to predict housing prices 52 00:02:30,844 --> 00:02:34,877 and if x1 is the size of a house in square-feet, 53 00:02:34,877 --> 00:02:37,792 and x2 is the size of the house in square-meters, 54 00:02:37,792 --> 00:02:46,071 then, you know, 1 meter is equal to 3.28 feet, rounded to two decimals, 55 00:02:46,071 --> 00:02:48,947 and so your two features will always satisfy the constraint 56 00:02:48,947 --> 00:02:55,378 that x1 equals (3.28)^2 times x2. 57 00:02:55,378 --> 00:02:59,107 And you can show, for those of you - this is somewhat advanced linear algebra now, 58 00:02:59,107 --> 00:03:01,169 but if you're an expert in linear algebra, 59 00:03:01,169 --> 00:03:05,275 you can actually show that if your two features are related via a linear equation like this, 60 00:03:05,275 --> 00:03:09,095 then matrix Xtranspose X will be non-invertible. 61 00:03:09,095 --> 00:03:13,320 The second thing that can cause Xtranspose X to be non-invertible 62 00:03:13,320 --> 00:03:17,043 is if you're trying to run a learning algorithm 63 00:03:17,043 --> 00:03:18,850 with a lot of features. 64 00:03:18,850 --> 00:03:23,035 Concretely, if m is less than or equal to n. 
65 00:03:23,035 --> 00:03:27,723 For example, if you imagine that you have m equals 10 training examples 66 00:03:27,723 --> 00:03:31,192 and that you have n equals 100 features, then you're trying 67 00:03:31,192 --> 00:03:36,829 to fit a parameter vector theta, which is (n+1)-dimensional, 68 00:03:36,829 --> 00:03:39,308 so it's 101-dimensional; 69 00:03:39,308 --> 00:03:43,602 you're trying to fit 101 parameters from just 10 training examples. 70 00:03:43,602 --> 00:03:46,899 And this turns out to sometimes work, 71 00:03:46,899 --> 00:03:49,078 but to not always be a good idea. 72 00:03:49,078 --> 00:03:52,212 Because, as we see later, you might not have enough data 73 00:03:52,212 --> 00:03:58,432 if you only have 10 examples to fit 100 or 101 parameters. 74 00:03:58,432 --> 00:04:01,924 We'll see later in this course, why this might be too little data 75 00:04:01,924 --> 00:04:04,418 to fit this many parameters. 76 00:04:04,418 --> 00:04:07,544 But commonly, what we do then if m is less than n, 77 00:04:07,544 --> 00:04:12,513 is to see if we can either delete some features or to use a technique 78 00:04:12,513 --> 00:04:14,689 called regularization, 79 00:04:14,689 --> 00:04:17,477 which is something that we will talk about a bit later in this course as well, 80 00:04:17,477 --> 00:04:21,905 that will kind of let you fit a lot of parameters using a lot of features 81 00:04:21,905 --> 00:04:24,117 even if you have a relatively small training set. 82 00:04:24,117 --> 00:04:27,698 But this regularization will be a later topic in this course. 
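[Editor's example] Both causes described above can be checked numerically. The sketch below is an illustration under stated assumptions: the house sizes and the small wide matrix are made up, and the helper names `rank` and `gram` are hypothetical. It uses exact rational arithmetic from the Python standard library so that rounding cannot blur the result. A square matrix is invertible only when its rank equals its size.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def gram(X):
    """Return Xtranspose X for a design matrix given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

# Cause 1: redundant features. Each row is [1, x1, x2] with
# x1 = (3.28)^2 * x2 (square feet vs. square meters); sizes are made up.
k = Fraction(328, 100) ** 2
sizes_m2 = [50, 80, 120, 200]
X_redundant = [[1, k * s, s] for s in sizes_m2]
rank_redundant = rank(gram(X_redundant))  # 3x3 matrix, but rank 2

# Cause 2: m <= n. Two examples, three features plus the intercept column.
X_wide = [[1, 2, 7, 1], [1, 3, 5, 8]]
rank_wide = rank(gram(X_wide))  # 4x4 matrix, but rank at most m = 2

print(rank_redundant, rank_wide)
```

In both cases the rank falls short of the matrix size, which is exactly what non-invertible means here.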
83 00:04:27,698 --> 00:04:32,628 But to summarize, if ever you find that Xtranspose X is singular 84 00:04:32,628 --> 00:04:35,877 or alternatively find it is non-invertible, 85 00:04:35,877 --> 00:04:38,380 what I would recommend you do is 86 00:04:38,380 --> 00:04:42,016 first: look at your features and see if you have redundant features 87 00:04:42,016 --> 00:04:45,304 like these x1 and x2 being linearly dependent, 88 00:04:45,304 --> 00:04:48,017 or being a linear function of each other, like so 89 00:04:48,017 --> 00:04:49,841 and if you do have redundant features and 90 00:04:49,841 --> 00:04:51,493 if you just delete one of these features - 91 00:04:51,493 --> 00:04:53,724 you really don't need both of these features, 92 00:04:53,724 --> 00:04:55,601 so if you just delete one of these features 93 00:04:55,601 --> 00:04:58,586 that will solve your non-invertibility problem 94 00:04:58,586 --> 00:05:02,655 and, so first think through my features and check if any are redundant 95 00:05:02,655 --> 00:05:05,481 and if so, then, you know, keep deleting the redundant features 96 00:05:05,481 --> 00:05:07,659 until they are no longer redundant. 97 00:05:07,659 --> 00:05:09,799 And if your features are non-redundant, 98 00:05:09,799 --> 00:05:11,939 I would check if I might have too many features, 99 00:05:11,939 --> 00:05:13,638 and if that's the case I would either 100 00:05:13,638 --> 00:05:16,140 delete some features if I can bear to use fewer features, 101 00:05:16,140 --> 00:05:20,708 or else I would consider using regularization, 102 00:05:20,708 --> 00:05:22,821 which is this topic that we will talk about later. 103 00:05:22,821 --> 00:05:27,877 So, that's it for the normal equation and what it means 104 00:05:27,877 --> 00:05:31,885 if the matrix Xtranspose X is non-invertible. 105 00:05:31,885 --> 00:05:35,710 But this is a problem that hopefully you run into pretty rarely. 
106 00:05:35,710 --> 00:05:40,554 And if you just implement it in Octave using the pinv() function 107 00:05:40,554 --> 00:05:42,853 which is called the pseudo-inverse function 108 00:05:42,853 --> 00:05:46,700 so if you use a different linear algebra library, that is called the pseudo-inverse 109 00:05:46,700 --> 00:05:50,071 but that implementation should just do the right thing 110 00:05:50,071 --> 00:05:52,582 even if Xtranspose X is non-invertible 111 00:05:52,582 --> 00:05:55,198 which should happen pretty rarely anyway 112 00:05:55,198 --> 99:59:59,000 so this should not be a problem for most implementations of linear regression.
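[Editor's example] The advice to keep deleting redundant features until none remain can be sketched as a greedy rank check. This is one possible approach under stated assumptions, not a procedure given in the lecture. The helper name `independent_columns` is hypothetical, the toy data is made up, and NumPy's `matrix_rank` decides redundancy up to a floating-point tolerance.

```python
import numpy as np

def independent_columns(X):
    """Greedily keep each column only if it raises the rank of the kept set."""
    kept = []
    for j in range(X.shape[1]):
        trial = kept + [j]
        if np.linalg.matrix_rank(X[:, trial]) == len(trial):
            kept = trial  # column j adds new information, so keep it
    return kept

# Made-up toy data: intercept, a feature x1, and a redundant x2 = 2 * x1.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x1), x1, 2.0 * x1])
y = 1.0 + 3.0 * x1

kept = independent_columns(X)  # indices of the columns that survive
X_reduced = X[:, kept]

# With the redundant column gone, Xtranspose X is invertible and the
# ordinary normal equation can be solved directly.
theta = np.linalg.solve(X_reduced.T @ X_reduced, X_reduced.T @ y)

print(kept, theta)
```

Which of two redundant features gets dropped depends on column order here; either choice removes the non-invertibility.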