Okay, so let's think a little bit about the form of this closed-form solution. What we see is that we have this H-transpose-H inverse, so let's talk about that a little bit more. Remember, H was that big green matrix: the matrix of all the features for each one of our observations. So each row is a different observation, and we have that matrix. And we're pre-multiplying by the transpose, where we take it and set it on its side.

So this inner part here is the green matrix on its side times the regular green matrix, and what's the result of that multiplication? Well, remember, how many rows are there in this matrix? There are however many observations we have in our dataset, which is N; that's how many rows there are. And how many columns? Well, it's however many features we're using. And what's our notation for that? That's just capital D.

Okay. So if we multiply these two matrices: in contrast, when I take the transpose, I have N columns and D rows. And the result of multiplying a D-by-N matrix by an N-by-D matrix is just a D-by-D matrix. So it's a square matrix that's D rows by D columns. Let me be a little bit more explicit: it's number of features by number of features.
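As a quick sanity check on these dimensions, here is a minimal NumPy sketch (the array sizes are made up purely for illustration) confirming that H-transpose times H comes out D by D:

```python
import numpy as np

N, D = 100, 5                 # 100 observations, 5 features (illustrative sizes)
H = np.random.rand(N, D)      # feature matrix: one row per observation

gram = H.T @ H                # (D x N) times (N x D)
print(gram.shape)             # (5, 5): number of features by number of features
```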
And then we need to take the inverse of this matrix. So, is this resulting matrix going to be invertible? In general, I'll say in most cases, it is, if the number of observations we have is larger than the number of features. That means that this matrix is full rank, and then we can take its inverse. If you don't know what full rank is, that's perfectly fine for this course, but if you do, that's what we're referring to here.

And when I say "in most cases," it's because there's a little caveat: really, what we need is not just that the number of observations we have is greater than the number of features. We need to make sure that the number of linearly independent observations is greater than the number of features. So, instead of capital N, it's really the number of linearly independent observations that needs to be greater than the number of features. And again, if that didn't make sense to you, that's actually fine; just keep in mind the fact, which we'll talk about a lot in later modules of this course, that this matrix might not be invertible.

Okay, so what's the complexity of the inverse, though? Let's assume that we can actually invert this matrix. Well, the complexity is often noted with this big-O notation.
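To illustrate that caveat, here is a small sketch (with made-up numbers of my own choosing) where we have N = 3 observations and D = 3 features, but one observation is a duplicate of another, so only 2 rows are linearly independent and H-transpose-H ends up singular:

```python
import numpy as np

# D = 3 features, N = 3 observations, but the third observation
# is a copy of the first, so only 2 rows are linearly independent.
H = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [1.0, 2.0, 3.0]])

gram = H.T @ H
rank = np.linalg.matrix_rank(gram)
print(rank)    # 2, which is less than D = 3: gram is not invertible
```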
So I'm writing a big O, just the letter O, of the number of features cubed. What that means is that the number of operations we have to do to invert this matrix scales cubically with the number of features in our model.

Okay, so if you have lots and lots and lots of features, this can be really, really computationally intensive to do. So computationally intensive that it might actually be computationally impossible to do. So, especially if we're looking at applications with lots and lots of features, and again assuming we still have more observations than the number of features, we're going to want to use some other solution than forming this big matrix and taking its inverse. Even though there are actually some really fancy ways of doing this matrix inverse, and so know that those fancy ways exist, but still, there are some very simple alternatives to this closed-form solution.
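As one concrete illustration of sidestepping the explicit inverse, here is a hedged sketch (with synthetic, noiseless data and variable names of my own) that computes the closed-form weights both ways: by forming the inverse, and by solving the normal equations directly with `np.linalg.solve`, which avoids materializing the inverse and is more numerically stable:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 4
H = rng.standard_normal((N, D))           # feature matrix: N observations, D features
w_true = np.array([1.0, -2.0, 0.5, 3.0])  # "true" weights for the synthetic data
y = H @ w_true                            # noiseless targets, for simplicity

# Explicit inverse: forms (H^T H)^{-1} and multiplies it out.
w_inv = np.linalg.inv(H.T @ H) @ (H.T @ y)

# Solving the linear system (H^T H) w = H^T y directly,
# without ever forming the inverse.
w_solve = np.linalg.solve(H.T @ H, H.T @ y)

print(np.allclose(w_inv, w_solve))    # both routes agree
print(np.allclose(w_solve, w_true))   # and recover the true weights
```

Note that solving the D-by-D system is still cubic in D; the genuinely simple alternatives the lecture is pointing toward, covered in later modules, avoid forming H-transpose-H altogether.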