Okay, so let's think a little bit about the form of this closed-form solution. What we see is that we have this H transpose H inverse, so let's talk about that a little bit more. Remember, H was that big green matrix: it's the matrix of all the features for each one of our observations, so each row is a different observation. And we're pre-multiplying it by its transpose, where we take the matrix and set it on its side. So this inner part here is the green matrix on its side times the regular green matrix. What's the result of that multiplication? Well, how many rows does H have? However many observations we have in our dataset, which is N. And how many columns? However many features we're using, and our notation for that is capital D. In contrast, when I take the transpose, I have N columns and D rows. And the result of multiplying a D by N matrix by an N by D matrix is a D by D matrix. So it's a square matrix with D rows and D columns. To be a little more explicit, it's number of features by number of features.

Then we need to take the inverse of this matrix. This resulting matrix is going to be invertible in general, or I'll say in most cases, if the number of observations we have is larger than the number of features. That means the matrix is full rank, and then we can take its inverse. If you don't know what full rank is, that's perfectly fine for this course, but if you do, that's what we're referring to here. And I say "in most cases" because there's a little caveat: it's not enough that the number of observations is greater than the number of features. What really matters is the number of linearly independent observations.
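To make the shape argument and the caveat concrete, here is a minimal sketch in plain Python. The helper names (`transpose`, `matmul`, `rank`) and the small example matrices are assumptions made up for illustration, not something from the lecture. It forms H transpose H for a 4 by 2 matrix H, and then shows a second matrix that also has more observations than features but whose rows are all multiples of one another, so the product is not full rank.

```python
from fractions import Fraction

def transpose(M):
    """Turn rows into columns."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def rank(M):
    """Rank via Gaussian elimination, using exact fractions to avoid rounding."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical data: N = 4 observations (rows), D = 2 features (columns).
H = [[1, 2], [1, 3], [1, 5], [1, 7]]
G = matmul(transpose(H), H)          # (D x N) times (N x D) gives D x D

# Also N = 4 > D = 2, but every row is a multiple of [1, 2],
# so there is only one linearly independent observation.
H_dup = [[1, 2], [2, 4], [3, 6], [4, 8]]
G_dup = matmul(transpose(H_dup), H_dup)

print(len(G), "x", len(G[0]))        # shape of H^T H
print(rank(G), rank(G_dup))          # full rank means rank == D
```

In this sketch the first product has rank 2, equal to D, so it can be inverted, while the second has rank 1 and cannot, even though both matrices have four observations.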
So really I should say that, instead of capital N, it's the number of linearly independent observations that needs to be at least as large as the number of features. And again, if that didn't make sense to you, that's fine. Just keep in mind that this matrix might not be invertible, and we'll talk about that a lot in later modules of this course.

Okay, so what's the complexity of the inverse? Let's assume that we can actually invert this matrix. Complexity is often written with big O notation, so I'm writing a big O, just the letter O, of the number of features cubed, O(D^3). What that means is that the number of operations we have to do to invert this matrix scales cubically with the number of features in our model. So if you have lots and lots of features, this can be really computationally intensive, so intensive that it might actually be computationally infeasible. So especially in applications with lots of features, and again assuming we still have more observations than features, we're going to want some other solution than forming this big matrix and taking its inverse. There are some really fancy ways of doing this matrix inverse, and you should know those exist, but there are also some very simple alternatives to this closed-form solution. [MUSIC]
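As a rough illustration of the cubic scaling, the sketch below inverts a D by D matrix with textbook Gauss-Jordan elimination and counts one operation per updated matrix entry. This is an assumed, deliberately naive implementation chosen only to make the growth visible; it is not how numerical libraries compute inverses, and the names `invert_counting_ops` and `make_matrix` are hypothetical.

```python
def invert_counting_ops(A):
    """Invert a square matrix by Gauss-Jordan elimination.

    Returns (inverse, ops), where ops counts one operation
    for every matrix entry that gets updated along the way.
    """
    d = len(A)
    # Build the augmented matrix [A | I].
    M = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(d)]
         for i, row in enumerate(A)]
    ops = 0
    for c in range(d):
        # Partial pivoting: pick the row with the largest entry in column c.
        pivot = max(range(c, d), key=lambda i: abs(M[i][c]))
        if abs(M[pivot][c]) < 1e-12:
            raise ValueError("matrix is singular (not invertible)")
        M[c], M[pivot] = M[pivot], M[c]
        p = M[c][c]
        M[c] = [x / p for x in M[c]]          # scale pivot row: 2*d entries
        ops += 2 * d
        for i in range(d):
            if i != c:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[c])]  # 2*d entries
                ops += 2 * d
    return [row[d:] for row in M], ops

def make_matrix(d):
    """Hypothetical well-behaved test matrix: d + 1 on the diagonal, 1 elsewhere."""
    return [[d + 1.0 if i == j else 1.0 for j in range(d)] for i in range(d)]

# Doubling the number of features multiplies the work by about 2**3 = 8.
for d in (10, 20, 40):
    _, ops = invert_counting_ops(make_matrix(d))
    print(d, ops)
```

With this particular way of counting, each of the D columns touches D rows of 2D entries, so the total is 2 times D cubed, and going from 10 to 20 features costs eight times as many operations rather than twice as many.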