So, we've talked about the performance of nearest neighbor regression as you get tons and tons of observations, and things look pretty good in that case. Actually, amazingly good. But the picture isn't quite as nice when we start increasing the dimension of our input space, or if we just have a limited set of observations, because the performance of nearest neighbor regression, as we talked about before, is very sensitive to how well the data covers the input space. If you remember, we had a plot early on where we removed observations from some region of our input space, and we got a pretty bad fit in that region.

Well, now imagine you have a really, really high-dimensional space, and you need your observations to cover all points within that space. You're going to need a number of observations that's exponential in the dimension of the space in order to cover it and get good nearest neighbor performance (see the coverage sketch after this segment). And that's typically really hard, to have as many observations as you would need in these high-dimensional spaces. So, the cases where you have limited data relative to the dimension of the input space are where the parametric models we've talked about throughout the rest of this course become so important.

Another thing that's important to talk about is the complexity of our nearest neighbor search. The naive algorithm we've talked about so far, which just scans through the entire data set, can be quite computationally intensive. For example, if you have just one query point, x_q, and you're doing one nearest neighbor search, you have to scan through every single observation in your data set and compute the distance to each one. So, going from our query house to each of the different houses in our data set, computing the similarity between the houses in order to find the nearest neighbor. That has complexity linear in the number of observations N (a sketch of this brute-force 1-NN scan follows below). Or, if we want to do our k nearest neighbors approach, we have to maintain a list of the k nearest neighbors, and if you do this with the sorted queue we've talked about, then you can do this search in O(N log k) (see the k-NN sketch below). But both of these, either our 1-NN or our k-NN search, have complexity linear in the number of observations.

But what if N is huge? Because that's exactly the situation where we've said these k-NN approaches work so well. Well, the search can be really, really intensive, especially if you have to do many queries. If you want to predict the values at many points in your input space, this can be a very expensive procedure. So, instead, we're going to talk about more efficient methods for doing this type of nearest neighbor search in our Clustering and Retrieval course later in this specialization.
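To make the "exponentially large number of observations" point concrete, here is a small back-of-the-envelope sketch. It assumes we want a grid of observations spaced eps apart to cover the unit hypercube [0, 1]^d; the spacing value and the hypercube setting are illustrative assumptions, not anything specific from the course.

```python
# Illustrative only: covering the unit hypercube [0, 1]^d with observations
# spaced eps apart requires roughly (1 / eps)^d points, so the sample size
# needed for good nearest neighbor coverage grows exponentially in d.
eps = 0.1  # assumed spacing between neighboring observations

for d in [1, 2, 5, 10, 20]:
    points_needed = (1 / eps) ** d
    print(f"d = {d:2d}: ~{points_needed:.0e} observations to cover the space")
```

With eps = 0.1, one dimension needs about 10 observations, but 20 dimensions already needs on the order of 10^20, which is why limited data in high dimensions pushes us toward parametric models.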
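Here is a minimal sketch of the brute-force 1-NN scan described above, assuming the data sits in a NumPy array and we use Euclidean distance (the course's examples use other distances too; this choice is just for illustration). The single pass over all N observations is what makes the cost O(N).

```python
import numpy as np

def nearest_neighbor(X, x_q):
    """Brute-force 1-NN: one pass over all N observations, O(N) distances.

    X   : (N, d) array of training inputs (e.g., features of each house)
    x_q : (d,) query point (e.g., features of the query house)
    Returns the index of the closest observation under Euclidean distance.
    """
    best_idx, best_dist = None, np.inf
    for i in range(X.shape[0]):              # scan the entire data set
        dist = np.linalg.norm(X[i] - x_q)    # distance from query to house i
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx
```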
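And here is one way to realize the O(N log k) k-NN search. The lecture's "sorted queue" is implemented below as a bounded max-heap (a priority queue keeping only the current k best candidates), which is a standard realization of that idea rather than necessarily the exact data structure used in the course. Each of the N observations costs at most O(log k) heap work.

```python
import heapq
import numpy as np

def k_nearest_neighbors(X, x_q, k):
    """Brute-force k-NN with a size-k heap: O(N log k) overall.

    Python's heapq is a min-heap, so distances are stored negated; the
    root is then the *farthest* of the current k candidates and can be
    evicted in O(log k) whenever a closer observation is found.
    """
    heap = []  # entries are (-distance, index)
    for i in range(X.shape[0]):
        dist = np.linalg.norm(X[i] - x_q)
        if len(heap) < k:
            heapq.heappush(heap, (-dist, i))       # still filling the queue
        elif dist < -heap[0][0]:                   # closer than the farthest kept
            heapq.heapreplace(heap, (-dist, i))    # evict farthest, keep this one
    # return the k survivors ordered from nearest to farthest
    return [i for _, i in sorted(heap, key=lambda t: -t[0])]
```

For example, k_nearest_neighbors(X, x_q, k=5) returns the indices of the five houses most similar to the query house, which you could then average to form the k-NN regression prediction.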