[MUSIC] We've set up how to perform nearest neighbor search: we described the algorithm we have to run, and then spent some time discussing two really critical components of that algorithm. The first is how we represent our documents of interest, and the second is how we compute the distance between two different documents. We showed that those two components are really important for the performance of the method. But now, I want to spend a little bit of time thinking about how computationally intensive nearest neighbor search can be. We refer to this as the complexity of search.

Okay, so what's the complexity of brute-force search? Let's say we have a given query point. What do we need to do? Well, we need to scan through all the documents in the corpus, compute the distance to each one, and then ask which distance is smallest. So what's the complexity of doing that? For a one nearest neighbor query, we have to compute N different distances, so the complexity is O(N), where N is the number of documents in the corpus. And if we want to do k nearest neighbor search and return the k nearest documents, then the complexity is O(N log k). The reason we get that log k term is that a smart implementation maintains a priority queue of the k best candidates seen so far, and inserting into a size-k priority queue costs log k per document.

Okay, but what if N is really, really large? And, importantly, what if you have to run many, many queries on this corpus again and again and again? Is there something we can do that's more efficient than this brute-force search? [MUSIC]
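To make the O(N log k) claim concrete, here is a minimal sketch of brute-force k-nearest-neighbor search using a size-k heap as the priority queue. The names (`brute_force_knn`, `corpus`, `query`) and the use of Euclidean distance are illustrative assumptions, not part of the lecture; any distance measure discussed earlier could be swapped in.

```python
# A minimal sketch of brute-force k-NN search, assuming `corpus` is an
# (N, d) NumPy array of document vectors and `query` is a length-d vector.
import heapq
import numpy as np

def brute_force_knn(corpus, query, k=1):
    """Return indices of the k documents closest to `query`.

    Scans all N documents (O(N) distance computations) and keeps the
    k best seen so far in a size-k heap, giving O(N log k) overall.
    """
    heap = []  # max-heap of (negated distance, index), size <= k
    for i, doc in enumerate(corpus):
        dist = np.linalg.norm(doc - query)  # Euclidean distance (illustrative choice)
        if len(heap) < k:
            heapq.heappush(heap, (-dist, i))
        elif -dist > heap[0][0]:  # closer than the current k-th best
            heapq.heapreplace(heap, (-dist, i))
    # Sort the survivors from nearest to farthest before returning.
    return [i for _, i in sorted(heap, reverse=True)]

# Example: 1000 random 50-dimensional "documents" and one query.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 50))
query = rng.standard_normal(50)
print(brute_force_knn(corpus, query, k=5))
```

Note how the heap only ever holds k entries, so each push or replace costs O(log k) rather than O(log N); with k = 1 the heap degenerates to tracking a single best candidate, which is the plain O(N) one-nearest-neighbor scan.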