[MUSIC] So all the illustrations we have given so far assume that we're working in just two dimensions, but in most of the examples that we're interested in, we might have a very large vocabulary, a very high-dimensional space, that we're interested in partitioning and performing nearest neighbor search on. So let's talk about how we do things in these higher dimensions.

Well, if we have more than just two dimensions, let's say we have three dimensions, then instead of throwing down a line, we're going to think about throwing down planes, or hyperplanes. This plane, again, we're going to assume goes through the (0, 0, 0) point, even though the illustration might not look exactly like it. What we're going to say is that the score of any point x is simply v1 times, in this case, the number of times we see the word "awesome", plus v2 times the number of times we see the word "awful", plus v3 times the number of times we see the word "great". So there are three different coefficients that define our plane in three dimensions, and if we had d dimensions we would have d different coefficients, one multiplying each of our d different features. That defines our score, and then we simply look at whether that score is positive or negative and use that to define the bin index. And just like in the two-dimensional case, when we throw down these hyperplanes we're going to draw the coefficients v1, v2, v3, all the way up to vd, completely at random. That defines a random hyperplane that we're going to use for our partitioning.

Now I want to talk about the cost of binning our points in this d-dimensional space. Remember that for each one of our data points in d dimensions, we need to determine the bin index per plane that we're throwing down. So for the ith hyperplane, with coefficients v1^(i), v2^(i), v3^(i), and so on, we have to do d multiplications between these coefficients and our features just to compute the score for that one hyperplane. And then we have to repeat that for each of the hyperplanes that we've thrown down.

Okay, so there is some cost to doing this, but one thing that I want to mention is that in high dimensions, and in some applications, this is often a sparse multiplication that we're doing. Let me emphasize "some applications" because this is not generically true, but it does hold when the dimensions correspond to the words in a vocabulary and each data point is a document represented by counts of how many times each word appears. Out of all possible words in the vocabulary, any given document typically contains only a few of them. So even though generically we're doing d different multiplies per hyperplane, in practice we often only need to compute a small subset of those multiplications, the ones for features that are nonzero.

But the other thing that's really, really important, and is generically true, is that this one-time cost of forming the hash table is offset when we're doing many, many queries, okay? Every time we query and look for a nearest neighbor, we're going to use the hash table that we already formed, and we don't have to redo all the multiplications that we're talking about here. [MUSIC]
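To make the binning step concrete, here is a minimal numpy sketch of the idea described above: draw random coefficients for a few hyperplanes, score each word-count vector against them, turn the signs of the scores into a bin index, and reuse the resulting table at query time. This is only an illustration, not the course's reference implementation; the function names, the choice of two planes, and the tiny three-word count matrix are all made up for the example.

```python
import numpy as np

def draw_random_hyperplanes(num_planes, dim, seed=0):
    """Draw the coefficients v1, ..., vd of each hyperplane completely at random."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_planes, dim))  # one row of d coefficients per plane

def bin_index(x, planes):
    """Score a point against every plane and turn the signs into a single bin index."""
    scores = planes @ x                      # one dot product (d multiplies) per plane
    bits = (scores >= 0).astype(int)         # sign of each score -> one bit per plane
    return int("".join(map(str, bits)), 2)   # e.g. bits [1, 0] -> bin 2

# One-time cost: bin every data point and build the hash table.
dim = 3  # e.g. counts of "awesome", "awful", "great"
data = np.array([[5.0, 0.0, 2.0],
                 [0.0, 3.0, 0.0],
                 [1.0, 1.0, 4.0]])
planes = draw_random_hyperplanes(num_planes=2, dim=dim)

table = {}
for i, x in enumerate(data):
    table.setdefault(bin_index(x, planes), []).append(i)

# Per-query cost: only the query vector is re-scored; the table itself is reused as-is.
query = np.array([4.0, 0.0, 1.0])
candidates = table.get(bin_index(query, planes), [])
print(candidates)  # indices of data points that landed in the query's bin
```

Note that the d-by-num_planes worth of multiplications happens once per data point when the table is built; each later query only scores the single query vector. And with sparse document vectors, only the nonzero word counts actually contribute to each score, so the per-plane work can be much less than d multiplies in practice.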