[MUSIC] So all the illustrations we have given so far assume that we're working in just two dimensions, but in most of the examples that we're interested in, we might have a very large vocabulary, a very high-dimensional space, that we're interested in partitioning and performing nearest neighbor search on. So let's talk about how we do things in these higher dimensions.

Well, if we have more than just two dimensions, let's say we have three dimensions, then instead of throwing down a line, we're going to think about throwing down planes, or hyperplanes. This plane, again, we're going to assume goes through the (0, 0, 0) point, even though the illustration might not look exactly like it. What we're going to say is that the score of any point x is simply v1 times, in this case, the number of times we see the word "awesome", plus v2 times the number of times we see the word "awful", plus v3 times the number of times we see the word "great". So there are three different coefficients that define our plane in three dimensions, and if we had d dimensions we would have d different coefficients, one multiplying each of our d different features. That defines our score, and then we simply look at whether that score is positive or negative and use that to define the bin index. And just like in the two-dimensional case, when we throw down these hyperplanes we're going to draw the coefficients v1, v2, v3, all the way up to vd, completely at random. That defines a random hyperplane that we're going to use for our partitioning.

Now I want to talk about the cost of binning our points in this d-dimensional space. Remember that for each one of our data points in d dimensions, we need to determine the bin index per plane that we're throwing down. So for the ith hyperplane, with coefficients v1^(i), v2^(i), v3^(i), and so on, we have to do d multiplications between these coefficients and our features just to compute the score for that one hyperplane. And then we have to repeat that for each of the hyperplanes that we've thrown down.

Okay, so there is some cost to doing this, but one thing that I want to mention is that in high dimensions, and in some applications, this is often a sparse multiplication that we're doing. Let me emphasize "some applications" because this is not generically true, but it does hold when the dimensions correspond to the words in a vocabulary and each data point is a document represented by counts of how many times each word appears. Out of all possible words in the vocabulary, any given document typically contains only a few of them. So even though generically we're doing d different multiplies per hyperplane, in practice we often only need to compute a small subset of those multiplications, the ones for features that are nonzero.

But the other thing that's really, really important, and is generically true, is that this one-time cost of forming the hash table is offset when we're doing many, many queries, okay? Every time we query and look for a nearest neighbor, we're going to use the hash table that we already formed, and we don't have to redo all the multiplications that we're talking about here. [MUSIC]
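To make the binning step concrete, here is a minimal numpy sketch of the idea described above: draw random coefficients for a few hyperplanes, score each word-count vector against them, turn the signs of the scores into a bin index, and reuse the resulting table at query time. This is only an illustration, not the course's reference implementation; the function names, the choice of two planes, and the tiny three-word count matrix are all made up for the example.

```python
import numpy as np

def draw_random_hyperplanes(num_planes, dim, seed=0):
    """Draw the coefficients v1, ..., vd of each hyperplane completely at random."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_planes, dim))  # one row of d coefficients per plane

def bin_index(x, planes):
    """Score a point against every plane and turn the signs into a single bin index."""
    scores = planes @ x                      # one dot product (d multiplies) per plane
    bits = (scores >= 0).astype(int)         # sign of each score -> one bit per plane
    return int("".join(map(str, bits)), 2)   # e.g. bits [1, 0] -> bin 2

# One-time cost: bin every data point and build the hash table.
dim = 3  # e.g. counts of "awesome", "awful", "great"
data = np.array([[5.0, 0.0, 2.0],
                 [0.0, 3.0, 0.0],
                 [1.0, 1.0, 4.0]])
planes = draw_random_hyperplanes(num_planes=2, dim=dim)

table = {}
for i, x in enumerate(data):
    table.setdefault(bin_index(x, planes), []).append(i)

# Per-query cost: only the query vector is re-scored; the table itself is reused as-is.
query = np.array([4.0, 0.0, 1.0])
candidates = table.get(bin_index(query, planes), [])
print(candidates)  # indices of data points that landed in the query's bin
```

Note that the d-by-num_planes worth of multiplications happens once per data point when the table is built; each later query only scores the single query vector. And with sparse document vectors, only the nonzero word counts actually contribute to each score, so the per-plane work can be much less than d multiplies in practice.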