So we mentioned that we're going to look at locality sensitive hashing, or LSH, as an alternative to KD-trees. Now let's dig into the details of what locality sensitive hashing does. Just like KD-trees, we're going to think of partitioning our data into bins, but the way in which locality sensitive hashing does this is fundamentally different from the axis-aligned cuts made by KD-trees. In this section we're going to return to an example that we used in the classification course: imagine we have reviews of restaurants, we have the text of those reviews, and we're just going to count how many times we see the word "awesome" versus the word "awful". When we're thinking about retrieving similar types of reviews, we're just going to look at these counts.

Just like in classification, we can think of dividing this set of points using a simple line. This is like the decision boundary we saw in the classification course. Here I'm writing the equation of this line, and we're defining a term which we call the score: for any input, in terms of the counts of the number of "awesome"s and the number of "awful"s, we're going to score that review. Everything that falls below this line has a negative score, and everything above this line has a positive score. But we're going to use these scores in a different way than we did in the classification course. Here, we're going to say that if the sign of the score is negative, then all of those points fall into what we're calling bin 0, and if the sign is positive, then all of those points are assigned to bin 1. So we're translating the sign of the score into a binary index.

Then we can use this for nearest neighbor search in the following way. If we get this green query point here, we're only going to search for nearest neighbors in its bin, so all these other orange points. Here we're only searching for nearest neighbors among the points whose score is less than zero; in terms of bin indices, that corresponds to all the points that were labeled as having bin index 0. Alternatively, if the query had fallen into the other region, the blue region, we would have looked over all the points that had bin index 1, because all of their scores were greater than 0.

In practice, the way we're going to do this search over all data points that have the same bin index as the query point is using something called a hash table. For each of our bin values, bin 0 or bin 1, we call that value the key, and we associate with that key a set of values: a list of the indices of the data points that have that bin value. Said another way, we take all of the data points that have bin value 0 and list those data indices in the first element of the table, and we take all the data points that have bin value 1 and list those indices in the second element of the table. Then, when we get a given query point, we look at its bin value, based on the sign of its score, and we just search over the list of points that are in that bin.

So this provides a really, really simple way of doing nearest neighbor search: a very simple idea of throwing down a line and then just searching the points that fall on the same side of that line as the query. A minimal sketch of this whole pipeline in code follows below.
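To make that concrete, here is a minimal Python sketch of the pipeline, assuming a toy dataset and an illustrative score line. The review counts, the coefficients 1.0 and -1.5, and all variable names here are made up for this example, not taken from the lecture:

```python
import numpy as np
from collections import defaultdict

# Toy data: each row is (#awesome, #awful) counts for one review.
data = np.array([[5.0, 1.0], [0.0, 4.0], [3.0, 0.0], [1.0, 3.0], [4.0, 2.0]])

# An illustrative line: Score(x) = 1.0 * #awesome - 1.5 * #awful.
w = np.array([1.0, -1.5])

def bin_index(x):
    # Translate the sign of the score into a binary bin index: 0 or 1.
    return int(np.dot(x, w) >= 0)

# Hash table: key = bin index, value = list of data point indices in that bin.
table = defaultdict(list)
for i, x in enumerate(data):
    table[bin_index(x)].append(i)

# Query: only search the points that share the query's bin, then do a
# brute-force nearest neighbor search within that short candidate list.
query = np.array([2.0, 2.0])
candidates = table[bin_index(query)]
nearest = min(candidates, key=lambda i: np.linalg.norm(data[i] - query))
print("bin:", bin_index(query), "candidates:", candidates, "nearest:", nearest)
```

For this toy data the query hashes to bin 0, so only the two reviews in that bin are ever compared against it, rather than all five.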
But a question is: does this return the exact nearest neighbor? And the answer is no. For example, here we have this green query point, and let's say that this orange point, which is just above the line and very close to the green point, is actually the true nearest neighbor. Well, if we take the green point and only search the blue points, which have the same bin value as the green point, then we're never going to find that orange point.
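Continuing the same toy sketch, here is that failure mode in code, again with made-up points: the query and its true nearest neighbor are only 0.1 apart, but they land on opposite sides of the line and therefore in different bins.

```python
import numpy as np

w = np.array([1.0, -1.5])                     # same illustrative line as above
bin_index = lambda x: int(np.dot(x, w) >= 0)

query   = np.array([2.0, 1.3])                # score =  0.05 -> bin 1
true_nn = np.array([2.0, 1.4])                # score = -0.10 -> bin 0

# The true nearest neighbor sits only 0.1 away, but it hashes to the other
# bin, so a search restricted to the query's bin never examines it.
print(bin_index(query), bin_index(true_nn))   # prints: 1 0
```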