[MUSIC] To wrap up this module, we're going to talk about a way to perform even more efficient nearest neighbor search using locality sensitive hashing with what are called multiple hash tables. This is going to be advanced material. I actually think you could watch the video and get the high-level concept of what's going on and why it's important, but it's going to include some content that's quite advanced in terms of looking at probabilities of different events and assessing the relative probabilities of different setups. So we're making this an optional video, but I would encourage you to at least look at it and think through these ideas even if you don't follow all the mathematical details of what's going on here.

Okay, so let's look at a specific example of locality sensitive hashing, where I'm just going to throw down two lines. For simplicity, I'm going to assume an algorithm where we search the bin that the query point falls into, as well as every bin that is just one bit off. So in the case of throwing down two lines, let's say our query point has bin index 0,0; then we would also search bins 0,1 and 1,0. These are the cases where we've taken one bit, flipped it, and searched the two neighboring bins.

The other thing we're going to assume is that delta is the probability that a randomly placed line falls between two points that are theta radians apart. So if we had two points living in this space, theta radians apart, then delta is the probability that a randomly placed line divides them. We can think through what delta is: if the line's angle is placed uniformly between 0 and pi (once we get to the other side, the partition wraps around, if you visualize it), then the probability of falling within some little region of angle theta is just theta divided by pi. That's one concrete example of this probability, but let's just generically say a line falls in this range with probability delta.

In this setup, we're going to search three bins: our query bin and the two neighboring bins where we flipped just one bit. The chance that we don't find the nearest neighbor within these three bins is delta^2, because that's the chance that two lines, not just one, divide our query point from its nearest neighbor. And what's the chance of getting two lines within a region of angle theta? Well, the two lines were drawn independently, and each had probability delta of falling in that theta range, so we have delta times delta, which is delta^2.

But what would happen if I repeated this process of throwing down two random lines multiple times? Here's the example we showed before, where we threw down our two lines, and I'm using L0, L1, L2, L3 generically to refer to the lists of data points that fall into bins 0, 1, 2, and 3. Now let's say we throw down two other random lines, which creates a different partitioning of our data. Then we throw down two more lines and get yet another partitioning. So if we repeat this process three times, we've ended up with three different hash tables.
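To make this setup concrete, here's a minimal Python sketch, entirely my own illustration rather than anything from the lecture, that estimates delta empirically for uniformly placed lines and checks that missing the neighbor in the two-line, three-bin search happens with probability delta^2:

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 0.3            # angle between the query and its nearest neighbor
delta = theta / np.pi  # analytic probability that one random line splits them

trials = 100_000
misses = 0
for _ in range(trials):
    # Two line angles drawn independently and uniformly on [0, pi).
    angles = rng.uniform(0.0, np.pi, size=2)
    # A line splits the pair iff its angle lands in the theta-wide arc
    # between the two points. With three bins searched (query bin plus
    # both 1-bit-off bins), we miss only if BOTH lines split the pair.
    if np.all(angles < theta):
        misses += 1

print(f"empirical miss rate: {misses / trials:.4f}")
print(f"analytic delta^2:    {delta**2:.4f}")
```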
And now let's make the assumption that we're only going to search the bin in which our query point falls. In particular, here's our query point: in the first table it has bin index 0,0, so we search bin 0,0. In the second table, this same query point has bin index 0,1, so we search the 0,1 bin. Likewise, in the third table it also has bin index 0,1, so again we search the 0,1 bin. So I've constructed this example with three different hash tables, and we're only searching the bin that the query point falls in, meaning we're still searching three bins in total, just like before when we searched the query bin and its two neighboring bins.

But the question now is: what's the probability that we don't find our nearest neighbor? This is equivalent to asking, what's the chance that the query point and the nearest neighbor end up in different bins, that they get split, in all of these hash tables? That is, what's the probability that some line falls between the query point and its nearest neighbor in the first table, and in the second, and in the third?

We can work through this probabilistically. Let's start with the first table. The probability that the nearest neighbor is in a different bin than the query point is 1 minus the probability that they fell into the same bin. And what's the probability that the query point and the nearest neighbor fall into the same bin? It's the probability that neither the first line nor the second line separates the two points. We know that delta is the probability that a single line splits them, so 1 - delta is the probability that one line does not split the query and its nearest neighbor. And because the lines are thrown down independently, (1 - delta)^2 is the probability that two lines both fail to split these points. So the probability of the query and its nearest neighbor landing in different bins in one table is 1 - (1 - delta)^2, which you can rewrite as 2 delta - delta^2.

That was just the probability for this one table. What's the probability that this happens in all three cases? Again, the three cases are completely independent events, so we can write the overall probability as the per-table probability raised to the power of the number of hash tables. We have this 2 delta - delta^2 probability of splitting in any one table, and three tables to look at, so the probability of splitting in all of them is (2 delta - delta^2)^3.

Okay, so to recap: if we look at three bins in one hash table, the probability of not finding our nearest neighbor after completing the search of those three bins is delta^2.
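As a quick numerical check of that recap, here's a small sketch (the function names are my own, hypothetical, not the course's) comparing the two miss probabilities across a few values of delta:

```python
def miss_one_table(delta):
    # One table of 2 lines, searching the query bin plus both
    # 1-bit-off bins: we miss only if both lines split the pair.
    return delta ** 2

def miss_three_tables(delta):
    # Three independent 2-line tables, searching only the query bin
    # in each: we miss iff the pair is split in every table.
    return (2 * delta - delta ** 2) ** 3

for delta in (0.05, 0.1, 0.2, 0.5):
    print(f"delta={delta:.2f}  one table: {miss_one_table(delta):.6f}  "
          f"three tables: {miss_three_tables(delta):.6f}")
```

At small delta the three-table strategy misses far less often, while at delta = 0.5 the single table wins, which previews the crossover discussed next.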
But if instead we define three hash tables and search only one bin in each, still a total of three bins searched overall, the probability of not finding the nearest neighbor is (2 delta - delta^2)^3. It's a little hard to compare these two functions just written as they are, so I made a plot of them for you. The x-axis is delta, the probability that a line splits two points that are an angle theta apart. The y-axis is the probability of not finding the nearest neighbor after performing the search of three bins that I described. Again, just to make sure the setup is clear, this is under the scenario of throwing down two lines and searching three bins. The red curve is the multiple hash table scenario and the blue curve is the single hash table scenario. What we see is that, in almost all cases, we have a lower probability of not finding our nearest neighbor when we use multiple hash tables, while still searching the same number of bins. There is a crossover point where the single hash table starts outperforming the multiple hash tables, but that's when the angle between the query and its nearest neighbor becomes larger. In a lot of cases we're going to assume the distance between our query and its nearest neighbor is small, and in those cases we definitely get a benefit from using multiple hash tables.

But the benefit is even larger if we throw down more lines than just two. So let's specify exactly what the algorithm is doing here. We're going to throw down some h lines, and then search, as before, the bin the query point falls into, as well as all bins that are one bit off. Remember that with more lines we have more bits, so flipping just one bit gives more bins to search. In particular, even when we search all bins that are just one bit off from the query bin, we have to look at the probability of our nearest neighbor being more than one bit away. This is equal to 1 minus the probability that the points fall into the same bin, because then we would have found the neighbor, minus the probability that they're exactly one bit apart, since those bins are covered by the search we're doing.

Equivalently, it's 1 minus the probability that there were no splits, so no line split our query and its nearest neighbor, minus the probability that exactly one line split the two points. The probability of no split from one line is 1 - delta, and we throw down h lines independently, so the probability that none of the h lines splits the pair is (1 - delta)^h. Now let's talk about the other term, which is a little more complicated, because we need to compute the probability that exactly one line splits our query and its nearest neighbor: not two lines or three lines or all h lines, just one. Here delta is the probability of one particular line splitting, and (1 - delta)^(h-1) is the probability of the other h - 1 lines not splitting the points. But there are h different ways to choose which single line does the splitting, so that's why we have to multiply by h, giving h delta (1 - delta)^(h-1) overall.
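Here's a short sketch of that calculation, with illustrative names of my own choosing, evaluating the miss probability for a single h-line table:

```python
def miss_single_table(delta, h):
    # One h-line table, searching the query bin plus all h bins that
    # are one bit off. We miss unless zero lines split the pair (same
    # bin) or exactly one line splits it (one bit off).
    p_no_split = (1 - delta) ** h
    p_exactly_one = h * delta * (1 - delta) ** (h - 1)
    return 1 - p_no_split - p_exactly_one

for h in (2, 3, 10):
    print(f"h={h:2d}  P(miss | delta=0.1) = {miss_single_table(0.1, h):.4f}")
```

Note that for h = 2 this reduces to delta^2, matching the two-line case worked out earlier.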
So in summary, we're going to search h + 1 bins and not find the nearest neighbor with probability 1 - (1 - delta)^h - h delta (1 - delta)^(h-1). Okay, now let's consider the alternative using multiple hash tables. To keep the number of comparisons the same, remember that if we throw down h different lines and search the query bin and all the bins that are one bit off, we end up searching h + 1 bins. So, to make this as much of an apples-to-apples comparison as we can, we're going to define h + 1 different hash tables and search just the query bin in each of these tables.

What we're going to calculate is the probability that my nearest neighbor is in a different bin in all h + 1 of these hash tables. For a given hash table, the probability that the query point and the nearest neighbor are in different bins is 1 minus the probability that there's no split line, which is 1 - (1 - delta)^h. We raise that to the h + 1 because we're looking at h + 1 independent hash tables, giving (1 - (1 - delta)^h)^(h+1). Just to be very clear, (1 - delta)^h is the probability of no split from any of the h different lines we threw down in one table, so 1 minus that is the probability that the points fall into separate bins rather than the same bin, and we need this to hold for all of the hash tables.

Let me say this one other way in case that wasn't clear. What we're computing per hash table is the probability that 1 or 2 or 3 or all the way up to h different lines split our query point from its nearest neighbor. One simple way of calculating that is 1 minus the complement of that event, which is the probability that none of the h lines we throw down splits these two points. That's what we calculate per hash table, and we multiply those together to get the overall probability that there is a split of the points in every one of our tables.

Okay, so now let's summarize in this slightly more generic setting, where we assume we're throwing down h lines. In the first case, we search h + 1 bins in a single table, and the probability of not finding the nearest neighbor is the 1 - (1 - delta)^h - h delta (1 - delta)^(h-1) expression above. In the case of multiple hash tables, in particular h + 1 different hash tables, always searching just the bin where the query point falls, we have probability (1 - (1 - delta)^h)^(h+1) of not finding the nearest neighbor. Again, it's not very straightforward to compare these two equations directly, so let's look at some plots, in two different scenarios: one throwing down three lines, the other throwing down ten lines. In each plot, the top curve is the one hash table scenario and the lower curve is the multiple hash table scenario. What we see in these plots is that now there's a much larger difference in the benefit of using multiple hash tables over a single hash table, and that's especially true as we increase the number of lines we throw down.
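If you want to reproduce plots like the ones described, here's a hedged matplotlib sketch under the same assumptions; the plotting choices and function names are my own:

```python
import numpy as np
import matplotlib.pyplot as plt

def miss_single_table(delta, h):
    # One h-line table: search the query bin plus all h one-bit-off bins.
    return 1 - (1 - delta) ** h - h * delta * (1 - delta) ** (h - 1)

def miss_multi_table(delta, h):
    # h + 1 independent h-line tables: search only the query bin in each.
    return (1 - (1 - delta) ** h) ** (h + 1)

delta = np.linspace(0.0, 1.0, 200)
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, h in zip(axes, (3, 10)):
    ax.plot(delta, miss_single_table(delta, h), label="1 table, h+1 bins")
    ax.plot(delta, miss_multi_table(delta, h), label="h+1 tables, 1 bin each")
    ax.set_title(f"h = {h} lines")
    ax.set_xlabel("delta")
    ax.legend()
axes[0].set_ylabel("P(miss nearest neighbor)")
plt.tight_layout()
plt.show()
```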
And I want to mention that the setup I created here was really for the sake of comparing one hash table versus multiple hash tables, but in general, the way people really use this in practice, you don't have to create exactly as many hash tables as lines thrown down. You can actually create many, many more hash tables than the number of lines. In particular, you can think of keeping the number of bits, the number of lines you throw down, fixed, and increasing the number of hash tables. We're going to call m the number of hash tables, and h the number of lines, or number of bits. People sometimes refer to these as the width and the depth of the structure: the width being the number of bits, and the depth being the number of hash tables we're looking at.

What you can show is that the probability of not finding the nearest neighbor, that is, the probability that the query point and the nearest neighbor are in different bins in all of these m different hash tables, falls off exponentially fast. The calculation is exactly what we went through before, but instead of raising to the h + 1 for that specific contrived example with h + 1 hash tables, here we raise to the m, giving (1 - (1 - delta)^h)^m. The point is that as you increase the number of hash tables, you're raising some probability, some number less than 1, to higher and higher powers, so the probability of not finding our nearest neighbor drops at this exponentially fast rate. And typically, in many different setups we're going to consider, you can show that the probability of finding a nearest neighbor after searching one bin in each of m different hash tables, so again m different bins in total, is higher than if we were to search m bins in a single hash table.

Okay, so to summarize what we've shown in this optional video: when we search multiple bins within a single hash table, the cost of binning the points is lower, because you only have to form one hash table, but you typically have to search more bins to get the same quality of approximation as if you performed multiple binnings of the points. Using multiple hash tables costs more, because you have to create all of these different hash tables, but it tends to provide a better approximation for our nearest neighbor search.
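As a closing illustration, here's a minimal, self-contained sketch of the width/depth idea, assuming random-hyperplane LSH; the class name and parameters are hypothetical, not from the course:

```python
import numpy as np

class MultiTableLSH:
    """Minimal random-hyperplane LSH with m tables of h bits each."""

    def __init__(self, data, h, m, seed=0):
        rng = np.random.default_rng(seed)
        self.data = data
        # One set of h random hyperplanes (lines, in 2-D) per table.
        self.planes = [rng.standard_normal((data.shape[1], h))
                       for _ in range(m)]
        self.tables = []
        for planes in self.planes:
            table = {}
            for i, key in enumerate(self._keys(data, planes)):
                table.setdefault(key, []).append(i)
            self.tables.append(table)

    @staticmethod
    def _keys(points, planes):
        # Bin index: one bit per hyperplane (which side the point is on).
        bits = points @ planes > 0
        return [tuple(row) for row in bits]

    def query(self, q):
        # Search only the query's own bin in each of the m tables,
        # then rank the union of candidates by true distance.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            key = self._keys(q[None, :], planes)[0]
            candidates.update(table.get(key, []))
        if not candidates:
            return None
        cand = list(candidates)
        dists = np.linalg.norm(self.data[cand] - q, axis=1)
        return cand[int(np.argmin(dists))]

# Usage: raising the depth m while holding the width h fixed drives the
# miss probability down exponentially, at the cost of more tables.
data = np.random.default_rng(1).standard_normal((1000, 2))
index = MultiTableLSH(data, h=10, m=20)
print("approximate nearest neighbor of the origin:",
      index.query(np.zeros(2)))
```

[MUSIC]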