[MUSIC] To wrap up this module, we're going to talk about a way to perform even more efficient nearest neighbor search using locality sensitive hashing with what are called multiple hash tables. This is going to be advanced material. I actually think you could watch the video and get the high-level concept of what's going on and why it's important, but it's going to include some content that's quite advanced in terms of looking at probabilities of different events and assessing the relative probabilities of different setups. So we're making this an optional video, but I would encourage you to at least look at it and think through these ideas even if you don't follow all the mathematical details of what's going on here.

Okay, so let's look at a specific example of locality sensitive hashing, where I'm just going to throw down two lines. For simplicity, I'm going to assume an algorithm where we search the bin that the query point falls into, as well as every bin that is just one bit off. So in the case of throwing down two lines, let's say our query point has bin index 0,0; then we would also search bins 0,1 and 1,0. These are the cases where we've taken one bit, flipped it, and searched the two neighboring bins.

The other thing we're going to assume is that delta is the probability that a randomly placed line falls between two points that are theta radians apart. So if we had two points living in this space, theta radians apart, then delta is the probability that a randomly placed line divides them. We can think through what delta is: if the line's angle is placed uniformly between 0 and pi (once we get to the other side, the partition wraps around, if you visualize it), then the probability of falling within some little region of angle theta is just theta divided by pi. That's one concrete example of this probability, but let's just generically say a line falls in this range with probability delta.

In this setup, we're going to search three bins: our query bin and the two neighboring bins where we flipped just one bit. The chance that we don't find the nearest neighbor within these three bins is delta^2, because that's the chance that two lines, not just one, divide our query point from its nearest neighbor. And what's the chance of getting two lines within a region of angle theta? Well, the two lines were drawn independently, and each had probability delta of falling in that theta range, so we have delta times delta, which is delta^2.

But what would happen if I repeated this process of throwing down two random lines multiple times? Here's the example we showed before, where we threw down our two lines, and I'm using L0, L1, L2, L3 generically to refer to the lists of data points that fall into bins 0, 1, 2, and 3. Now let's say we throw down two other random lines, which creates a different partitioning of our data. Then we throw down two more lines and get yet another partitioning. So if we repeat this process three times, we've ended up with three different hash tables.
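To make this setup concrete, here's a minimal Python sketch, entirely my own illustration rather than anything from the lecture, that estimates delta empirically for uniformly placed lines and checks that missing the neighbor in the two-line, three-bin search happens with probability delta^2:

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 0.3            # angle between the query and its nearest neighbor
delta = theta / np.pi  # analytic probability that one random line splits them

trials = 100_000
misses = 0
for _ in range(trials):
    # Two line angles drawn independently and uniformly on [0, pi).
    angles = rng.uniform(0.0, np.pi, size=2)
    # A line splits the pair iff its angle lands in the theta-wide arc
    # between the two points. With three bins searched (query bin plus
    # both 1-bit-off bins), we miss only if BOTH lines split the pair.
    if np.all(angles < theta):
        misses += 1

print(f"empirical miss rate: {misses / trials:.4f}")
print(f"analytic delta^2:    {delta**2:.4f}")
```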
And now let's make the assumption that we're only going to search the bin in which our query point falls. In particular, here's our query point: in the first table it has bin index 0,0, so we search bin 0,0. In the second table, this same query point has bin index 0,1, so we search the 0,1 bin. Likewise, in the third table it also has bin index 0,1, so again we search the 0,1 bin. So I've constructed this example with three different hash tables, and we're only searching the bin that the query point falls in, meaning we're still searching three bins in total, just like before when we searched the query bin and its two neighboring bins.

But the question now is: what's the probability that we don't find our nearest neighbor? This is equivalent to asking, what's the chance that the query point and the nearest neighbor end up in different bins, that they get split, in all of these hash tables? That is, what's the probability that some line falls between the query point and its nearest neighbor in the first table, and in the second, and in the third?

We can work through this probabilistically. Let's start with the first table. The probability that the nearest neighbor is in a different bin than the query point is 1 minus the probability that they fell into the same bin. And what's the probability that the query point and the nearest neighbor fall into the same bin? It's the probability that neither the first line nor the second line separates the two points. We know that delta is the probability that a single line splits them, so 1 - delta is the probability that one line does not split the query and its nearest neighbor. And because the lines are thrown down independently, (1 - delta)^2 is the probability that two lines both fail to split these points. So the probability of the query and its nearest neighbor landing in different bins in one table is 1 - (1 - delta)^2, which you can rewrite as 2 delta - delta^2.

That was just the probability for this one table. What's the probability that this happens in all three cases? Again, the three cases are completely independent events, so we can write the overall probability as the per-table probability raised to the power of the number of hash tables. We have this 2 delta - delta^2 probability of splitting in any one table, and three tables to look at, so the probability of splitting in all of them is (2 delta - delta^2)^3.

Okay, so to recap: if we look at three bins in one hash table, the probability of not finding our nearest neighbor after completing the search of those three bins is delta^2.
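As a quick numerical check of that recap, here's a small sketch (the function names are my own, hypothetical, not the course's) comparing the two miss probabilities across a few values of delta:

```python
def miss_one_table(delta):
    # One table of 2 lines, searching the query bin plus both
    # 1-bit-off bins: we miss only if both lines split the pair.
    return delta ** 2

def miss_three_tables(delta):
    # Three independent 2-line tables, searching only the query bin
    # in each: we miss iff the pair is split in every table.
    return (2 * delta - delta ** 2) ** 3

for delta in (0.05, 0.1, 0.2, 0.5):
    print(f"delta={delta:.2f}  one table: {miss_one_table(delta):.6f}  "
          f"three tables: {miss_three_tables(delta):.6f}")
```

At small delta the three-table strategy misses far less often, while at delta = 0.5 the single table wins, which previews the crossover discussed next.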
But if instead we define three hash tables and search only one bin in each, still a total of three bins searched overall, the probability of not finding the nearest neighbor is (2 delta - delta^2)^3. It's a little hard to compare these two functions just written as they are, so I made a plot of them for you. The x-axis is delta, the probability that a line splits two points that are an angle theta apart. The y-axis is the probability of not finding the nearest neighbor after performing the search of three bins that I described. Again, just to make sure the setup is clear, this is under the scenario of throwing down two lines and searching three bins. The red curve is the multiple hash table scenario and the blue curve is the single hash table scenario. What we see is that, in almost all cases, we have a lower probability of not finding our nearest neighbor when we use multiple hash tables, while still searching the same number of bins. There is a crossover point where the single hash table starts outperforming the multiple hash tables, but that's when the angle between the query and its nearest neighbor becomes larger. In a lot of cases we're going to assume the distance between our query and its nearest neighbor is small, and in those cases we definitely get a benefit from using multiple hash tables.

But the benefit is even larger if we throw down more lines than just two. So let's specify exactly what the algorithm is doing here. We're going to throw down some h lines, and then search, as before, the bin the query point falls into, as well as all bins that are one bit off. Remember that with more lines we have more bits, so flipping just one bit gives more bins to search. In particular, even when we search all bins that are just one bit off from the query bin, we have to look at the probability of our nearest neighbor being more than one bit away. This is equal to 1 minus the probability that the points fall into the same bin, because then we would have found the neighbor, minus the probability that they're exactly one bit apart, since those bins are covered by the search we're doing.

Equivalently, it's 1 minus the probability that there were no splits, so no line split our query and its nearest neighbor, minus the probability that exactly one line split the two points. The probability of no split from one line is 1 - delta, and we throw down h lines independently, so the probability that none of the h lines splits the pair is (1 - delta)^h. Now let's talk about the other term, which is a little more complicated, because we need to compute the probability that exactly one line splits our query and its nearest neighbor: not two lines or three lines or all h lines, just one. Here delta is the probability of one particular line splitting, and (1 - delta)^(h-1) is the probability of the other h - 1 lines not splitting the points. But there are h different ways to choose which single line does the splitting, so that's why we have to multiply by h, giving h delta (1 - delta)^(h-1) overall.
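Here's a short sketch of that calculation, with illustrative names of my own choosing, evaluating the miss probability for a single h-line table:

```python
def miss_single_table(delta, h):
    # One h-line table, searching the query bin plus all h bins that
    # are one bit off. We miss unless zero lines split the pair (same
    # bin) or exactly one line splits it (one bit off).
    p_no_split = (1 - delta) ** h
    p_exactly_one = h * delta * (1 - delta) ** (h - 1)
    return 1 - p_no_split - p_exactly_one

for h in (2, 3, 10):
    print(f"h={h:2d}  P(miss | delta=0.1) = {miss_single_table(0.1, h):.4f}")
```

Note that for h = 2 this reduces to delta^2, matching the two-line case worked out earlier.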
So in summary, we're going to search h + 1 bins and not find the nearest neighbor with probability 1 - (1 - delta)^h - h delta (1 - delta)^(h-1). Okay, now let's consider the alternative using multiple hash tables. To keep the number of comparisons the same, remember that if we throw down h different lines and search the query bin and all the bins that are one bit off, we end up searching h + 1 bins. So, to make this as much of an apples-to-apples comparison as we can, we're going to define h + 1 different hash tables and search just the query bin in each of these tables.

What we're going to calculate is the probability that my nearest neighbor is in a different bin in all h + 1 of these hash tables. For a given hash table, the probability that the query point and the nearest neighbor are in different bins is 1 minus the probability that there's no split line, which is 1 - (1 - delta)^h. We raise that to the h + 1 because we're looking at h + 1 independent hash tables, giving (1 - (1 - delta)^h)^(h+1). Just to be very clear, (1 - delta)^h is the probability of no split from any of the h different lines we threw down in one table, so 1 minus that is the probability that the points fall into separate bins rather than the same bin, and we need this to hold for all of the hash tables.

Let me say this one other way in case that wasn't clear. What we're computing per hash table is the probability that 1 or 2 or 3 or all the way up to h different lines split our query point from its nearest neighbor. One simple way of calculating that is 1 minus the complement of that event, which is the probability that none of the h lines we throw down splits these two points. That's what we calculate per hash table, and we multiply those together to get the overall probability that there is a split of the points in every one of our tables.

Okay, so now let's summarize in this slightly more generic setting, where we assume we're throwing down h lines. In the first case, we search h + 1 bins in a single table, and the probability of not finding the nearest neighbor is the 1 - (1 - delta)^h - h delta (1 - delta)^(h-1) expression above. In the case of multiple hash tables, in particular h + 1 different hash tables, always searching just the bin where the query point falls, we have probability (1 - (1 - delta)^h)^(h+1) of not finding the nearest neighbor. Again, it's not very straightforward to compare these two equations directly, so let's look at some plots, in two different scenarios: one throwing down three lines, the other throwing down ten lines. In each plot, the top curve is the one hash table scenario and the lower curve is the multiple hash table scenario. What we see in these plots is that now there's a much larger difference in the benefit of using multiple hash tables over a single hash table, and that's especially true as we increase the number of lines we throw down.
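If you want to reproduce plots like the ones described, here's a hedged matplotlib sketch under the same assumptions; the plotting choices and function names are my own:

```python
import numpy as np
import matplotlib.pyplot as plt

def miss_single_table(delta, h):
    # One h-line table: search the query bin plus all h one-bit-off bins.
    return 1 - (1 - delta) ** h - h * delta * (1 - delta) ** (h - 1)

def miss_multi_table(delta, h):
    # h + 1 independent h-line tables: search only the query bin in each.
    return (1 - (1 - delta) ** h) ** (h + 1)

delta = np.linspace(0.0, 1.0, 200)
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, h in zip(axes, (3, 10)):
    ax.plot(delta, miss_single_table(delta, h), label="1 table, h+1 bins")
    ax.plot(delta, miss_multi_table(delta, h), label="h+1 tables, 1 bin each")
    ax.set_title(f"h = {h} lines")
    ax.set_xlabel("delta")
    ax.legend()
axes[0].set_ylabel("P(miss nearest neighbor)")
plt.tight_layout()
plt.show()
```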
And I want to mention that the setup I created here was really for the sake of comparing one hash table versus multiple hash tables, but in general, the way people really use this in practice, you don't have to create exactly as many hash tables as lines thrown down. You can actually create many, many more hash tables than the number of lines. In particular, you can think of keeping the number of bits, the number of lines you throw down, fixed, and increasing the number of hash tables. We're going to call m the number of hash tables, and h the number of lines, or number of bits. People sometimes refer to these as the width and the depth of the structure: the width being the number of bits, and the depth being the number of hash tables we're looking at.

What you can show is that the probability of not finding the nearest neighbor, that is, the probability that the query point and the nearest neighbor are in different bins in all of these m different hash tables, falls off exponentially fast. The calculation is exactly what we went through before, but instead of raising to the h + 1 for that specific contrived example with h + 1 hash tables, here we raise to the m, giving (1 - (1 - delta)^h)^m. The point is that as you increase the number of hash tables, you're raising some probability, some number less than 1, to higher and higher powers, so the probability of not finding our nearest neighbor drops at this exponentially fast rate. And typically, in many different setups we're going to consider, you can show that the probability of finding a nearest neighbor after searching one bin in each of m different hash tables, so again m different bins in total, is higher than if we were to search m bins in a single hash table.

Okay, so to summarize what we've shown in this optional video: when we search multiple bins within a single hash table, the cost of binning the points is lower, because you only have to form one hash table, but you typically have to search more bins to get the same quality of approximation as if you performed multiple binnings of the points. Using multiple hash tables costs more, because you have to create all of these different hash tables, but it tends to provide a better approximation for our nearest neighbor search.
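As a closing illustration, here's a minimal, self-contained sketch of the width/depth idea, assuming random-hyperplane LSH; the class name and parameters are hypothetical, not from the course:

```python
import numpy as np

class MultiTableLSH:
    """Minimal random-hyperplane LSH with m tables of h bits each."""

    def __init__(self, data, h, m, seed=0):
        rng = np.random.default_rng(seed)
        self.data = data
        # One set of h random hyperplanes (lines, in 2-D) per table.
        self.planes = [rng.standard_normal((data.shape[1], h))
                       for _ in range(m)]
        self.tables = []
        for planes in self.planes:
            table = {}
            for i, key in enumerate(self._keys(data, planes)):
                table.setdefault(key, []).append(i)
            self.tables.append(table)

    @staticmethod
    def _keys(points, planes):
        # Bin index: one bit per hyperplane (which side the point is on).
        bits = points @ planes > 0
        return [tuple(row) for row in bits]

    def query(self, q):
        # Search only the query's own bin in each of the m tables,
        # then rank the union of candidates by true distance.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            key = self._keys(q[None, :], planes)[0]
            candidates.update(table.get(key, []))
        if not candidates:
            return None
        cand = list(candidates)
        dists = np.linalg.norm(self.data[cand] - q, axis=1)
        return cand[int(np.argmin(dists))]

# Usage: raising the depth m while holding the width h fixed drives the
# miss probability down exponentially, at the cost of more tables.
data = np.random.default_rng(1).standard_normal((1000, 2))
index = MultiTableLSH(data, h=10, m=20)
print("approximate nearest neighbor of the origin:",
      index.query(np.zeros(2)))
```

[MUSIC]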