There is an important intuition behind LSH and similar concepts, called dimensionality reduction. Think of the space, or the collection of objects; for example, you can think of fingerprints. It's D-dimensional, where D is a thousand, because every print can be represented by a D-dimensional vector of 1's and 0's: 1's where there are minutiae and 0's where there are not. A thousand because we had a 25 by 40 grid. There are lots and lots of possible objects in this space, not all of which are fingerprints, but the space is definitely very, very large; two to the 1,000 is really, really huge. What LSH does, intuitively, is reduce the dimension to just b hash values, and b is small, 1,024 in this case. Further, the fact that we use random hash functions turns out to make them locality sensitive. Now, this is a very deep idea, that random choices and random functions actually work really well, and we won't delve into why that happens right now, but that's another important intuition. As a result, similar objects map to similar hash values. It also turns out that LSH techniques are closely related to other kinds of dimensionality reduction methods, such as those used for discovering topics in large document collections, objects in videos and images, and many others. A caveat, though: implementing LSH can be a bit tricky, especially in parallel, and that's something we're going to look into during our discussion on big data technology and MapReduce.
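To make the idea of locality-sensitive hashing a little more concrete, here is a minimal sketch using bit sampling, a classic locality-sensitive hash family for Hamming distance. This is not the exact scheme from the lecture; the choice of 64 hash values, the 30 flipped bits, and the helper names are assumptions made just for this illustration. The point is simply that hash functions chosen at random agree far more often on similar vectors than on unrelated ones.

```python
import random

D = 1000      # dimension: the 25 x 40 fingerprint grid flattened into a bit vector
B = 64        # number of hash values; 64 is chosen just for this demo

# Bit sampling: each "hash function" simply reads one randomly chosen bit position.
# This family is locality sensitive for Hamming distance: vectors that agree on
# most bits also agree on most sampled positions.
random.seed(0)
positions = [random.randrange(D) for _ in range(B)]

def signature(vec):
    """Map a D-bit vector to B hash values by sampling the random positions."""
    return [vec[p] for p in positions]

def agreements(sig_a, sig_b):
    """Count how many of the B hash values two signatures share."""
    return sum(a == b for a, b in zip(sig_a, sig_b))

# A vector, a near-copy with 30 of its 1,000 bits flipped, and an unrelated vector.
x = [random.randint(0, 1) for _ in range(D)]
near = list(x)
for p in random.sample(range(D), 30):
    near[p] ^= 1
far = [random.randint(0, 1) for _ in range(D)]

print(agreements(signature(x), signature(near)))   # close to B: similar objects collide
print(agreements(signature(x), signature(far)))    # close to B / 2: unrelated objects don't
```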
Let's return now to our hidden agenda. We are trying to understand something about human memory from all the techniques that we've been studying. One question we should think about is: do we actually store all the objects, such as the images and experiences that we get exposed to? Do we have a large disk somewhere in our head where things are just kept in full fidelity? Probably not. Way back in the late '80s and early '90s, Pentti Kanerva, then working at NASA, came up with a concept called sparse distributed memory, which tried to model how humans actually store data, in particular high-dimensional data; images, experiences, and many of the things that we are exposed to are high dimensional. So this work pre-dates LSH and is also related to high-dimensional spaces, but rather than reducing the dimension, it exploits the fact that the space is high dimensional. Sparse distributed memory exploits some rather peculiar features of high-dimensional spaces.

An example of a high-dimensional space that we've already seen is the space of thousand-bit vectors, such as all possible 25 by 40 grids of 1's and 0's that we used for fingerprints. There are lots of these: two to the thousand, a number with more than three hundred digits. Now, think about any two random thousand-bit vectors. What would be the average distance between them, in terms of the number of bits in which they differ? What do you think? Well, quite obviously 500, because on average you would toss a coin to decide whether the two vectors differ in each bit; half the time they would, half the time they wouldn't, so they would differ in half the bits and agree in the other half. Now consider a vector x chosen at random; it could be any one. We start from this one vector x and ask a few interesting questions. Clearly, half of all the other vectors are less than 500 bits away from this vector, and half of them are more than 500 bits away. That is, again, obvious by tossing a coin.

The really interesting thing is when we ask the question: how many of all the other vectors differ from our chosen x by less than 450 bits? Remember, half of the vectors differ by less than 500 bits; well, how many differ by less than 450? So we toss a coin for each bit, and if the coin comes up heads we assume the other vector differs from x in that bit, and otherwise it doesn't. So we toss a thousand coins and figure out what's the chance that we get fewer than 450 heads. It's actually quite a small chance, and that comes from probability theory. We use the binomial distribution with a mean of 500 and N equal to 1,000, because we are tossing a thousand fair coins. The standard deviation is the square root of 250, about 16; you can look this up in any probability textbook. We approximate the binomial distribution by a normal distribution, and then we see, very interestingly, that only a very, very small fraction of all the other vectors are less than 450 bits from x. What that means is that, with very high probability, most of the other vectors are within a band of 450 to 550 bits away. Remember, the normal distribution is symmetric, so what we did for asking how many other vectors differ from x by less than 450 bits can be repeated symmetrically to ask how many differ from x by more than 550. So we have this very interesting situation: almost all vectors in this huge space are far away from x, somewhere around 500 bits away.
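To check these numbers, here is a small sketch that works out the normal approximation to the binomial for a thousand fair coin tosses and also samples random thousand-bit vectors directly. The sample size and helper names are just choices for this demo; the lecture itself doesn't give code.

```python
import math
import random

N = 1000                     # length of each bit vector
MEAN = N / 2                 # expected Hamming distance between two random vectors
STD = math.sqrt(N * 0.25)    # binomial standard deviation: sqrt(n * p * (1 - p)) with p = 0.5

def normal_cdf(x, mu, sigma):
    """Normal approximation to the binomial cumulative distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

p_below_450 = normal_cdf(450, MEAN, STD)
print(f"standard deviation ~ {STD:.1f}")                      # about 15.8
print(f"P(distance < 450)  ~ {p_below_450:.5f}")              # about 0.0008
print(f"P(450 <= d <= 550) ~ {1 - 2 * p_below_450:.5f}")      # about 0.998

# Empirical check: sample random pairs of 1000-bit vectors and measure their distances.
def hamming(a, b):
    """Hamming distance between two integers treated as N-bit vectors."""
    return bin(a ^ b).count("1")

samples = [hamming(random.getrandbits(N), random.getrandbits(N)) for _ in range(5000)]
in_band = sum(450 <= d <= 550 for d in samples) / len(samples)
print(f"empirical fraction of pairs within 450-550 bits: {in_band:.4f}")
```

The fraction below 450 is so small because 450 is more than three standard deviations below the mean of 500, and the standard deviation here is only about 16.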