There is an important intuition behind LSH and similar concepts, called dimensionality reduction. Think of the space, or the collection, of objects; for example, you can think of fingerprints. It's D-dimensional, where D is a thousand, because every print can be represented by a D-dimensional vector of 1's and 0's: 1's where there are minutiae and 0's where there are not, and 1,000 because we had a 25-by-40 grid. There are lots and lots of possible objects in this space, not all of which are fingerprints, but the space is definitely very, very large: 2 to the 1,000 is really, really huge.

What LSH does, intuitively, is reduce the dimension to just b hash values, where b is small, 1,024 in this case. Further, the fact that we use random hash functions turns out to make them locality sensitive. Now, this is a very deep idea, that random choices and random functions actually work really well, and we won't delve into why that happens right now, but that's another important intuition. As a result, similar objects map to similar hash values.

It also turns out that LSH techniques are closely related to other kinds of dimensionality reduction methods, such as those for discovering topics in large document collections, objects in videos and images, and many others. A caveat, though: implementing LSH can be a bit tricky, especially in parallel, and that is something we're going to look into during our discussion of big data technology and MapReduce.

Let's return now to our hidden agenda. We are trying to understand something about human memory from all the techniques that we've been studying. One question we should think about is: do we actually store all the objects, such as images and experiences, that we get exposed to? Do we have a large disk somewhere in our head where things are just kept in full fidelity? Probably not.

Way back in the late '80s and early '90s, Pentti Kanerva, then working at NASA, came up with a concept called sparse distributed memory, which tried to model how humans actually store data, in particular high-dimensional data: images, experiences, and many of the things that we are exposed to are high dimensional. So this work pre-dates LSH and is also related to high-dimensional spaces, but rather than reducing the dimension, it actually exploits the fact that the space is high dimensional.
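To make the LSH intuition above concrete, here is a minimal sketch of one standard locality-sensitive family for bit vectors under Hamming distance: bit sampling combined with banding. The parameter choices (32 band hashes of 8 sampled bits each) and the exact banding scheme are illustrative assumptions, not necessarily the construction used in the lecture.

```python
import random

D = 1000            # 25 x 40 grid flattened into a bit vector
NUM_BANDS = 32      # number of band hashes (the "b" small hash values)
BITS_PER_BAND = 8   # bit positions sampled by each band hash

random.seed(42)
# Each band hash simply reads off a fixed set of randomly chosen bit positions.
bands = [random.sample(range(D), BITS_PER_BAND) for _ in range(NUM_BANDS)]

def lsh_signature(vec):
    """Reduce a D-dimensional 0/1 vector to NUM_BANDS small hash values."""
    return [tuple(vec[i] for i in positions) for positions in bands]

def candidate_pair(sig_a, sig_b):
    """Two prints are candidate near-duplicates if any band hash matches exactly."""
    return any(a == b for a, b in zip(sig_a, sig_b))

# Demo: a random "fingerprint", a near-copy differing in 20 of 1,000 bits,
# and an unrelated random vector.
x = [random.randint(0, 1) for _ in range(D)]
y = list(x)
for i in random.sample(range(D), 20):
    y[i] ^= 1
z = [random.randint(0, 1) for _ in range(D)]

print(candidate_pair(lsh_signature(x), lsh_signature(y)))  # almost always True
print(candidate_pair(lsh_signature(x), lsh_signature(z)))  # usually False
```

The point is exactly the intuition above: vectors that agree on almost every bit agree on at least one band hash with overwhelming probability, while unrelated vectors rarely do.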
Sparse distributed memory exploits some rather peculiar features of high-dimensional spaces. An example of a high-dimensional space that we've already seen is the space of thousand-bit vectors, such as all possible 25-by-40 grids of 1's and 0's that we used for fingerprints. There are lots of these, 2 to the thousand, a number with hundreds of zeros behind it.

Now, think about any two random thousand-bit vectors. What would be the average distance between them, in terms of the number of bits in which they differ? What do you think? Well, quite obviously 500, because on average you would toss a coin at each position to decide whether the two vectors differ there; half the time they would and half the time they wouldn't, so they would differ in half the bits and agree in the other half.

Now consider a vector x chosen at random; it could be any one of them. We start from this one vector x and ask a few interesting questions. Clearly, half of all the other vectors are less than 500 bits away from this vector, and half of them are more than 500 bits away. That is, again, obvious by the same coin-tossing argument.

The really interesting thing is when we ask the question: how many of all the other vectors differ from our chosen x by less than 450 bits? Remember, half of the vectors differed by less than 500 bits; well, how many differ by less than 450? We toss a coin, and if the coin comes up heads we assume the other vector differs from x at that position, and otherwise it doesn't. So we toss 1,000 coins and figure out what's the chance that we will have fewer than 450 heads.

Actually, it's quite a small chance, and that comes from probability theory. We use the binomial distribution with a mean of 500 and n equal to 1,000, because we are tossing 1,000 fair coins. The standard deviation of this is the square root of 250; you can look this up in any probability textbook. We approximate the binomial distribution by a normal distribution, and then we see that, very interestingly, a very, very small fraction of all the other vectors are less than 450 bits from x. What that means is that, with very high probability, most of the other vectors are within a band of 450 to 550 bits away.
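The size of that fraction can be checked in a few lines using the normal approximation just described (n = 1,000 fair coins, mean 500, standard deviation sqrt(250), about 15.8). This is a quick sketch, assuming the standard normal-CDF approximation without a continuity correction:

```python
import math

n, p = 1000, 0.5                      # 1,000 fair coin tosses, one per bit position
mean = n * p                          # 500
std = math.sqrt(n * p * (1 - p))      # sqrt(250), about 15.81

def normal_cdf(x, mu, sigma):
    """Normal approximation to the binomial CDF."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Fraction of random vectors that differ from x in fewer than 450 bits
p_below_450 = normal_cdf(450, mean, std)
print(f"standard deviation        = {std:.2f}")         # ~15.81
print(f"P(distance < 450)         = {p_below_450:.5f}")  # ~0.0008, i.e. under 0.1%

# By symmetry, the same tiny fraction differs in more than 550 bits,
# so nearly all vectors fall in the 450-550 band.
print(f"P(450 <= distance <= 550) = {1 - 2 * p_below_450:.5f}")  # ~0.998
```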
Remember, the normal distribution is symmetric, so what we did for how many other vectors differ from x by less than 450 bits can be repeated symmetrically for the question of how many will differ from x by more than 550 bits. So we have this very interesting situation: almost all vectors in this huge space are far away from x, somewhere around 500 bits away.
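As a sanity check on this conclusion, one can also simulate it directly: sample random thousand-bit vectors and measure their Hamming distance from a fixed x. This is an illustrative simulation under the assumption of uniformly random vectors, not something from the lecture itself:

```python
import random

D = 1000
random.seed(0)

def hamming(a, b):
    """Number of bit positions in which two vectors differ."""
    return sum(u != v for u, v in zip(a, b))

x = [random.randint(0, 1) for _ in range(D)]

samples = 10_000
in_band = sum(
    450 <= hamming(x, [random.randint(0, 1) for _ in range(D)]) <= 550
    for _ in range(samples)
)

print(f"{100 * in_band / samples:.1f}% of random vectors lie 450-550 bits from x")
# Typically prints ~99.8%, matching the normal-approximation estimate above.
```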