There is an important intuition behind LSH and similar concepts, called dimensionality reduction. Think of the space, or the collection, of objects; for example, you can think of fingerprints. It's D-dimensional, where D is a thousand, because every print can be represented by a D-dimensional vector of 1's and 0's: 1's where there are minutiae and 0's where there are not, and 1,000 because we had a 25-by-40 grid. There are lots and lots of possible objects in this space, not all of which are fingerprints, but the space is definitely very, very large: 2 to the 1,000 is really, really huge.

What LSH does, intuitively, is reduce the dimension to just b hash values, where b is small, 1,024 in this case. Further, the fact that we use random hash functions turns out to make them locality sensitive. Now, this is a very deep idea, that random choices and random functions actually work really well, and we won't delve into why that happens right now, but that's another important intuition. As a result, similar objects map to similar hash values.

It also turns out that LSH techniques are closely related to other kinds of dimensionality reduction methods, such as those for discovering topics in large document collections, objects in videos and images, and many others. A caveat, though: implementing LSH can be a bit tricky, especially in parallel, and that is something we're going to look into during our discussion of big data technology and MapReduce.

Let's return now to our hidden agenda. We are trying to understand something about human memory from all the techniques that we've been studying. One question we should think about is: do we actually store all the objects, such as images and experiences, that we get exposed to? Do we have a large disk somewhere in our head where things are just kept in full fidelity? Probably not.

Way back in the late '80s and early '90s, Pentti Kanerva, then working at NASA, came up with a concept called sparse distributed memory, which tried to model how humans actually store data, in particular high-dimensional data: images, experiences, and many of the things that we are exposed to are high dimensional. So this work pre-dates LSH and is also related to high-dimensional spaces, but rather than reducing the dimension, it actually exploits the fact that the space is high dimensional.
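To make the LSH intuition above concrete, here is a minimal sketch of one standard locality-sensitive family for bit vectors under Hamming distance: bit sampling combined with banding. The parameter choices (32 band hashes of 8 sampled bits each) and the exact banding scheme are illustrative assumptions, not necessarily the construction used in the lecture.

```python
import random

D = 1000            # 25 x 40 grid flattened into a bit vector
NUM_BANDS = 32      # number of band hashes (the "b" small hash values)
BITS_PER_BAND = 8   # bit positions sampled by each band hash

random.seed(42)
# Each band hash simply reads off a fixed set of randomly chosen bit positions.
bands = [random.sample(range(D), BITS_PER_BAND) for _ in range(NUM_BANDS)]

def lsh_signature(vec):
    """Reduce a D-dimensional 0/1 vector to NUM_BANDS small hash values."""
    return [tuple(vec[i] for i in positions) for positions in bands]

def candidate_pair(sig_a, sig_b):
    """Two prints are candidate near-duplicates if any band hash matches exactly."""
    return any(a == b for a, b in zip(sig_a, sig_b))

# Demo: a random "fingerprint", a near-copy differing in 20 of 1,000 bits,
# and an unrelated random vector.
x = [random.randint(0, 1) for _ in range(D)]
y = list(x)
for i in random.sample(range(D), 20):
    y[i] ^= 1
z = [random.randint(0, 1) for _ in range(D)]

print(candidate_pair(lsh_signature(x), lsh_signature(y)))  # almost always True
print(candidate_pair(lsh_signature(x), lsh_signature(z)))  # usually False
```

The point is exactly the intuition above: vectors that agree on almost every bit agree on at least one band hash with overwhelming probability, while unrelated vectors rarely do.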
Sparse distributed memory exploits some rather peculiar features of high-dimensional spaces. An example of a high-dimensional space that we've already seen is the space of thousand-bit vectors, such as all possible 25-by-40 grids of 1's and 0's that we used for fingerprints. There are lots of these, 2 to the thousand, a number with hundreds of zeros behind it.

Now, think about any two random thousand-bit vectors. What would be the average distance between them, in terms of the number of bits in which they differ? What do you think? Well, quite obviously 500, because on average you would toss a coin at each position to decide whether the two vectors differ there; half the time they would and half the time they wouldn't, so they would differ in half the bits and agree in the other half.

Now consider a vector x chosen at random; it could be any one of them. We start from this one vector x and ask a few interesting questions. Clearly, half of all the other vectors are less than 500 bits away from this vector, and half of them are more than 500 bits away. That is, again, obvious by the same coin-tossing argument.

The really interesting thing is when we ask the question: how many of all the other vectors differ from our chosen x by less than 450 bits? Remember, half of the vectors differed by less than 500 bits; well, how many differ by less than 450? We toss a coin, and if the coin comes up heads we assume the other vector differs from x at that position, and otherwise it doesn't. So we toss 1,000 coins and figure out what's the chance that we will have fewer than 450 heads.

Actually, it's quite a small chance, and that comes from probability theory. We use the binomial distribution with a mean of 500 and n equal to 1,000, because we are tossing 1,000 fair coins. The standard deviation of this is the square root of 250; you can look this up in any probability textbook. We approximate the binomial distribution by a normal distribution, and then we see that, very interestingly, a very, very small fraction of all the other vectors are less than 450 bits from x. What that means is that, with very high probability, most of the other vectors are within a band of 450 to 550 bits away.
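The size of that fraction can be checked in a few lines using the normal approximation just described (n = 1,000 fair coins, mean 500, standard deviation sqrt(250), about 15.8). This is a quick sketch, assuming the standard normal-CDF approximation without a continuity correction:

```python
import math

n, p = 1000, 0.5                      # 1,000 fair coin tosses, one per bit position
mean = n * p                          # 500
std = math.sqrt(n * p * (1 - p))      # sqrt(250), about 15.81

def normal_cdf(x, mu, sigma):
    """Normal approximation to the binomial CDF."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Fraction of random vectors that differ from x in fewer than 450 bits
p_below_450 = normal_cdf(450, mean, std)
print(f"standard deviation        = {std:.2f}")         # ~15.81
print(f"P(distance < 450)         = {p_below_450:.5f}")  # ~0.0008, i.e. under 0.1%

# By symmetry, the same tiny fraction differs in more than 550 bits,
# so nearly all vectors fall in the 450-550 band.
print(f"P(450 <= distance <= 550) = {1 - 2 * p_below_450:.5f}")  # ~0.998
```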
Remember, the normal distribution is symmetric, so what we did for how many other vectors differ from x by less than 450 bits can be repeated symmetrically for the question of how many will differ from x by more than 550 bits. So we have this very interesting situation: almost all vectors in this huge space are far away from x, somewhere around 500 bits away.
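As a sanity check on this conclusion, one can also simulate it directly: sample random thousand-bit vectors and measure their Hamming distance from a fixed x. This is an illustrative simulation under the assumption of uniformly random vectors, not something from the lecture itself:

```python
import random

D = 1000
random.seed(0)

def hamming(a, b):
    """Number of bit positions in which two vectors differ."""
    return sum(u != v for u, v in zip(a, b))

x = [random.randint(0, 1) for _ in range(D)]

samples = 10_000
in_band = sum(
    450 <= hamming(x, [random.randint(0, 1) for _ in range(D)]) <= 550
    for _ in range(samples)
)

print(f"{100 * in_band / samples:.1f}% of random vectors lie 450-550 bits from x")
# Typically prints ~99.8%, matching the normal-approximation estimate above.
```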