There is an important intuition behind LSH and similar concepts, called dimensionality reduction. Think of the space, or the collection of objects; for example, you can think of fingerprints. It's D-dimensional, where D is a thousand, because every print can be represented by a D-dimensional vector of 1's and 0's: 1's where there are minutiae and 0's where there are not. A thousand because we had a 25 by 40 grid. There are lots and lots of possible objects in this space, not all of which are fingerprints, but the space is definitely very, very large; two to the 1,000 is really, really huge. What LSH does, intuitively, is reduce the dimension to just b hash values, and b is small, 1,024 in this case. Further, the fact that we use random hash functions turns out to make them locality sensitive. Now, this is a very deep idea, that random choices and random functions actually work really well, and we won't delve into why that happens right now, but that's another important intuition. As a result, similar objects map to similar hash values. It also turns out that LSH techniques are closely related to other kinds of dimensionality reduction methods, such as those used for discovering topics in large document collections, objects in videos and images, and many others. A caveat, though: implementing LSH can be a bit tricky, especially in parallel, and that's something we're going to look into during our discussion on big data technology and MapReduce.
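To make the idea of locality-sensitive hashing a little more concrete, here is a minimal sketch using bit sampling, a classic locality-sensitive hash family for Hamming distance. This is not the exact scheme from the lecture; the choice of 64 hash values, the 30 flipped bits, and the helper names are assumptions made just for this illustration. The point is simply that hash functions chosen at random agree far more often on similar vectors than on unrelated ones.

```python
import random

D = 1000      # dimension: the 25 x 40 fingerprint grid flattened into a bit vector
B = 64        # number of hash values; 64 is chosen just for this demo

# Bit sampling: each "hash function" simply reads one randomly chosen bit position.
# This family is locality sensitive for Hamming distance: vectors that agree on
# most bits also agree on most sampled positions.
random.seed(0)
positions = [random.randrange(D) for _ in range(B)]

def signature(vec):
    """Map a D-bit vector to B hash values by sampling the random positions."""
    return [vec[p] for p in positions]

def agreements(sig_a, sig_b):
    """Count how many of the B hash values two signatures share."""
    return sum(a == b for a, b in zip(sig_a, sig_b))

# A vector, a near-copy with 30 of its 1,000 bits flipped, and an unrelated vector.
x = [random.randint(0, 1) for _ in range(D)]
near = list(x)
for p in random.sample(range(D), 30):
    near[p] ^= 1
far = [random.randint(0, 1) for _ in range(D)]

print(agreements(signature(x), signature(near)))   # close to B: similar objects collide
print(agreements(signature(x), signature(far)))    # close to B / 2: unrelated objects don't
```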
Let's return now to our hidden agenda. We are trying to understand something about human memory from all the techniques that we've been studying. One question we should think about is: do we actually store all the objects, such as the images and experiences that we get exposed to? Do we have a large disk somewhere in our head where things are just kept in full fidelity? Probably not. Way back in the late '80s and early '90s, Pentti Kanerva, then working at NASA, came up with a concept called sparse distributed memory, which tried to model how humans actually store data, in particular high-dimensional data; images, experiences, and many of the things that we are exposed to are high dimensional. So this work pre-dates LSH and is also related to high-dimensional spaces, but rather than reducing the dimension, it exploits the fact that the space is high dimensional. Sparse distributed memory exploits some rather peculiar features of high-dimensional spaces.

An example of a high-dimensional space that we've already seen is the space of thousand-bit vectors, such as all possible 25 by 40 grids of 1's and 0's that we used for fingerprints. There are lots of these: two to the thousand, a number with more than three hundred digits. Now, think about any two random thousand-bit vectors. What would be the average distance between them, in terms of the number of bits in which they differ? What do you think? Well, quite obviously 500, because on average you would toss a coin to decide whether the two vectors differ in each bit; half the time they would, half the time they wouldn't, so they would differ in half the bits and agree in the other half. Now consider a vector x chosen at random; it could be any one. We start from this one vector x and ask a few interesting questions. Clearly, half of all the other vectors are less than 500 bits away from this vector, and half of them are more than 500 bits away. That is, again, obvious by tossing a coin.

The really interesting thing is when we ask the question: how many of all the other vectors differ from our chosen x by less than 450 bits? Remember, half of the vectors differ by less than 500 bits; well, how many differ by less than 450? So we toss a coin for each bit, and if the coin comes up heads we assume the other vector differs from x in that bit, and otherwise it doesn't. So we toss a thousand coins and figure out what's the chance that we get fewer than 450 heads. It's actually quite a small chance, and that comes from probability theory. We use the binomial distribution with a mean of 500 and N equal to 1,000, because we are tossing a thousand fair coins. The standard deviation is the square root of 250, about 16; you can look this up in any probability textbook. We approximate the binomial distribution by a normal distribution, and then we see, very interestingly, that only a very, very small fraction of all the other vectors are less than 450 bits from x. What that means is that, with very high probability, most of the other vectors are within a band of 450 to 550 bits away. Remember, the normal distribution is symmetric, so what we did for asking how many other vectors differ from x by less than 450 bits can be repeated symmetrically to ask how many differ from x by more than 550. So we have this very interesting situation: almost all vectors in this huge space are far away from x, somewhere around 500 bits away.
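To check these numbers, here is a small sketch that works out the normal approximation to the binomial for a thousand fair coin tosses and also samples random thousand-bit vectors directly. The sample size and helper names are just choices for this demo; the lecture itself doesn't give code.

```python
import math
import random

N = 1000                     # length of each bit vector
MEAN = N / 2                 # expected Hamming distance between two random vectors
STD = math.sqrt(N * 0.25)    # binomial standard deviation: sqrt(n * p * (1 - p)) with p = 0.5

def normal_cdf(x, mu, sigma):
    """Normal approximation to the binomial cumulative distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

p_below_450 = normal_cdf(450, MEAN, STD)
print(f"standard deviation ~ {STD:.1f}")                      # about 15.8
print(f"P(distance < 450)  ~ {p_below_450:.5f}")              # about 0.0008
print(f"P(450 <= d <= 550) ~ {1 - 2 * p_below_450:.5f}")      # about 0.998

# Empirical check: sample random pairs of 1000-bit vectors and measure their distances.
def hamming(a, b):
    """Hamming distance between two integers treated as N-bit vectors."""
    return bin(a ^ b).count("1")

samples = [hamming(random.getrandbits(N), random.getrandbits(N)) for _ in range(5000)]
in_band = sum(450 <= d <= 550 for d in samples) / len(samples)
print(f"empirical fraction of pairs within 450-550 bits: {in_band:.4f}")
```

The fraction below 450 is so small because 450 is more than three standard deviations below the mean of 500, and the standard deviation here is only about 16.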