Let's consider some other kinds of search now. So far, we have been indexing objects, such as documents, by the features they contain, which are words; or, in the case of records in a database, the contents of different columns. Our assumption has been that when we search for a document or a record, our query is a bag of words. In other words, features are what we search with.

But suppose our query is itself an object: for example, an image, as in Google Goggles or Google Image Search, or a biometric, such as a fingerprint or iris scan, as in the unique identification project of the Government of India. For those of you not familiar with this ambitious, and equally controversial, project: it aims to provide every resident of India with a unique biometric ID. We will return to this in a minute.

But coming back to the question of searching for an object: is an inverted index of features, whether they are words, image features, or, in the case of fingerprints and irises, other features, the best way to search for objects? What do you think? I would say that no, it probably is not the best way to search for objects. As to why, I encourage you all to think about this, and to discuss it on the discussion forum.

Right now, we are going to discuss another very powerful method, called locality sensitive hashing, invented by Indyk and Motwani in the late '90s, and described in fair detail in the Anand Rajaraman book, which is a reference for this course. This technique is particularly important in all kinds of big data applications, because it somehow manages to compare, at least approximately, n objects pairwise in linear, that is, O(n), time. There are n-squared pairs, but it manages to tell us which pairs are actually close together without actually comparing all pairs. Sounds like magic; let's see how.

The basic idea of locality sensitive hashing, LSH as we shall refer to it going forward, is the following. It is hashing, so an object x is hashed to a value h(x), in such a manner that if x and y are two objects, and x is close to y, then the two hash values are equal, h(x) = h(y), with high probability. In this respect it is similar to normal hashing, where if x is equal to y, then x and y hash to the same value. But LSH deals with distances and closeness, not just equality, and that is what makes it interesting: it is hashing which is sensitive to things being close together, or in the same locality. Conversely, if x is not close to y, rather x is far from y, then the two hash values will not be equal, with high probability.

Of course, constructing the hash functions themselves is quite tricky. Here we have to do interesting things, like taking random functions from a locality-sensitive family and combining them in particular ways. Rajaraman and Ullman's chapter three describes this process fairly well, and in considerable detail.

An example is biometric matching. In the case of the UID project, for example, more than a billion people are going to get biometric IDs; of these, more than 200 million have already been enrolled. At the same time, let me make the following disclaimer: what the UID actually uses internally is proprietary, and I am not intending to suggest that they use locality sensitive hashing. This is merely a motivating example.
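To make the hashing idea concrete, here is a minimal Python sketch of one well-known locality-sensitive family: random-hyperplane hashing for cosine distance, amplified with the banding trick from Rajaraman and Ullman's chapter three (an AND over several bits within a band, an OR across bands). The parameter choices and names here are purely illustrative, and, to repeat the disclaimer above, this says nothing about what the UID actually does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64
n_bands, bits_per_band = 8, 10  # OR across 8 bands, AND across 10 bits within a band

# One set of random hyperplanes per band; the sign of a vector's projection
# onto each hyperplane's normal contributes one bit of that band's signature.
planes = rng.standard_normal((n_bands, bits_per_band, dim))

def band_signatures(x):
    """One short bit-signature per band; nearby vectors tend to agree on
    every bit of at least one band, distant vectors rarely do."""
    return [tuple(bool(b) for b in (planes[i] @ x >= 0)) for i in range(n_bands)]

def lsh_candidate_pairs(vectors):
    """Bucket each vector by its signature in every band; any two vectors
    sharing a bucket in at least one band become a candidate close pair."""
    tables = [{} for _ in range(n_bands)]
    for idx, v in enumerate(vectors):
        for band, sig in enumerate(band_signatures(v)):
            tables[band].setdefault(sig, []).append(idx)
    pairs = set()
    for table in tables:
        for bucket in table.values():
            pairs.update((i, j) for i in bucket for j in bucket if i < j)
    return pairs

# Demo: vectors 0 and 1 are nearly parallel (close in cosine distance),
# while vector 2 points in an unrelated random direction (far).
x = rng.standard_normal(dim)
vectors = [x, x + 0.05 * rng.standard_normal(dim), rng.standard_normal(dim)]
print(lsh_candidate_pairs(vectors))  # almost always {(0, 1)}: only the close pair collides
```

The point to notice is the cost: each object is hashed into a handful of buckets in time independent of n, so collecting the candidate close pairs takes roughly linear time, even though there are n-squared pairs in principle; only objects that land in a shared bucket are ever compared directly.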