1 00:00:00,000 --> 00:00:07,040 So what locality sensitive hashing does, and this is the main trick in this 2 00:00:07,040 --> 00:00:12,014 technique, is that you use many such functions f. 3 00:00:12,014 --> 00:00:18,046 Remember, each function f corresponds to three chosen grid cells. 4 00:00:18,046 --> 00:00:23,020 Now there are many, many choices for three cells. 5 00:00:23,020 --> 00:00:31,076 Lets take a thousand such choices. And in general be such choices, now we 6 00:00:31,076 --> 00:00:41,076 need to figure out what's the chance that. At least one function, out of these be a 7 00:00:41,076 --> 00:00:45,088 thousand. Actually matches. 8 00:00:46,020 --> 00:00:54,013 Let's work that out. First of all, pq^k, as we have just 9 00:00:54,013 --> 00:01:05,639 computed, is the probability that f(x) = f(y) and their both one for a particular 10 00:01:05,639 --> 00:01:14,394 combination of k cells. One minus of that, is the chance that f(x) 11 00:01:14,394 --> 00:01:25,993 and f(y) are not one for these, k cells. So if you raise this to the power of b, 12 00:01:25,993 --> 00:01:35,313 you'll get the chance that none of the b choices of f have a match. 13 00:01:35,313 --> 00:01:45,299 That is f(x) and f(y) are not one for any of these b functions. 14 00:01:45,299 --> 00:01:57,249 And then again, one minus of that is the chance that at least one of the b 15 00:01:57,249 --> 00:02:05,848 functions matches. So let's go over that one more time. 16 00:02:05,848 --> 00:02:13,153 We have the probability that. K cells match, which is pq^k. 17 00:02:13,153 --> 00:02:22,154 One minus that is the probability that k cells don't match, then one - pq^k raise 18 00:02:22,154 --> 00:02:31,023 to the b is the probability that none of the b functions has a match for x and y. 19 00:02:31,023 --> 00:02:38,069 And then, another one minus that is the chance that at least one of these b 20 00:02:38,069 --> 00:02:43,008 functions is such that f(x) and f(y) are both one. 21 00:02:43,008 --> 00:02:50,021 And interestingly this nice functional form, gives me, gives us 0.997 for the 22 00:02:50,021 --> 00:02:59,398 values p = 0.2, k = three and q = 0.9. And that means that by using many 23 00:02:59,398 --> 00:03:06,683 functions, we get a very high chance that at least one of them will give us a match 24 00:03:06,683 --> 00:03:11,037 if the two prints are actually from the same person. 25 00:03:11,037 --> 00:03:16,030 And this is the trick Of locality sensitive hashing.