So all the illustrations we have given so far assume that we're working in just two dimensions, but in most of the examples we're interested in, we might have a very large vocabulary, or a very high-dimensional space, that we're interested in partitioning and performing nearest neighbor search on. So let's talk about how we do things in these higher dimensions.

Well, if we have more than just two dimensions, let's say we have three dimensions, then instead of throwing down a line, we're going to think about throwing down planes, or hyperplanes. So this plane, again, we're going to assume that it goes through the (0, 0, 0) point, even though this illustration might not look exactly like it. But what we're going to say is that the score of any point x is simply given by v1 times, in this case, the number of times we see the word "awesome", plus v2 times the number of times we see the word "awful", plus v3 times the number of times we see the word "great". So there are three different coefficients that define our plane in three dimensions, and if we had some d dimensions, we would have d different coefficients that multiply each of our d different features. That defines our score, and then we simply look at whether that score is positive or negative and use that to define the bin index.

And just like in the two-dimensional case, when we're thinking about throwing down these hyperplanes, we're just going to draw these coefficients, these v1, v2, v3, all the way up to vd, completely at random. And that's going to define a random hyperplane that we're going to use for our partitioning.

And so now I want to talk about the cost of binning our points in this d-dimensional space. Well, remember that for each one of our data points in d dimensions, we need to determine the bin index per plane that we're throwing down. So that means for each one of our hyperplanes, let's say the ith hyperplane with coefficients v1(i), v2(i), v3(i), and so on, we have to do d multiplications between these coefficients and our features to define the score just for that one hyperplane. And then we have to repeat that for each of the hyperplanes that we've thrown down.
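As a rough illustration of that score-and-sign step, here is a minimal Python sketch. This is not code from the lecture; the use of NumPy and names like num_hyperplanes are assumptions for the example.

```python
# Minimal sketch of binning a point with random hyperplanes through the origin.
# Illustrative code only, not the course's implementation.
import numpy as np

rng = np.random.default_rng(0)

d = 3                 # feature dimensions, e.g. counts of "awesome", "awful", "great"
num_hyperplanes = 4   # how many random planes we throw down (assumed name)

# Row i holds the coefficients v1(i) .. vd(i) of the ith hyperplane,
# drawn completely at random as described above.
V = rng.standard_normal((num_hyperplanes, d))

def bin_index(x):
    """Score x against every hyperplane; the signs of the scores give the bin."""
    scores = V @ x                           # d multiplications per hyperplane
    bits = (scores >= 0).astype(int)         # positive score -> 1, negative -> 0
    return int("".join(map(str, bits)), 2)   # pack the sign bits into one bin index

x = np.array([2.0, 0.0, 1.0])  # e.g. word counts for one document
print(bin_index(x))
```

With one sign bit per hyperplane, every point lands in one of 2 to the power num_hyperplanes possible bins.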
Okay, so there is some cost to doing this, but one thing that I want to mention is that in high dimensions, in some applications, this is often a sparse multiplication that we're doing. Let me just say "some applications," because this is not generically true, but we can think about this being especially true when the number of dimensions is the size of the vocabulary and we're looking at documents and counts of how many times we see words in that vocabulary. Out of all possible words in a vocabulary, we often see only a few of those words in any given document. So even though, generically, we're saying we're doing d different multiplies per hyperplane, in practice we often only need to compute a smaller subset of those multiplications.

But the other thing that's really, really important, and is generically true, is that this one-time cost of forming the hash table is offset when we're doing many, many queries. So every time we query and look for a nearest neighbor, we're going to use this hash table that we formed, and we don't have to redo all these multiplications that we're talking about here.
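To make the sparsity point and the amortized query cost concrete, here is one way it might look in code. Again this is just a sketch; the corpus, vocabulary, and helper names are made up for illustration.

```python
# Sketch: sparse scoring plus a one-time hash table that many queries reuse.
# All names and data here are hypothetical, for illustration only.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
vocab = ["awesome", "awful", "great", "terrible", "fine"]
word_to_dim = {w: j for j, w in enumerate(vocab)}

num_hyperplanes = 3
V = rng.standard_normal((num_hyperplanes, len(vocab)))

def sparse_bin_index(doc_counts):
    """doc_counts: {word: count}. Multiplies only over the words the document
    actually contains, rather than all d dimensions of the vocabulary."""
    scores = np.zeros(num_hyperplanes)
    for word, count in doc_counts.items():
        scores += V[:, word_to_dim[word]] * count
    return tuple(scores >= 0)  # the sign pattern identifies the bin

# One-time cost: bin every document into the hash table...
corpus = [{"awesome": 2, "great": 1}, {"awful": 3}, {"great": 1, "fine": 2}]
table = defaultdict(list)
for doc_id, doc in enumerate(corpus):
    table[sparse_bin_index(doc)].append(doc_id)

# ...which every subsequent query then reuses: we score the query once and
# search only its bin, instead of redoing the multiplications over the corpus.
query = {"awesome": 1, "great": 2}
candidates = table.get(sparse_bin_index(query), [])
print(candidates)  # candidate nearest neighbors to check exactly
```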