So all the illustrations we have given so far assume that we're working in just two dimensions, but in most of the examples we're interested in, we might have a very large vocabulary, or a very high-dimensional space, that we're interested in partitioning and performing nearest neighbor search on. So let's talk about how we do things in these higher dimensions.

Well, if we have more than just two dimensions, let's say we have three dimensions, then instead of throwing down a line, we're going to think about throwing down planes, or hyperplanes. So this plane, again, we're going to assume that it goes through the (0, 0, 0) point, even though this illustration might not look exactly like it. But what we're going to say is that the score of any point x is simply given by v1 times, in this case, the number of times we see the word "awesome", plus v2 times the number of times we see the word "awful", plus v3 times the number of times we see the word "great". So there are three different coefficients that define our plane in three dimensions, and if we had some d dimensions, we would have d different coefficients that multiply each of our d different features. That defines our score, and then we simply look at whether that score is positive or negative and use that to define the bin index.

And just like in the two-dimensional case, when we're thinking about throwing down these hyperplanes, we're just going to draw these coefficients, these v1, v2, v3, all the way up to vd, completely at random. And that's going to define a random hyperplane that we're going to use for our partitioning.

And so now I want to talk about the cost of binning our points in this d-dimensional space. Well, remember that for each one of our data points in d dimensions, we need to determine the bin index per plane that we're throwing down. So that means for each one of our hyperplanes, let's say the ith hyperplane with coefficients v1(i), v2(i), v3(i), and so on, we have to do d multiplications between these coefficients and our features to define the score just for that one hyperplane. And then we have to repeat that for each of the hyperplanes that we've thrown down.
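As a rough illustration of that score-and-sign step, here is a minimal Python sketch. This is not code from the lecture; the use of NumPy and names like num_hyperplanes are assumptions for the example.

```python
# Minimal sketch of binning a point with random hyperplanes through the origin.
# Illustrative code only, not the course's implementation.
import numpy as np

rng = np.random.default_rng(0)

d = 3                 # feature dimensions, e.g. counts of "awesome", "awful", "great"
num_hyperplanes = 4   # how many random planes we throw down (assumed name)

# Row i holds the coefficients v1(i) .. vd(i) of the ith hyperplane,
# drawn completely at random as described above.
V = rng.standard_normal((num_hyperplanes, d))

def bin_index(x):
    """Score x against every hyperplane; the signs of the scores give the bin."""
    scores = V @ x                           # d multiplications per hyperplane
    bits = (scores >= 0).astype(int)         # positive score -> 1, negative -> 0
    return int("".join(map(str, bits)), 2)   # pack the sign bits into one bin index

x = np.array([2.0, 0.0, 1.0])  # e.g. word counts for one document
print(bin_index(x))
```

With one sign bit per hyperplane, every point lands in one of 2 to the power num_hyperplanes possible bins.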
Okay, so there is some cost to doing this, but one thing that I want to mention is that in high dimensions, in some applications, this is often a sparse multiplication that we're doing. Let me just say "some applications," because this is not generically true, but we can think about this being especially true when the number of dimensions is the size of the vocabulary and we're looking at documents and counts of how many times we see words in that vocabulary. Out of all possible words in a vocabulary, we often see only a few of those words in any given document. So even though, generically, we're saying we're doing d different multiplies per hyperplane, in practice we often only need to compute a smaller subset of those multiplications.

But the other thing that's really, really important, and is generically true, is that this one-time cost of forming the hash table is offset when we're doing many, many queries. So every time we query and look for a nearest neighbor, we're going to use this hash table that we formed, and we don't have to redo all these multiplications that we're talking about here.
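To make the sparsity point and the amortized query cost concrete, here is one way it might look in code. Again this is just a sketch; the corpus, vocabulary, and helper names are made up for illustration.

```python
# Sketch: sparse scoring plus a one-time hash table that many queries reuse.
# All names and data here are hypothetical, for illustration only.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
vocab = ["awesome", "awful", "great", "terrible", "fine"]
word_to_dim = {w: j for j, w in enumerate(vocab)}

num_hyperplanes = 3
V = rng.standard_normal((num_hyperplanes, len(vocab)))

def sparse_bin_index(doc_counts):
    """doc_counts: {word: count}. Multiplies only over the words the document
    actually contains, rather than all d dimensions of the vocabulary."""
    scores = np.zeros(num_hyperplanes)
    for word, count in doc_counts.items():
        scores += V[:, word_to_dim[word]] * count
    return tuple(scores >= 0)  # the sign pattern identifies the bin

# One-time cost: bin every document into the hash table...
corpus = [{"awesome": 2, "great": 1}, {"awful": 3}, {"great": 1, "fine": 2}]
table = defaultdict(list)
for doc_id, doc in enumerate(corpus):
    table[sparse_bin_index(doc)].append(doc_id)

# ...which every subsequent query then reuses: we score the query once and
# search only its bin, instead of redoing the multiplications over the corpus.
query = {"awesome": 1, "great": 2}
candidates = table.get(sparse_bin_index(query), [])
print(candidates)  # candidate nearest neighbors to check exactly
```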