So the last way in which we're going to use KD-trees for nearest neighbor search is for approximate nearest neighbor search. This is something we're going to spend a lot of time on in the next section, but the idea is really important, because so far everything we've talked about assumes that, out of all the articles out there, you want to find the nearest neighbor. That assumption relies heavily on believing that it's important to find the exact nearest neighbor and not just a good alternative, and also on really trusting your representation of the data, as well as the distance measure you're using to compute these similarities. Because if you don't have complete faith in those to begin with, what's the difference between finding the nearest neighbor and finding one that's pretty close to it?

So the question is: can we do things even more efficiently if we don't insist on finding, according to our setup, the exact nearest neighbor, but just a pretty good approximation?

Within the context of KD-trees, the way we can do approximate nearest neighbor search is as follows. Remember that before, we maintained the distance to the nearest neighbor found so far; call that distance r. Now, instead of pruning based on r, we're going to prune more aggressively. That's what gives us the added efficiency, at the cost of an approximation. To prune more aggressively, instead of pruning a bounding box when its distance to the query exceeds r, we prune it when that distance exceeds r / alpha, where alpha is some number greater than 1. Since r / alpha is a smaller radius, it's more likely that the distance to a bounding box exceeds r / alpha than that it exceeds r, so more boxes get pruned.

Okay, so what are the properties of performing nearest neighbor search with this more aggressive pruning? Well, we know that if we return a neighbor at some distance r, then there is no data point within distance r / alpha of the query.
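To make the pruning rule concrete, here is a minimal sketch of what this could look like in Python. This is not the course's code; it assumes Euclidean distance, median splits that cycle through the dimensions, and it uses the distance from the query to the splitting plane as the lower bound on the distance to the far subtree. All of the names here (Node, build, approx_nn, alpha, leaf_size) are just illustrative.

```python
import numpy as np

class Node:
    """A KD-tree node: either an internal split or a leaf holding points."""
    def __init__(self, points=None, dim=None, value=None, left=None, right=None):
        self.points = points   # leaf only: the data points stored here
        self.dim = dim         # internal only: splitting dimension
        self.value = value     # internal only: splitting value
        self.left = left       # subtree with x[dim] <= value
        self.right = right     # subtree with x[dim] >  value

def build(points, leaf_size=5, depth=0):
    """Build a KD-tree by cycling through dimensions and splitting at the median."""
    if len(points) <= leaf_size:
        return Node(points=points)
    dim = depth % points.shape[1]
    value = np.median(points[:, dim])
    mask = points[:, dim] <= value
    if mask.all() or not mask.any():   # degenerate split: stop and keep a leaf
        return Node(points=points)
    return Node(dim=dim, value=value,
                left=build(points[mask], leaf_size, depth + 1),
                right=build(points[~mask], leaf_size, depth + 1))

def approx_nn(node, query, alpha=2.0, best=(None, np.inf)):
    """Approximate 1-NN: prune a subtree when the query's distance to its
    splitting plane is at least r / alpha, where r is the distance to the
    best neighbor found so far."""
    best_point, r = best
    if node.points is not None:                       # leaf: scan its points
        dists = np.linalg.norm(node.points - query, axis=1)
        i = np.argmin(dists)
        return (node.points[i], dists[i]) if dists[i] < r else (best_point, r)
    # Descend first into the side of the split that contains the query.
    near, far = ((node.left, node.right) if query[node.dim] <= node.value
                 else (node.right, node.left))
    best_point, r = approx_nn(near, query, alpha, (best_point, r))
    # Exact search visits the far side whenever gap < r; pruning with the
    # tighter threshold r / alpha skips more of the tree.
    gap = abs(query[node.dim] - node.value)
    if gap < r / alpha:
        best_point, r = approx_nn(far, query, alpha, (best_point, r))
    return best_point, r

# Example usage on random 3-dimensional data.
data = np.random.rand(10000, 3)
tree = build(data)
neighbor, dist = approx_nn(tree, np.random.rand(3), alpha=2.0)
```

With alpha = 1 the test gap < r / alpha is the usual bounding test, so the search is exact; larger values of alpha skip more of the tree and return a neighbor that is at most alpha times farther away than the true nearest neighbor.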
To picture this, let's say this is our query point xq, this is the distance r to the nearest neighbor we're going to return, and, drawn in a different color to make it clear, this smaller circle is r / alpha. There could be points in the ring between r / alpha and r; those would all be data points that are actually closer than the nearest neighbor we returned. But we know that there are no points inside the r / alpha circle, because otherwise we would have found them and that would have been our distance to the nearest neighbor. And we're pruning everything that falls farther away than that.

Okay, so by doing this we get a fairly good approximation. In fact, the bound we're stating here is rather loose, and in practice we're often much closer to the optimal solution. And importantly, it saves a ton of searching, provided of course that alpha is reasonably large. If alpha is 1, it defaults back to our exact nearest neighbor search.

Okay, so in a lot of cases, doing approximate nearest neighbor search is pretty much just as good as what you would get with brute force, because there's so much noise in how you're computing distances to begin with. And think about whether exactness is really relevant: if I'm sitting here reading an article, whatever you present me as a pretty good approximation to the nearest neighbor is probably going to satisfy my needs for exploring other literature.

Okay, so we're going to talk a lot more about alternative ways to do approximate nearest neighbor search, ways that also scale better with higher dimensions. But within the context of KD-trees, this is a very effective way of performing approximate nearest neighbor search.

So, to wrap up on KD-trees, first I want to say that they really are a very cool and efficient data structure. But I also want to discuss a few other things. One is the fact that there are lots and lots of variants. We talked about some of these in terms of heuristics for how to choose the splitting dimension and the splitting value, and all sorts of other things. In addition, instead of making these axis-aligned cuts, there are other ways of partitioning the space, such as ball trees and other shapes you can plop down.
And what we're going to talk about next can be thought of as one of these alternatives to axis-aligned cuts. The other thing we've seen so far is that the performance of nearest neighbor search relies very, very heavily on how you represent your data and how you compute the distance between two data points.

And for both of these, KD-trees and nearest neighbor search, things become much, much more challenging in high-dimensional spaces. For KD-trees, we talked about the fact that the number of inspections you have to do can be exponential in the dimensionality. So, as a rule of thumb, KD-trees are typically only useful if the number of observations you have is much, much larger than 2^d, 2 to the number of dimensions; with d = 20, for example, that's already about a million points. So in high-dimensional cases, you often don't have enough observations to make a KD-tree worthwhile.

And in terms of nearest neighbor search itself, setting aside the challenges specific to KD-trees, things are really hard in high-dimensional spaces, because in a really high-dimensional feature vector a lot of the features are going to be basically just noise, and that noise can swamp your distance calculations. What happens is that in high-dimensional spaces, basically everything ends up far away from everything else. So you really need techniques to learn which features are important, or how to weight the different features. And as we mentioned before, that's a really, really challenging task that requires a lot of domain knowledge, or a lot of thought and exploration.

And lastly, I really want to thank Andrew Moore for the outline of these slides. The slides on KD-trees were built very heavily from a really nice presentation that he put together on KD-trees. So thanks for the permission to use that.