1
00:00:00,012 --> 00:00:03,350
So in this video and the next, we're going
to study a very cool divide and conquer

2
00:00:03,362 --> 00:00:06,835
algorithm for the closest pair problem.
this is a problem where you're given n

3
00:00:06,847 --> 00:00:10,230
points in the plane and you want to figure
out which pair of points are closest to

4
00:00:10,242 --> 00:00:13,240
each other. So this would be the first
taste we get of an application in

5
00:00:13,252 --> 00:00:17,229
computational geometry, which is the part
of algorithms which studies how to reason

6
00:00:17,241 --> 00:00:21,251
and manipulate geometric objects. So those
algorithms are important in, among other

7
00:00:21,263 --> 00:00:25,424
areas robotics, computer vision and
computer graphics. So this is relatively

8
00:00:25,436 --> 00:00:29,013
advanced material, it's a bit more
difficult than the other applications of

9
00:00:29,025 --> 00:00:32,881
divide and conquer that we've seen. The
algorithm's a little bit tricky and it has

10
00:00:32,893 --> 00:00:36,596
a quite nontrivial proof of correctness,
so just be ready for that and also be

11
00:00:36,608 --> 00:00:41,073
warned that because it's more advanced I'm
going to talk about the material in at a

12
00:00:41,085 --> 00:00:44,797
slightly faster pace tha I do in most of
the other videos. So let's begin now by

13
00:00:44,809 --> 00:00:48,279
defining the problem formally, so we're
given as imput endpoints in the plane, so

14
00:00:48,291 --> 00:00:51,811
each one just define by its x coordinate
and ist y coordinate. And when we talk

15
00:00:51,823 --> 00:00:54,988
about the distance between two points in
this problem, we're going to focus on

16
00:00:55,000 --> 00:00:58,378
Euclidean distance. So, let me remind you
what that is briefly, but we're going to

17
00:00:58,390 --> 00:01:01,651
introduce some simple notation for that,
which we'll use for the rest of the

18
00:01:01,663 --> 00:01:05,532
lecture. So we're just going to note the
Euclidean distance between two points, pi

19
00:01:05,544 --> 00:01:09,368
and pj, by d of pi pj. So in terms of the
x and y coordinates of these two points,

20
00:01:09,458 --> 00:01:13,528
we just look at the squared differences in
each coordinate, sum them up, and take the

21
00:01:13,540 --> 00:01:17,342
square root. And now as the name of the
problem would suggest, the goal is to

22
00:01:17,354 --> 00:01:20,900
identify among all pairs of points that
pair which has the smallest distance

23
00:01:20,900 --> 00:01:23,860
between them.
Next, let's start getting a feel for the

24
00:01:23,872 --> 00:01:27,479
problem by making some preliminary
observations. First, I want to make an

25
00:01:27,491 --> 00:01:31,497
assumption purely for convenience that
there's no ties. So that is I'm going to

26
00:01:31,509 --> 00:01:35,436
assume all endpoints have distinct x
coordinat es, and also all endpoints have

27
00:01:35,448 --> 00:01:38,909
distinct y coordinates.
It's not difficult to extend the algorithm

28
00:01:38,921 --> 00:01:42,648
to accommodate ties. I'll leave it to you
to think about how to do that. So next,

29
00:01:42,660 --> 00:01:46,364
let's draw some parallels with the problem
of counting inversions, which was a

30
00:01:46,376 --> 00:01:50,029
earlier application of divide and conquer
that we saw. The first parallel I want,

31
00:01:50,168 --> 00:01:54,280
want to out is that, if we're comfortable
with the quadratic time algorithm, then

32
00:01:54,292 --> 00:01:57,930
this is not a hard problem, we can simply
solve this by brute-force search. And

33
00:01:57,942 --> 00:02:01,355
again, by brute-force search, I just mean
we set up a double for loop, which

34
00:02:01,367 --> 00:02:04,955
iterates over all distinct pairs of
points. We compute the distance for each

35
00:02:04,967 --> 00:02:08,880
such pair and we remember the smallest.
That's clearly a correct algorithm, it has

36
00:02:08,892 --> 00:02:12,964
to iterate over a quadratic number of
pairs, so its running time is going to be

37
00:02:13,191 --> 00:02:17,314
theta of n squared. And, as always, the
question is can we apply some algorithmic

38
00:02:17,326 --> 00:02:21,344
ingenuity to do better? Can we have a
better algorithm than this naive one which

39
00:02:21,356 --> 00:02:25,264
iterates over all pairs of points? You
might have a, an initial instinct that

40
00:02:25,276 --> 00:02:29,167
because the problem asks about a quadratic
number of different objects, perhaps we

41
00:02:29,179 --> 00:02:32,589
fundamentally need to do quadratic work.
But again, recall back in counting

42
00:02:32,601 --> 00:02:36,278
inversions, using divide and conquer, we
were able to get an n log n algorithm

43
00:02:36,363 --> 00:02:40,316
despite the fact that there might be as
many as a quadratic number of inversions

44
00:02:40,328 --> 00:02:44,330
in an array. So the question is, can we do
something similar here for the closest

45
00:02:44,342 --> 00:02:47,941
pair problem? Now, one of the keys to
getting an n log n time algorithm for

46
00:02:47,953 --> 00:02:51,722
counting inversions was to leverage a
sorting subroutine. Recall that we

47
00:02:51,734 --> 00:02:55,790
piggybacked on merge sort to count the
number of inversions in n log n time. So

48
00:02:55,802 --> 00:02:59,874
the question is, here, with the closest
pair problem, perhaps, sorting again can

49
00:02:59,886 --> 00:03:03,342
be useful in some way to beat the
quadratic barrier. So, to develop some

50
00:03:03,354 --> 00:03:07,278
evidence that sorting will indeed help us
compute the closest pair of points

51
00:03:07,290 --> 00:03:11,239
embedded in quadratic time, let's look at
a special case of the problem, really, an

52
00:03:11,251 --> 00:03:15,123
easier version of t he problem, which is
when the points are just in one dimension,

53
00:03:15,212 --> 00:03:18,654
so on the line rather that in two
dimensions in the plane. So in the 1D

54
00:03:18,666 --> 00:03:23,277
version, all the points just lie on a line
like this one, and we're given the points

55
00:03:23,289 --> 00:03:27,753
in some arbitrary order not necessarily in
sorted order. So, a way to solve the

56
00:03:27,765 --> 00:03:32,204
closest pair problem in one dimension, is
to simply sort the points, and then of

57
00:03:32,216 --> 00:03:36,300
course, the closest pair better be
adjacent in this ordering, so you just

58
00:03:36,312 --> 00:03:40,665
iterate through the n minus 1 consecutive
pairs and see which one is closest to each

59
00:03:40,677 --> 00:03:44,385
other So, more formally, here's how you
solve the one-dimensional version of the

60
00:03:44,397 --> 00:03:47,785
problem. You sort the points according to
their only coordinate, because you're

61
00:03:47,797 --> 00:03:51,510
going to remember, this is one dimension.
So as we've seen, using merge sort, we can

62
00:03:51,522 --> 00:03:55,335
sort the points in n log n time and then
we just do a scan through the points, so

63
00:03:55,347 --> 00:03:58,685
this takes linear time. And for each
consecutive pair, we compute their

64
00:03:58,697 --> 00:04:02,310
distance and we remember the smallest of
those consecutive pairs and we return

65
00:04:02,322 --> 00:04:06,078
that. That's gotta be the closest pair.
So, in this picture here on the right, I'm

66
00:04:06,090 --> 00:04:09,334
just going to circle here in green the
closest pair of points. So this is

67
00:04:09,346 --> 00:04:12,862
something we discover by sorting and then
doing a linear scan. Now, needless to say,

68
00:04:12,945 --> 00:04:16,440
this isn't directly useful, this is not
the problem I started out with. We wanted

69
00:04:16,452 --> 00:04:19,600
to find out the closest pair among of
points in the plane not points in the

70
00:04:19,612 --> 00:04:23,044
line. But, I want to point out that, this,
even in the line, there are a quadratic

71
00:04:23,056 --> 00:04:26,915
number of different pairs, so brute-force
search is still a quadratic time algorythm

72
00:04:27,422 --> 00:04:31,690
even in the 1D case. So at least, with one
dimension, we can use sorting, piggyback

73
00:04:31,702 --> 00:04:35,704
on it, to beat the naive brute-force
search bound and solve the problem in n

74
00:04:35,716 --> 00:04:39,635
log n time. So our goal for this lecture
is going to be to devise an equally good

75
00:04:39,647 --> 00:04:43,761
algorithm for the two-dimensional case, so
we want to solve closest pair of points in

76
00:04:43,773 --> 00:04:47,558
the plane, again, in n log n, n time.
So we will succeed in this goal. I'm going

77
00:04:47,558 --> 00:04:51,136
to show you an n log n time algo rithm for
2D closest pair. It's going to take us a

78
00:04:51,148 --> 00:04:54,779
couple steps. So let me begin with a high
level approach. Alright. So the first I

79
00:04:54,791 --> 00:04:58,374
need to try is just to copy what worked
for us in the one-dimensional case. So the

80
00:04:58,555 --> 00:05:02,134
one-dimensional case, we first sorted the
points by their coordinate and that was

81
00:05:02,146 --> 00:05:06,017
really useful. Now, in the 2D case, points
have two coordinates, x coordinates and y

82
00:05:06,029 --> 00:05:09,610
coordinates, so there's two ways to sort
them. So let's just sort them both ways,

83
00:05:10,102 --> 00:05:13,612
that is, the first step of our algorithm,
which you should really think of as a

84
00:05:13,624 --> 00:05:17,082
preprocessing step. We're going to take
the input points. We invoke merge sort

85
00:05:17,094 --> 00:05:20,385
once to sort them according to x
coordinate, that's one copy of the points.

86
00:05:20,470 --> 00:05:24,039
And then we make a second copy of the
points where they're sorted by y

87
00:05:24,051 --> 00:05:27,754
coordinates. So we're going to call those
copies of points px, that's an array of

88
00:05:27,766 --> 00:05:31,646
the points sorted by x coordinate, and py
for them sorted by y coordinate. Now, we

89
00:05:31,658 --> 00:05:35,513
know merge short takes n log n times, so
this preprocessing step only takes o of n

90
00:05:35,525 --> 00:05:39,228
log n time. And again, given that we're
shooting for an algorithm with running

91
00:05:39,240 --> 00:05:42,953
time big O of n log n, why not sort the
points? We don't even know how we're going

92
00:05:42,953 --> 00:05:46,660
to use this fact right now, but it's sort
of harmless. It's not going to effect our

93
00:05:46,672 --> 00:05:50,610
goal of getting a big of O n log n time
algorithm. And indeed, this illustrates a

94
00:05:50,622 --> 00:05:54,410
broader point, which is one of the themes
of this course. So recall, I hope one of

95
00:05:54,422 --> 00:05:57,985
the things you take away from this course
is a sense for what are the four free

96
00:05:57,997 --> 00:06:01,771
primitives, what are manipulations or
operations you can do on data which

97
00:06:01,783 --> 00:06:05,273
basically are costless. Meaning that if
your data set fits in the main memory of

98
00:06:05,285 --> 00:06:08,750
your computer, you can basically invoke
the primitive and it's just going to run

99
00:06:08,762 --> 00:06:12,139
blazingly fast, and you can just do it
even if you don't know why. And again,

100
00:06:12,223 --> 00:06:15,450
sorting is the canonical for free
primitive, although, we'll see some more

101
00:06:15,462 --> 00:06:18,953
later in the course and so, here, we're
using exactly that principle.

102
00:06:18,962 --> 00:06:22,689
So we don't even understand why yet we
might wa nt the points to be sorted. It

103
00:06:22,701 --> 00:06:27,066
just seems like it's probably going to be
useful, motivated by the 1D case, so let's

104
00:06:27,078 --> 00:06:31,085
go ahead and make assorted copies of the
points by x and y coordinate upfront. So

105
00:06:31,097 --> 00:06:35,438
reasoning by analogy with the 1D suggests
that sorting the points might be useful,

106
00:06:35,526 --> 00:06:39,228
but we can't carry this analogy too far.
So in particular, we're not going to be

107
00:06:39,240 --> 00:06:43,147
able to get away with just a simple linear
scan through these arrays to identify the

108
00:06:43,159 --> 00:06:47,156
closest pair of points. So, to see that,
consider the following example.

109
00:06:47,168 --> 00:06:51,424
So we're going to look at a point set
which has six points. There's going to be

110
00:06:51,436 --> 00:06:56,120
two points, which I'll put in blue which
are very close in x coordinate, but very

111
00:06:56,132 --> 00:07:00,829
far away in y coordinate. And then there's
going to be another pair of points which

112
00:07:00,841 --> 00:07:05,210
I'll do in green, which are very close in
y coordinate, but very far away in x

113
00:07:05,222 --> 00:07:09,463
coordinate. And then there's going to be a
red pair of points, which are not too far

114
00:07:09,475 --> 00:07:13,423
away in either the x coordinate or the y
coordinate. So in this set of six points,

115
00:07:13,514 --> 00:07:17,142
the closest pair is the pair of red
points. They're not even going to show up

116
00:07:17,154 --> 00:07:21,027
consecutively on either of the two arrays,
right? So in the array that's sorted by x

117
00:07:21,039 --> 00:07:24,829
coordinate, this blue point here is going
to be wedged in between the two red

118
00:07:24,841 --> 00:07:28,893
points, they won't be consecutive. And
similarly in the, in py, which is sort of

119
00:07:28,905 --> 00:07:32,724
by y coordinate, this green coordinate is
going to be wedged between the two red

120
00:07:32,736 --> 00:07:37,026
points. So you won't even notice these red
point if you just do a linear scan if your

121
00:07:37,038 --> 00:07:40,238
px and py, or py look at the consecutive
pairs of points. So, following our

122
00:07:40,250 --> 00:07:44,323
preprocessing step where we just invert,
invoke merge sort twice we're going to do

123
00:07:44,335 --> 00:07:48,289
a quite nontrivial divide and conquer
algorithm to compute the closest pair. So

124
00:07:48,301 --> 00:07:52,227
really, in this algorithm, we're applying
the divide and conquer algorithm twice.

125
00:07:52,316 --> 00:07:56,111
First, internal to the sorting subroutine,
assuming that we use the merge sort

126
00:07:56,123 --> 00:08:00,045
algorithm to sort. Divide and conquer is
being used there to get an n log n running

127
00:08:00,057 --> 00:08:04,058
time in this preprocessing step, and the
n, we're going to use it again on sorted

128
00:08:04,070 --> 00:08:07,760
arrays in a new way and that's what I'm
going to tell you about next. So let's

129
00:08:07,772 --> 00:08:11,685
just briefly review the divide and conquer
algorithm design paradigm before we apply

130
00:08:11,697 --> 00:08:15,335
it to the closest pair problem. So, as
usual, the first step is to figure out a

131
00:08:15,347 --> 00:08:18,760
way to divide your problem into smaller
subproblems. Sometimes this has a

132
00:08:18,772 --> 00:08:22,375
reasonable amount of ingenuity, but it's
not going to. Here in the closest pair

133
00:08:22,387 --> 00:08:25,438
problem, we're going to proceed exactly as
we did in the merge sort and counting

134
00:08:25,450 --> 00:08:28,652
inversions problems, where we took the
array and broke it into its left and right

135
00:08:28,664 --> 00:08:31,748
half. So here, we're going to take the
input point set, and again, just recurse

136
00:08:31,760 --> 00:08:34,693
on the left half of the points, and
recurse on the right half of the points.

137
00:08:34,772 --> 00:08:37,936
So here, by left and right, I mean with
respect to the points x coordinates.

138
00:08:38,017 --> 00:08:41,529
There's pretty much never any ingenuity in
the conquer step, that just means you take

139
00:08:41,541 --> 00:08:44,406
the sub-problems you identified in the
first step, and you solve them

140
00:08:44,418 --> 00:08:47,723
recursively. That's what we'll do here,
we'll recursively complete the closest

141
00:08:47,735 --> 00:08:51,058
pair in the left half of the points, and
the closest pair in the right half of the

142
00:08:51,070 --> 00:08:54,517
points. So where all the creativity in
divide and conquer algorithms is in the

143
00:08:54,529 --> 00:08:58,208
combined step. Given the solutions to your
sub problems, how do you somehow recover a

144
00:08:58,220 --> 00:09:01,598
solution to the original problem? The one
that you actually care about. So for

145
00:09:01,819 --> 00:09:05,310
closest pair, the questionis going to be,
given that you've computed the closest

146
00:09:05,322 --> 00:09:08,775
pair on the left half of the points, and
the closest pair on the right half of the

147
00:09:08,787 --> 00:09:12,383
points, how do you then quickly recover
the closest pair for the whole point set?

148
00:09:12,392 --> 00:09:15,800
That's a tricky problem, that's what we're
going to spend most of our time on. So

149
00:09:15,812 --> 00:09:18,870
let's make this divide and conquer
approach for closest pair a little bit

150
00:09:18,882 --> 00:09:22,325
more precise, so let's now actually start
spelling out our closest pair algorithm.

151
00:09:22,407 --> 00:09:26,240
The input we're given, it's, this follows
the preprocessing steps or recall that we

152
00:09:26,252 --> 00:09:30,626
invoke, merge sort, we get our two sorted
copies of the poin t set Px, sorted by x

153
00:09:30,638 --> 00:09:34,778
coordinate, and py sorted by y coordinate.
So the first dividend is the division

154
00:09:34,790 --> 00:09:38,837
step. So given that we have a copy of the
points px sorted by x coordinate, it's

155
00:09:38,849 --> 00:09:42,952
easy to identify the leftmost half of the
points, those with the, those n over two

156
00:09:43,318 --> 00:09:47,244
smallest x coordinates and in the right
half, those were the n over two largest x

157
00:09:47,256 --> 00:09:51,232
coordinates. We're going to call those Q
and R respectively. One thing I'm skipping

158
00:09:51,244 --> 00:09:54,829
over is the base case. I'm not going to
bother writing that down, so base case

159
00:09:54,841 --> 00:09:58,659
omitted, but it's what you would think it
would be. So basically once you have a

160
00:09:58,671 --> 00:10:02,374
small number point, say two points or
three points, then you can just solve the

161
00:10:02,386 --> 00:10:06,368
problem in constant time by a brute-force
search. You just look at all the pairs and

162
00:10:06,380 --> 00:10:09,955
you return the closest pair. So think of
it being at least four points in the

163
00:10:09,967 --> 00:10:13,567
input. Now, in order to recurse, to call
clo pair again, in the left and right

164
00:10:13,579 --> 00:10:17,408
halves, we need sorted version of Q and R,
both by x coordinate and by y coordinate,

165
00:10:17,496 --> 00:10:21,301
so we're just going to form those by doing
suitable linear scans through px and py.

166
00:10:21,702 --> 00:10:26,202
And so one thing I encourage you to think
through carefully or maybe even code up

167
00:10:26,447 --> 00:10:31,366
after the video is how would you form Qx,
Qy, Rx and Ry given that you already have

168
00:10:31,378 --> 00:10:35,843
Px and Py. And if you think about it,
because Px and Py are already sorted just

169
00:10:35,855 --> 00:10:39,392
producing these sorted sublists takes
linear time. It's in some sense the

170
00:10:39,404 --> 00:10:43,380
opposite of the merge subroutine used in
merge sort. Here, we're sort of splitting

171
00:10:43,392 --> 00:10:47,341
rather than merging. But again, this can
be done in linear time, that's something

172
00:10:47,353 --> 00:10:51,329
you should think through carefully later.
So that's the division step, now we just

173
00:10:51,341 --> 00:10:54,489
conquer, meaning we recursively call
closest pair line on each of the two

174
00:10:54,501 --> 00:10:59,275
subproblems, so when we invoke closest
pair on the left half of the points on Q

175
00:10:59,662 --> 00:11:05,175
we're going to get back what are indeed,
the closest pair of points amongst those

176
00:11:05,187 --> 00:11:10,475
in Q. So we're going to call those P1 and
Pq, So among all pairs of points that both

177
00:11:10,487 --> 00:11:15,775
lie in Q, P1 and Q1 minimize the distance
between them. Similarly, we're going to

178
00:11:15,787 --> 00:11:20,739
call Q2Q2 the results of the second
recursive call, that is, P2 and Q2 are

179
00:11:20,751 --> 00:11:25,628
amongst all pairs of points that both lie
in R, the pair that has the minimum

180
00:11:25,640 --> 00:11:30,755
Euclidean distance. Now, conceptually,
there's two cases. There's a lucky case

181
00:11:30,864 --> 00:11:35,935
and there's an unlucky case. In the
original point set P, if we're lucky, the

182
00:11:35,947 --> 00:11:41,081
closest pair of points in all of P,
actually, both of them lie in Q or both of

183
00:11:41,093 --> 00:11:45,797
them lie in R. In this lucky case, we'd
already be done if the closest pair in the

184
00:11:45,809 --> 00:11:50,314
entire point set they happen to both lie
in Q, then this first recursive call is

185
00:11:50,326 --> 00:11:55,260
going to recover them and we just have
them in our hands P1Q1. Similarly, if both

186
00:11:55,272 --> 00:12:00,502
of the closest pair of points in all of P
lies on the right side in R, then they get

187
00:12:00,514 --> 00:12:05,525
handed to us on a silver platter by the
second recursive call that just operate on

188
00:12:05,537 --> 00:12:10,594
R. So in the unlucky case, the closest
pair of point in P happens to be split.

189
00:12:10,705 --> 00:12:15,880
That is, one of the points lies in the
left half, in Q, and the other point lies

190
00:12:15,892 --> 00:12:20,672
in the right half, in R. Notice, if the
closest pair of points in all of P is

191
00:12:20,684 --> 00:12:25,285
split, is half in Q and half in R, neither
recursive call is going to find it. Okay?

192
00:12:25,381 --> 00:12:29,713
The pair of points is not passed to either
of the two recursive calls, so there's no

193
00:12:29,725 --> 00:12:34,116
way it's going to be returned to us. Okay?
So we have not identified the closest pair

194
00:12:34,128 --> 00:12:38,114
after these two recursive calls, if the
closest pair happens to be split.

195
00:12:38,114 --> 00:12:41,648
This is exactly analagous to what happened
when we were counting inversions. The

196
00:12:41,660 --> 00:12:45,031
recursive call on the left half of the
array counted the left inversions. The

197
00:12:45,043 --> 00:12:48,627
recursive call on the right half of the
array counted the right inversions. But we

198
00:12:48,639 --> 00:12:52,082
still had to count the split inversions,
so in this closest pair algorithm, we

199
00:12:52,094 --> 00:12:55,509
still need a special purpose subroutine
that computes the closest pair for the

200
00:12:55,521 --> 00:12:59,110
case in which it is split, in which there
is one point in Q and one point in R. So

201
00:12:59,122 --> 00:13:02,572
just like in counting inversions, I'm
going to write down that subroutine and

202
00:13:02,584 --> 00:13:06,132
I'm going to leave it unimplemented for
now, we'll figur e out how to implement it

203
00:13:06,341 --> 00:13:10,310
quickly in the rest of the lecture. Now,
if we have a correct implementation of

204
00:13:10,322 --> 00:13:14,811
closest split pair, so that takes us input
the original point set sort of the x and y

205
00:13:14,823 --> 00:13:19,100
coordinate, and returns the smallest pair
that's split or one points in Q and one

206
00:13:19,112 --> 00:13:23,464
points in R, then we're done. So then, the
split, then the closest pair has to either

207
00:13:23,476 --> 00:13:27,180
be on the lef or onm the right or it has
to be split. Steps two through four

208
00:13:27,192 --> 00:13:31,430
compute the closest pair in each of those
categories, so those are the only possible

209
00:13:31,442 --> 00:13:35,540
candidates for the closest pair and we
just returned the best of them. So that's

210
00:13:35,552 --> 00:13:39,578
an argument for y, if we have a correct
implementation of the closest split para

211
00:13:39,578 --> 00:13:43,915
subroutine, then that implies a correct
implementation of closest pair. Now, what

212
00:13:43,927 --> 00:13:47,785
about the running time? So the running
time of the closest para algorithm is

213
00:13:47,797 --> 00:13:52,901
going to be in part determined by the
running time of closest split pair. So in

214
00:13:52,913 --> 00:13:58,013
the next quiz, I want you to think about
what kind of running time we should be

215
00:13:58,025 --> 00:14:03,173
shooting for with a closest split pair
subroutine. So the correct response of

216
00:14:03,185 --> 00:14:07,435
this quiz is the second one, and the
reasoning is just by analogy with our

217
00:14:07,447 --> 00:14:11,285
previous algorithms for merge sort and for
counting inversions. So, what is all of

218
00:14:11,297 --> 00:14:14,985
the work that we would do in this
algorithm or we do have this preprocessing

219
00:14:14,997 --> 00:14:18,485
step we call merge sort twice, we know
that's n log n, so we're not going to have

220
00:14:18,497 --> 00:14:22,035
a running time better than n log n cause
we sort at the beginning. And then, we

221
00:14:22,047 --> 00:14:25,610
have a recursive algorithm with the
following flavor, it makes two recursive

222
00:14:25,610 --> 00:14:28,393
calls.
Each recursive call is on a problem of

223
00:14:28,405 --> 00:14:32,771
exactly half the size with half the points
of the original one. And outside of the

224
00:14:32,783 --> 00:14:36,947
recursive calls, by assumption, by, in the
problem, we do a linear amount of work in

225
00:14:36,959 --> 00:14:41,250
computing the closest split pair. So we,
the exact same recursion tree which proves

226
00:14:41,262 --> 00:14:45,226
an n log n bound for merge sort, proves an
n log n bound for how much work we do

227
00:14:45,238 --> 00:14:48,967
after the preprocessing step, so that
gives us an overall running time bound of

228
00:14:48,979 --> 00:14:52,731
n log n. Remem ber, that's what we were
shooting for. We were working n log n

229
00:14:52,743 --> 00:14:56,268
already to solve the one-dimensional
version of closest pair and the goal of

230
00:14:56,280 --> 00:14:59,916
these lectures is to have an n log n
algorithm for the 2D versions. So this

231
00:14:59,928 --> 00:15:04,209
would be great. So in other words, the
goal should be to have a correct linear

232
00:15:04,221 --> 00:15:08,879
time implementation of the closest split
pair subroutine. If we can do that, we're

233
00:15:09,160 --> 00:15:12,989
home-free, we get the desired n log
algorithm. Now, I'm going to proceed in a

234
00:15:13,001 --> 00:15:17,379
little bit to show you how to implement
closest split pair, but before I do that,

235
00:15:17,477 --> 00:15:21,608
I want to point out one subtle, the key
idea, which is going to allow us to get

236
00:15:21,620 --> 00:15:26,336
this linear time correct implementation.
So, let me just put that on the slide. So,

237
00:15:26,435 --> 00:15:30,865
the key idea is that we don't actually
need a full-blown correct implementation

238
00:15:30,877 --> 00:15:35,258
of the closets split pair subroutine. So,
I'm not actually going to show you a

239
00:15:35,270 --> 00:15:39,625
linear time subroutine that always
correctly computes the closets split pair

240
00:15:39,637 --> 00:15:44,502
of a point set. The reason I'm going to do
that is that's actually a strictly harder

241
00:15:44,514 --> 00:15:49,018
problem than what we need to have a
correct recursive algorithm. We do not

242
00:15:49,030 --> 00:15:54,259
actually need a subroutine that, for every
point sets, always correctly computes the

243
00:15:54,271 --> 00:15:58,855
closest split pair of points. Remember,
there's a lucky case and there's an

244
00:15:58,867 --> 00:16:03,568
unlucky case. The lucky case is where the
closest pair in the whole point set P

245
00:16:03,580 --> 00:16:08,192
happens to lie entirely in the left half
of the points Q or in the right half of

246
00:16:08,204 --> 00:16:12,797
the points R In that lucky case, we, one
of our recursive calls will identify this

247
00:16:12,809 --> 00:16:17,082
closest pair and hand it over to us on a
silver platter. We could care less about

248
00:16:17,094 --> 00:16:21,435
the split pairs in that case. We get the
right answer without even looking at the

249
00:16:21,447 --> 00:16:25,427
split pair, pairs. Now, there's this
unlucky case where the split pairs happens

250
00:16:25,439 --> 00:16:29,805
to be the closest pair of points. That is
when we need this linear time subroutine,

251
00:16:29,901 --> 00:16:34,414
and only. then, only in the unlucky case
where the closest pair of points happens

252
00:16:34,426 --> 00:16:38,753
to be split. Now, that's in some sense, a
fairly trivial observation, but, there's a

253
00:16:38,765 --> 00:16:42,986
lot of ingenuity here i n figuring out how
to use that observation. The fact that we

254
00:16:42,998 --> 00:16:46,945
only need to solve a strictly easier
problem and that will enable the linear

255
00:16:46,957 --> 00:16:51,265
time implementation that I'm going to show
you next. So now, let's rewrite the high

256
00:16:51,277 --> 00:16:55,791
level recursive algorithm slightly to make
use of this observation that the closest

257
00:16:55,803 --> 00:17:00,027
split pair subroutine only has to operate
correctly in the regime of the unlucky

258
00:17:00,039 --> 00:17:04,164
case, when in fact, the closest split pair
is closer than the result of either

259
00:17:04,176 --> 00:17:08,206
recursive call. So I've erased the
previous steps 4 and 5, that, but we're

260
00:17:08,218 --> 00:17:12,310
going to rewrite them in a second. So,
before we invoke close split pair, what

261
00:17:12,322 --> 00:17:16,472
we're going to do is we're going to see
how well did our recursive calls do. That

262
00:17:16,484 --> 00:17:20,378
is, we're going to define a parameter
little delta, which is going to be the

263
00:17:20,390 --> 00:17:24,957
closest pair that we found or the distance
of the closest pair we found by either

264
00:17:24,969 --> 00:17:29,325
recursive call. So the minimum of the
distance between P1 and Q1, the closest

265
00:17:29,337 --> 00:17:34,078
pair that lies entirely on the left, and
P2Q2, the closest pair that lies entirely

266
00:17:34,090 --> 00:17:38,631
on the right. Now, we're going to pass
this delta information as a parameter into

267
00:17:38,643 --> 00:17:42,849
our closest split pair subroutine. We're
going to have to see why on earth that

268
00:17:42,861 --> 00:17:47,117
would be useful and I still owe you that
information, but, for now, we're just

269
00:17:47,129 --> 00:17:51,385
going to pass delta as a parameter for use
in the closest split pair. And then, as

270
00:17:51,397 --> 00:17:56,133
before we just do a comparison between the
three candidate closest pairs and return

271
00:17:56,145 --> 00:18:00,331
the best of the, of the trio. And so, just
so we're all clear on, on where things

272
00:18:00,343 --> 00:18:04,721
stand, so what remains is to describe the
implementation of closest split pair, and

273
00:18:04,733 --> 00:18:08,566
before I describe it, let me just be
crystal clear on what it is that we're

274
00:18:08,578 --> 00:18:13,778
going to demand of the subroutine. What do
we need to have a correct in o of n log n

275
00:18:13,216 --> 00:18:18,046
time closest pair algorithm. Well, as you
saw on the quiz, we want the running time

276
00:18:18,058 --> 00:18:22,475
to be o of n always, and for correctness,
what do we need? Again, we don't need it

277
00:18:22,487 --> 00:18:26,840
to always compute the closest split pair,
but we need it to compute the closest

278
00:18:26,852 --> 00:18:31,962
split pair in the events that there is a
split pair of distance strictly less than

279
00:18:31,974 --> 00:18:36,422
delta, strictly better than the outcome of
either recursive call. So now that we're

280
00:18:36,434 --> 00:18:40,874
clear on what we want, let's go ahead and
go through the pseudocode for this closest

281
00:18:40,886 --> 00:18:44,928
split pair subroutine. And I'm going to
tell you upfront, iIt's going to be fairly

282
00:18:44,940 --> 00:18:49,015
straightforward to figure out that the
subroutine runs in linear time, o of n

283
00:18:49,027 --> 00:18:52,527
time. The correctness requirement of
closest split pair will be highly

284
00:18:52,539 --> 00:18:56,148
non-obvious. In fact, after I show you
this pseudo you're not going to believe

285
00:18:56,160 --> 00:19:00,007
me. You're going to look at the pseudocode
and you'd be like, what are you talking

286
00:19:00,019 --> 00:19:03,512
about? But in the second video, on the
closest pair lecture, we will in fact show

287
00:19:03,524 --> 00:19:07,211
that this is a correct sub-routine. So,
how does it work? Well, let's look at a

288
00:19:07,223 --> 00:19:11,500
point set. So, the first thing we're going
to do is a filtering step. We're going to

289
00:19:11,512 --> 00:19:15,790
prune a bunch of the points away and so to
zoom in on a subset of the points. And the

290
00:19:15,612 --> 00:19:19,710
subset of the points we're going to look
at is those that lie in a vertical strip,

291
00:19:19,802 --> 00:19:24,030
which is roughly centered in the middle of
the point set. So, here's what I mean. By

292
00:19:24,042 --> 00:19:27,915
center dot, we're going to look at the
middle x coordinate. So, let x bar be the

293
00:19:27,927 --> 00:19:32,020
biggest x coordinate in the left half, so
that is in the sorted version of the

294
00:19:32,032 --> 00:19:37,362
points by x coordinate, we look at the n
over two smallest ex-coordinate. So, in

295
00:19:37,374 --> 00:19:42,971
this example where we have six points, all
this means is we draw, we imagine drawing

296
00:19:42,983 --> 00:19:48,630
a line between the third points, so that's
going to be x bar, the x coordinate of the

297
00:19:48,642 --> 00:19:52,807
third point from the left. Now, since
we're passed as input, a copy of the

298
00:19:52,819 --> 00:19:56,304
points sorted by x coordinate, we can
figure out what x bar is in constant time.

299
00:19:56,389 --> 00:20:00,068
Just by accessing the relevant entry of
the array, px. Now, the way we're going to

300
00:20:00,080 --> 00:20:03,371
use this parameter delta that we're
passed, so remember what delta is. So

301
00:20:03,383 --> 00:20:07,033
before we invoke the closest split pair
subroutine in the recursive algorithm, we

302
00:20:07,045 --> 00:20:10,699
make our two recursive calls, we find the
closest pair on the left, the closest pair

303
00:20:10,711 --> 00:20:14,169
on the right, and delta is whatever the
smaller of those two distances are. So

304
00:20:14,181 --> 00:20:17,899
delta is the parameter that controls
whether or not we actually care about the

305
00:20:17,911 --> 00:20:22,037
closest split pair or not, we care if and
only if there is a split pair at distance

306
00:20:22,049 --> 00:20:25,944
less than delta. So, how do we use delta?
Well, that's going to determine the width

307
00:20:25,956 --> 00:20:29,514
of our strip, so the strip's going to have
width 2 delta, and it's going to be

308
00:20:29,526 --> 00:20:33,034
centered around x. And the first thing
we're going to do is we're going to

309
00:20:33,046 --> 00:20:36,859
ignore, forevermore, points which do not
line in this vertical strip.

310
00:20:36,873 --> 00:20:40,954
So the rest of the algorithm will operate
only on the subset of p, the subset of the

311
00:20:40,966 --> 00:20:44,752
points that lie on the strip, and we're
going to keep track of them sorted by y

312
00:20:44,764 --> 00:20:48,791
coordinate. So the formal way to say that
they line the strip, is that they have x

313
00:20:48,803 --> 00:20:53,199
coordinate in the interval with lower
endpoint x bar minus delta and upper

314
00:20:53,449 --> 00:20:57,896
endpoint x bar plus delta. Now, how long
does it take to construct this set Sy

315
00:20:58,063 --> 00:21:02,945
sorted by y coordinate? Well fortunately,
we've been passed as input a sorted

316
00:21:02,957 --> 00:21:07,693
version of the points Py So to extract Sy
from Py, all we need to do is a simple

317
00:21:07,705 --> 00:21:12,712
linear scan through p y checking for each
point where its x coordinate is. So this

318
00:21:12,724 --> 00:21:17,564
can be done in linear time. Now, I haven't
yet shown you why it's useful to have this

319
00:21:17,576 --> 00:21:22,226
sorted set as y, but if you take it on
faith that it's useful to have the points

320
00:21:22,238 --> 00:21:25,974
in this vertical strip sorted by y
coordinate. You now see why it was useful

321
00:21:25,986 --> 00:21:30,017
that we did this merge sort all the way at
the beginning of the algorithm before we

322
00:21:30,029 --> 00:21:34,093
even underwent any recurssion. Remember,
what is our running time goal for closest

323
00:21:34,105 --> 00:21:38,195
split pair? We want this to run in linear
time, that means we cannot sort inside the

324
00:21:38,207 --> 00:21:41,948
closest split pair subroutine. That would
take too long. We want this to be in

325
00:21:41,960 --> 00:21:45,909
linear time. Fortunately, since we sorted
once and for all at the beginning of the

326
00:21:45,921 --> 00:21:50,030
closest pair algorithm, extracting sorted
sublists from those sorted lists of points

327
00:21:50,042 --> 00:21:54,495
can be done, done in linear time, which is
within our goals here. Now, it's the rest

328
00:21:54,507 --> 00:21:58,285
of t he subroutine where you're never
going to believe me that it does anything

329
00:21:58,297 --> 00:22:02,340
useful. So, I claim that essentially with
a linear scan through Sy, we're going to

330
00:22:02,352 --> 00:22:06,645
be able to identify the closest split pair
of points in the interesting, unlucky case

331
00:22:06,657 --> 00:22:11,190
where there is such a split pair with
distance less than delta. So here's what I

332
00:22:11,202 --> 00:22:15,320
mean by that linear scan through Sy. So as
we do the scan, we're, we're going to keep

333
00:22:15,332 --> 00:22:19,240
track of the closest pair of points of a
particular type that we've seen so far.

334
00:22:19,332 --> 00:22:23,135
So, let me introduce some variables to
keep track of the best candidate we've

335
00:22:23,147 --> 00:22:27,080
seen so far. There's going to be a vary,
variable best which will initialize to be

336
00:22:27,092 --> 00:22:30,955
delta. Remember, we're uninterested in
split pairs unless they have distance

337
00:22:30,967 --> 00:22:35,695
strictly less than delta. So, and then
we're going to keep track of the points

338
00:22:35,707 --> 00:22:40,847
themselves, so we'll initialize the best
pair to be null. Now, here is the linear

339
00:22:40,859 --> 00:22:46,218
scan. So we go through the points of Sy in
order y coordinate. Okay, well, not quite

340
00:22:46,230 --> 00:22:51,085
all the points of Sy. We stop at the
eighth to last point and you'll see why in

341
00:22:51,097 --> 00:22:58,523
a second. And then, for each position I of
the array Sy, we investigate the seven

342
00:22:58,535 --> 00:23:06,351
subsequent points of the same array Sy. So
for j going from one to seven, we look at

343
00:23:06,363 --> 00:23:12,837
the Ith, and I plus jth entry of Sy. So if
sy looks something like this array here,

344
00:23:12,945 --> 00:23:18,090
in any given point in this double for
loop, we're generally looking at an index

345
00:23:18,102 --> 00:23:23,940
I, a point in this, in this of the array,
and then some really quite nearby point in

346
00:23:23,952 --> 00:23:28,622
the array I plus j, because j here's going
to be at most seven. Okay? So we're

347
00:23:28,634 --> 00:23:32,221
constantly looking at pairs in this array,
but we're not looking at all pairs of all.

348
00:23:32,304 --> 00:23:35,519
We're only looking at pairs that are very
close to each other, within seven

349
00:23:35,531 --> 00:23:39,034
positions of each other. And what do we do
for each choice of i and j? Well, we just

350
00:23:39,046 --> 00:23:42,357
look at those points, we compute the
distance, we see if it's better than all

351
00:23:42,369 --> 00:23:45,801
of the pairs of points of this form that
we've looked at in the past and if it is

352
00:23:45,813 --> 00:23:49,788
better, then we remember it. So we just
remember the best, ie c losest pair of

353
00:23:49,800 --> 00:23:53,660
points, of this particular type for
choices of i and j of this form. So in

354
00:23:53,672 --> 00:23:57,598
more detail, if the distance between the
current pair of points of p and q is

355
00:23:57,610 --> 00:24:01,959
better than the best we've seen so far, we
reset the best pair of points to be equal

356
00:24:01,971 --> 00:24:05,864
to p and q, and we reset the best
distance, the closest distance seemed so

357
00:24:05,876 --> 00:24:10,402
far to be the distance between p and q and
that's it. Then, once this double for loop

358
00:24:10,414 --> 00:24:14,610
terminates, we just return it the best
pair. So one possible execution of closest

359
00:24:14,622 --> 00:24:18,750
split pair is that it never finds a pair
of points, p and q, at distance less than

360
00:24:18,762 --> 00:24:22,862
delta. In that case, this is going to
return null and then in the outer call. In

361
00:24:22,874 --> 00:24:26,944
the closet pair, obviously, you interpret
a null pair of points to have an infinite

362
00:24:26,956 --> 00:24:30,736
distance. So if you call closest split
pair, and it doesn't return any points,

363
00:24:30,826 --> 00:24:34,952
then the interpretation is that there's no
interesting split pair of points and you

364
00:24:34,964 --> 00:24:39,106
just return the better of the results of
the two recursive calls p1Q1 or P2Q2. Now,

365
00:24:39,196 --> 00:24:43,305
as far as the running time of the
subroutine, what happens here? Well, we do

366
00:24:43,317 --> 00:24:47,751
constant work just initializing the
variables. Then notice that the number of

367
00:24:47,763 --> 00:24:51,959
points in Sy, well in the worst case, you
have all of the points of P. So, it's

368
00:24:51,971 --> 00:24:55,983
going to be the most endpoints, and so,
you do a linear number of iterations in

369
00:24:55,995 --> 00:24:59,737
the outer for loop. But here is the key
point, in the inner for loop, right,

370
00:24:59,833 --> 00:25:03,895
normally double for loops give rise to
quadratic running time, but in this inner

371
00:25:03,907 --> 00:25:07,701
for loop we only look at a constant number
of other positions. We only look at seven

372
00:25:07,713 --> 00:25:11,237
other positions and for each of those
seven positions, we only do a constant

373
00:25:11,249 --> 00:25:14,970
number of work. Right? We just, we want to
compare distance and make a couple other

374
00:25:14,982 --> 00:25:18,653
comparisons, and reset some variables. So
for each of the linear number of outer

375
00:25:18,665 --> 00:25:22,313
iterations, we do a constant amount of
work, so that gives us a running time of o

376
00:25:22,325 --> 00:25:26,753
of n for this part of the algorithm. So as
I promised, analyzing the running time of

377
00:25:26,765 --> 00:25:30,409
this closest split pair subroutine was not
challenging. We just , in a

378
00:25:30,421 --> 00:25:34,489
straightforward way, looked at all the
operations. Again, because in the key

379
00:25:34,501 --> 00:25:38,758
linear scan, we only do constant work per
index, the overall running time is big O

380
00:25:38,770 --> 00:25:43,027
of n, just as we wanted. So this does mean
that our overall recursive algorithm will

381
00:25:43,039 --> 00:25:46,703
have running time o of n log n. What is
totally not obvious and perhaps even

382
00:25:46,715 --> 00:25:51,005
unbelievable, is that this subroutine
satifies the correctness requirements that

383
00:25:51,017 --> 00:25:54,889
we wanted. Remember, what we needed, we
needed that whenever we're in the unlucky

384
00:25:54,901 --> 00:25:58,900
case, whenever, in fact, the closest pair
of points in the whole point set is split,

385
00:25:58,989 --> 00:26:03,812
this subroutine better find it. So, but it
does, and that's being precise in the

386
00:26:03,824 --> 00:26:08,122
following correctness claim. So let me
rephrase the claim in terms of an

387
00:26:08,134 --> 00:26:12,828
arbitrary split pair, which has distance
less than delta, not necessarily the

388
00:26:12,840 --> 00:26:17,354
closest such pair. So suppose, there
exists, a p on the left, a point on the

389
00:26:17,366 --> 00:26:22,398
left side and a point on the right side so
that is a split pair and suppose the

390
00:26:22,410 --> 00:26:28,319
distance of this pair is less than Q. Now,
there may or may not be such a pair of

391
00:26:28,331 --> 00:26:33,847
points, PQ.. Don't forget what this
parameter delta means. What delta is, by

392
00:26:33,859 --> 00:26:39,643
definition, is the minimum of d of p1q1,
for p1q1 is the closest pair of points

393
00:26:39,655 --> 00:26:45,734
that lie entirely in the left half of the
point set Q and d of p2q2, or similarly,

394
00:26:45,746 --> 00:26:50,199
p2Q2 is the closest pair of points that
entirely on the right inside of R. So, if

395
00:26:50,211 --> 00:26:54,600
there's a split pair with distance less
than delta, this is exactly the unlucky

396
00:26:54,612 --> 00:26:59,134
case of the algorithm. This is exactly
where neither recursive call successfully

397
00:26:59,146 --> 00:27:03,596
identifies the closest pair of points,
instead that closest pair is a split pair.

398
00:27:03,694 --> 00:27:07,991
On the other hand, if we are in the lucky
case, then there will not be any split

399
00:27:08,003 --> 00:27:11,981
pairs with distance less than delta,
because the closest pair lies either all

400
00:27:11,993 --> 00:27:16,208
on the left or on the right, and it's not
split. But remember, we're interested in

401
00:27:16,220 --> 00:27:20,740
the case where there is a split pair that
has a distance less than delta where there

402
00:27:20,752 --> 00:27:24,743
is a split pair that is the closest pair.
So the claim has two parts. The first

403
00:27:24,755 --> 00:27:28,908
part, part A, says the following. It says
that if there's a split pair p and, and q

404
00:27:28,920 --> 00:27:34,439
of this type, then p and q are members of
Sy. And let me just sort of redraw the

405
00:27:34,451 --> 00:27:40,681
cartoon. So remember what Sy is. Sy is
that vertical strip. And again, the way we

406
00:27:40,693 --> 00:27:46,587
got that is we drew a line through a
median x coordinate and then we fattened

407
00:27:46,599 --> 00:27:52,543
it by delta on either side, and then, we
focused only on points that lie in the

408
00:27:52,555 --> 00:27:57,962
vertical strip. Now, notice our counts
split pair subroutine, if it ever returns

409
00:27:57,974 --> 00:28:02,323
a pair of points, it's going to return a
pair of points pq that belong to Sy.

410
00:28:02,427 --> 00:28:07,188
First, it filters down to Sy, then it does
a linear search through Sy. So if we want

411
00:28:07,188 --> 00:28:12,300
to believe that our subroutine identifies
best split pairs of points, then, in

412
00:28:12,312 --> 00:28:16,720
particular, such split pairs of points
better show up in Sy, they better survive

413
00:28:16,732 --> 00:28:21,190
the filtering step. So that's precisely
what part A of the claim is. Here's part B

414
00:28:21,202 --> 00:28:25,155
of the claim and this is the more
remarkable part of the claim, which is

415
00:28:25,167 --> 00:28:29,615
that p and q are almost next to each other
in this sorted array, Sy. So they're not

416
00:28:29,627 --> 00:28:34,055
necessarily adjacent, but they're very
close, they're within seven positions away

417
00:28:34,067 --> 00:28:38,190
from each other. So, this is really the
remarkable part of the algorithm. This is

418
00:28:38,202 --> 00:28:42,600
really what's surprising and what makes
the whole algorithm work. So, just to make

419
00:28:42,612 --> 00:28:46,575
sure that we're all clear on everything,
let's show that if we prove this claim,

420
00:28:46,667 --> 00:28:50,250
then we're done, then we have a correct
fast implementation of a closest pair

421
00:28:50,262 --> 00:28:54,360
algorithm. I certainly owe you the proof
of the claim, that's what the next video

422
00:28:54,372 --> 00:28:58,395
is going to be all about, but let's show
that if the claim is true, then, we're

423
00:28:58,668 --> 00:29:02,941
home-free. So if this claim is true, then
so is the following corollary, which I'll

424
00:29:02,953 --> 00:29:07,235
call corollaryl 1. So corollary 1 says, if
we're in the unlucky case that we

425
00:29:07,247 --> 00:29:11,386
discussed earlier, if we're in the case
where the closest point and the whole

426
00:29:11,398 --> 00:29:15,864
points of p does not lie both on the left,
does not lie both on the right, but rather

427
00:29:15,876 --> 00:29:20,112
has one point on the left and one on the
right but as it's a split pair, th en in

428
00:29:20,124 --> 00:29:25,162
fact, the count split pair subroutine will
correctly identify the closest split pair

429
00:29:24,901 --> 00:29:29,468
and therefore the closest pair overall.
Why is this true? Well what does count

430
00:29:29,480 --> 00:29:34,270
split pair do? Okay, so it has this double
for loop, and thereby, explicitly examines

431
00:29:34,282 --> 00:29:38,415
a bunch of pairs of points and it
remembers the closest pair of all of the

432
00:29:38,427 --> 00:29:42,870
pairs of points that it examines. What
does this, so what are the criteria that

433
00:29:42,882 --> 00:29:48,061
are necessary for count split pair to
examine a pair point? Well, first of all,

434
00:29:48,161 --> 00:29:52,631
the points p and q both have to survive
the filtering step and make it into the

435
00:29:52,643 --> 00:29:57,390
array Sy. Right? So count split pair only
searches over the array Sy. Secondly, it

436
00:29:57,402 --> 00:30:01,948
only searches over pairs of points that
are almost adjacent in Sy, that are only

437
00:30:01,960 --> 00:30:06,253
seven positions apart, but amongst pairs
of points that satisfy those two criteria,

438
00:30:06,346 --> 00:30:10,293
counts but pair will certainly compute the
closest such pair, right? It just

439
00:30:10,305 --> 00:30:14,196
explicitly remembers the best of them.
Now, what's the content of the claim?

440
00:30:14,289 --> 00:30:18,466
Well, the claim is guaranteeing that every
potentially interesting split pair of

441
00:30:18,478 --> 00:30:22,960
points and every split pair of points with
distance less than delta meets both of the

442
00:30:22,972 --> 00:30:26,403
criteria which are necessary to be
examined by the count split pair

443
00:30:26,403 --> 00:30:30,461
subroutine. So first of all, and this is
the content of part A, if you have an

444
00:30:30,473 --> 00:30:34,507
interesting split pair of points with
distance less than delta, then they'll

445
00:30:34,519 --> 00:30:38,920
both survive the filtering step. They'll
both make it into the array Sy., part A

446
00:30:38,932 --> 00:30:42,651
says that. Part B says they're almost
adjacent in Sy. So if you have an

447
00:30:42,663 --> 00:30:47,169
interesting split pair of points, meaning
it has distance less than delta, then they

448
00:30:47,181 --> 00:30:51,563
will, in fact, be at most seven positions
apart. Therefore, count split pair will

449
00:30:51,575 --> 00:30:55,849
examine all such split pairs, all split
pairs with distance less than delta, and

450
00:30:55,861 --> 00:31:00,268
just by construction, it will compute the
closest pair of all of them. So again, in

451
00:31:00,280 --> 00:31:04,646
the unlucky case where the best pair of
points is a split pair, then this claim

452
00:31:04,658 --> 00:31:08,619
guarantees that the count split pair will
compute the closest pair of points.

453
00:31:08,619 --> 00:31:12,984
Therefore, having h andled correctness, we
can just combine that with our earlier

454
00:31:12,996 --> 00:31:17,146
observations about running time and
corollary 2 just says, if we can prove the

455
00:31:17,158 --> 00:31:20,845
claim, then we have everything we wanted.
We have a correct O of n log n

456
00:31:20,857 --> 00:31:24,950
implementation for the closest pair of
points. So with further work and a lot

457
00:31:24,962 --> 00:31:29,062
more ingenuity, we've replicated the
guarantee that we got just by sorting for

458
00:31:29,074 --> 00:31:33,516
the one-dimensional case. Now again, these
corrollaries hold only if this claim is,

459
00:31:33,606 --> 00:31:37,567
in fact, true and I have given you no
justification for this claim. And even the

460
00:31:37,579 --> 00:31:41,300
statement of the claim, I think, is a
little bit shocking. So if I were you I

461
00:31:41,312 --> 00:31:45,112
would demand an explanation for why this
claim is true, and that's what I'm going

462
00:31:45,112 --> 00:31:46,514
to give you in the next video.