1 00:00:00,000 --> 00:00:03,974 Alright. So the plan for this video is to prove the correctness of the divide and 2 00:00:03,974 --> 00:00:07,997 conquer closest to pair algorithm that we discussed in the previous video. So just 3 00:00:07,997 --> 00:00:12,167 to refresh your memory, how does the outer algorithm work? Well, we're given endpoints 4 00:00:12,167 --> 00:00:15,699 in the plane. We begin by sorting them, first by x-coordinate and then by 5 00:00:15,699 --> 00:00:19,526 y-coordinate. That takes n log in time. Then we enter the main recursive divide 6 00:00:19,526 --> 00:00:23,539 and conquer part of the algorithm. So what do we do? We divide the point set into the 7 00:00:23,539 --> 00:00:27,621 left half and the right half, Q and R, then we conquer. We recursively compute 8 00:00:27,621 --> 00:00:31,546 the closest pair in the left half of the point set Q. We recursively compute the 9 00:00:31,546 --> 00:00:35,311 closest pair in the right half of the point set R. There is a lucky case where 10 00:00:35,311 --> 00:00:39,199 the closest pair on the entire point set lies either all on the left or all on the 11 00:00:39,199 --> 00:00:42,572 right. In that case, the closest pair is handed to us on a silver platter, 12 00:00:42,572 --> 00:00:46,366 by one of the two recursive calls. But there remains the unlucky case where the closest 13 00:00:46,366 --> 00:00:50,254 pair is actually split with one point on the left and one point on the right. So to 14 00:00:50,254 --> 00:00:54,002 get our N log N running time bound, analogous to Merge Short in our inversion 15 00:00:54,002 --> 00:00:57,562 counting, we need to have a linear time implementation of a subroutine which 16 00:00:57,562 --> 00:01:01,122 computes the best, the closest pair of points, which is split, one on the left 17 00:01:01,122 --> 00:01:04,729 and one on the right. Well, actually, we don't need to do quite that. We need to 18 00:01:04,729 --> 00:01:08,870 do something only a little bit weaker. We need a linear time algorithm, which whenever 19 00:01:08,870 --> 00:01:15,196 the closest pair in the whole point set is in fact split, then computes that split 20 00:01:15,196 --> 00:01:19,527 pair in linear time. So let me now remind you of how that subroutine works. 21 00:01:19,527 --> 00:01:23,397 So it has two basic steps. So first, there's a filtering step. So it looks at, 22 00:01:23,551 --> 00:01:27,576 first of all, a vertical strip, roughly down the middle of the point set. And it 23 00:01:27,576 --> 00:01:31,540 looks at, only at points which fall into that vertical strip. That was a subset of 24 00:01:31,540 --> 00:01:34,783 the points that we called S sub Y, 'cause we keep track of them sorted by Y 25 00:01:34,783 --> 00:01:38,711 coordinate. And then we do essentially a linear scan through SY. So we go through 26 00:01:38,711 --> 00:01:42,601 the points one at a time, and then, for each point, we look at only the almost 27 00:01:42,601 --> 00:01:47,749 adjacent points. So for each index I, we look only at J's that are between one and 28 00:01:47,749 --> 00:01:52,122 seven positions further to the right, than I. So among all such points, we 29 00:01:52,122 --> 00:01:56,266 compare them, we look at their distances. We remember the best such pair of points. 30 00:01:56,266 --> 00:02:00,206 And then that's what we return from the count split pair subroutine. So we've 31 00:02:00,206 --> 00:02:03,232 already argued, in the previous video, that the overall running time of the 32 00:02:03,232 --> 00:02:06,408 algorithm is N log N. And what remains to prove correctness. And we also argued, in 33 00:02:06,408 --> 00:02:11,126 the previous video, that correctness boils down to the following correctness claim. 34 00:02:11,126 --> 00:02:16,066 In the sense that, if we can prove this claim, then the entire algorithm is 35 00:02:16,066 --> 00:02:19,905 correct. So this is what remains. Our residual work is to provide a proof of 36 00:02:19,905 --> 00:02:24,000 the correctness claim. What does it say? It says consider any split pair that is 37 00:02:24,000 --> 00:02:28,198 one point p from the left side Q, capital Q, and another point little q drawn from 38 00:02:28,198 --> 00:02:32,292 the right side of the point set capital R. And fur, further suppose that it's an 39 00:02:32,292 --> 00:02:36,335 interesting split pair meaning that the distance between them's at most delta. 40 00:02:36,335 --> 00:02:40,689 Here delta is recall the parameter pass to the count split pair subroutine, which is 41 00:02:40,689 --> 00:02:44,494 the smallest distance between a pair of points all on the left or all on the 42 00:02:44,494 --> 00:02:48,485 right. And this is the only case we're interested in. There's two claims. First 43 00:02:48,485 --> 00:02:52,661 of all, for p and q, both members of the split pair survive the filtering step. 44 00:02:52,661 --> 00:02:57,570 They make it into the sorted list S sub Y, and second of all, they will be considered by 45 00:02:57,570 --> 00:03:01,328 that double for loop, in the sense that the positions of p and q in this array, S 46 00:03:01,328 --> 00:03:07,001 sub Y, differ by at most seven. So that's the story so far. Let's move on to the 47 00:03:07,001 --> 00:03:13,050 proof. So let's start with part A which is the easy part relatively of the claim. So 48 00:03:13,050 --> 00:03:17,175 remember what we start with, our assumptions. We have a point p, let's 49 00:03:17,175 --> 00:03:21,482 write it out in terms of the X coordinates, X1 and Y1, which is from the 50 00:03:21,482 --> 00:03:26,900 left half of the point set. And we have a point q, which we'll call X2Y2, which 51 00:03:26,900 --> 00:03:30,914 comes from the right half of the point set. And furthermore, we're assuming that 52 00:03:30,914 --> 00:03:35,081 these points are close to each other. And we're gonna use that hypothesis over and 53 00:03:35,081 --> 00:03:38,841 over again. So the Euclidean distance between p and q is no more than this 54 00:03:38,841 --> 00:03:42,754 parameter delta. So, first, something very simple, which is that if you have two 55 00:03:42,754 --> 00:03:46,565 points that are close in Euclidean distance, then both of their coordinates 56 00:03:46,565 --> 00:03:50,681 have to be close to each other, right? If you have two points, and they differ by a 57 00:03:50,681 --> 00:03:54,796 lot in one coordinate, then the Euclidean distance is gonna be pretty big as well. 58 00:03:54,796 --> 00:03:59,182 So, specifically. By our hypothesis, that p and q have Euclidean distance less than 59 00:03:59,182 --> 00:04:03,369 delta, it must be that the difference between their coordinates in absolute 60 00:04:03,369 --> 00:04:08,002 value is no more than delta, and as well, the difference between their y-coordinates 61 00:04:08,002 --> 00:04:12,132 is at most delta. Okay, and this is easy to see if you'd just return to the 62 00:04:12,132 --> 00:04:16,263 definition of Euclidean distance that we reviewed at the beginning of the 63 00:04:16,263 --> 00:04:20,340 discussion of closest points. Okay? So if your distance is at most delta, then in 64 00:04:20,340 --> 00:04:24,860 each coordinate, you differ by at most delta as well. Now, what does A say? 65 00:04:24,860 --> 00:04:29,763 [sound]. So proof of A. So what does part A of the claim assert? It asserts that p 66 00:04:29,763 --> 00:04:34,728 and q are both members of SY, are both members of that vertical strip. So another 67 00:04:34,728 --> 00:04:38,657 way of saying that is that the X coordinates of p and q, that is, the 68 00:04:38,657 --> 00:04:43,500 numbers X1 and X2 both are within delta of Xbar. Remember, Xbar was in some sense 69 00:04:43,500 --> 00:04:48,507 the median X coordinate. So the X coordinate of the N over two'th leftmost 70 00:04:48,507 --> 00:04:52,856 point. So we're gonna do a proof by picture, so consider, forget about the Y 71 00:04:52,856 --> 00:04:56,690 coordinates, that's of irrelevant right now, and just focus on the X coordinates 72 00:04:56,690 --> 00:05:02,501 of all of these points. So on the one hand we have X bar. This is the X coordinate of 73 00:05:02,501 --> 00:05:07,477 the N over two'th point to the left. And then there are the X coordinates which define 74 00:05:07,477 --> 00:05:12,445 the left and the right borders of that vertical strip. Namely Xbar-delta and 75 00:05:12,445 --> 00:05:17,323 Xbar+delta. And then somewhere in here are X1 and Y1, the X coordinates of the points 76 00:05:17,323 --> 00:05:21,951 we care about, p and q. So a simple observation, so because p comes from the 77 00:05:21,951 --> 00:05:26,830 left half of the point set, and Xbar is the rightmost X coordinate of the left 78 00:05:26,830 --> 00:05:31,641 half of the point set, the X coordinate is at most Xbar. Right? So all points of Q 79 00:05:31,641 --> 00:05:35,647 have X coordinate, at most, Xbar, in particular, p does. Similarly, since Xbar 80 00:05:35,647 --> 00:05:40,256 is the rightmost edge of the left half of the point set, everything in the right half 81 00:05:40,256 --> 00:05:44,316 of the point set has X coordinate, at least Xbar. So in particular, little q 82 00:05:44,316 --> 00:05:49,059 does as well. So what does this mean. So this means x1, wherever it is, has to be 83 00:05:49,059 --> 00:05:53,741 at the left of x bar. X2 wherever it is has to be to the right of x bar. What 84 00:05:53,741 --> 00:05:58,600 we're trying to prove is that they're wedged in between x bar minus delta and x bar 85 00:05:58,600 --> 00:06:02,980 plus delta. And the reason why that's true is because their x coordinates 86 00:06:02,980 --> 00:06:07,463 also differ by at most delta. Okay, so what you should imagine is. You can imagine x1 87 00:06:07,463 --> 00:06:11,921 and x2 are sort of people tied by a rope at the waist. And this rope has 88 00:06:11,921 --> 00:06:16,158 length delta. So wherever x1 and x2 move, they're at most delta apart. 89 00:06:16,158 --> 00:06:20,374 Furthermore x1, we just observed, can't move any farther to the right than Xbar. 90 00:06:20,374 --> 00:06:25,025 So even if X1 moves as far to the right as it can, all the way to Xbar, X2, since 91 00:06:25,025 --> 00:06:29,385 it's at most delta away, tied by the waist, can't extend beyond X bar+ delta. By 92 00:06:29,385 --> 00:06:33,579 the same reasoning, X2 can't move any further to the left than Xbar, X1 being 93 00:06:33,579 --> 00:06:38,104 tied to the waist to X2, can never drift further to the left than Xbar minus delta. 94 00:06:38,104 --> 00:06:42,519 So that's the proof that X1 and X2 both lie within this region, and that defines 95 00:06:42,519 --> 00:06:47,140 the vertical strip. So that's part A. If you have any split pair whose distance between 96 00:06:47,140 --> 00:06:52,134 them is less than delta, they both have to wind up, in this vertical strip. And 97 00:06:52,134 --> 00:06:57,690 therefore wind up in the filtered set, the proof set, S sub Y. So that's part A of 98 00:06:57,690 --> 00:07:02,718 the claim. Let's now move to Part B. Recall what part B asserts. It says that 99 00:07:02,718 --> 00:07:07,170 the points p and q, this split pair that are distance only delta apart. Not only do 100 00:07:07,170 --> 00:07:11,677 they wind up in this sort of filtered set SY, but in fact, they are almost adjacent 101 00:07:11,677 --> 00:07:15,690 in SY, in the sense that the indices in the array differ by, at most, seven 102 00:07:15,690 --> 00:07:19,474 positions. And this is a part of the claim that is a little bit shocking. Really what 103 00:07:19,474 --> 00:07:23,081 this says is that we're getting away with more or less a variant of our one 104 00:07:23,081 --> 00:07:27,300 dimensional algorithm. Remember when we wanted to find the closest pair of points 105 00:07:27,300 --> 00:07:31,415 on the line, all we had to do was sort them by their single coordinate and then 106 00:07:31,415 --> 00:07:35,538 look at consecutive pairs and return the best of those consecutive pairs. Here what 107 00:07:35,538 --> 00:07:39,109 we're saying is really, once we do a suitable filtering focus on points in this 108 00:07:39,109 --> 00:07:43,499 vertical strip, then we just go through the points according to their Y 109 00:07:43,499 --> 00:07:47,103 coordinate. And okay, we don't just look at adjacent pairs. We look at pairs within 110 00:07:47,103 --> 00:07:51,990 seven positions, but still we basically do a linear sweep through the points in SY. 111 00:07:51,990 --> 00:07:56,434 According to their Y coordinate and that's sufficient to identify the closest split 112 00:07:56,434 --> 00:08:00,917 pair. So why on earth will this be true. So our workhorse in this argument will be 113 00:08:00,917 --> 00:08:05,059 a picture which I am going to draw on next. So I'm going to draw eight boxes, 114 00:08:05,059 --> 00:08:09,406 which have a height and width delta over two. So here, delta is the same parameter 115 00:08:09,406 --> 00:08:13,595 that gets passed to the closest split pair subroutine. And it's also the same 116 00:08:13,595 --> 00:08:17,518 delta which we're assuming p and q are closer to each other than, right? So 117 00:08:17,518 --> 00:08:21,335 that's, remember, that's one of our hypotheses in this claim. The distance 118 00:08:21,335 --> 00:08:25,364 between p and q is strictly less than delta. So we're gonna draw eight delta 119 00:08:25,364 --> 00:08:31,476 over two boxes. And they're gonna be centered at x bar. So, this same center of 120 00:08:31,476 --> 00:08:37,862 the vertical strip that defines S Y. And the bottom is going to be the smaller of the 121 00:08:37,862 --> 00:08:42,627 Y-coordinates of the points p and q. So it might be p, it might be q. It 122 00:08:42,627 --> 00:08:47,995 doesn't really matter. But the bottom is going to be the smaller of the two. So 123 00:08:47,995 --> 00:08:53,219 the picture then looks as follows. So the center of these collections of eight 124 00:08:53,219 --> 00:09:01,125 boxes, X bar, the bottom is the minimum of Y1, Y2. We're gonna have two rows and four 125 00:09:01,125 --> 00:09:07,158 columns. And needless to say, we're drawing this picture just for the sake of 126 00:09:07,158 --> 00:09:10,726 this correctness proof, right? This picture is just a thought experiment in 127 00:09:10,726 --> 00:09:14,402 our head. We're just trying to understand why the algorithm works. The algorithm, of 128 00:09:14,402 --> 00:09:18,102 course, does not draw these boxes. The subroutine, the, closest split pair 129 00:09:18,102 --> 00:09:22,007 subroutine is just that pseudo code we saw in the previous video. This is just to 130 00:09:22,007 --> 00:09:25,527 reason about the behavior of that subroutine. Now looking ahead, I'll make 131 00:09:25,527 --> 00:09:29,480 this precise in two lemmas that are about to come up, what's going to be true 132 00:09:29,480 --> 00:09:33,987 is the following. So, either p or q is on this bottom line, right? So we define the 133 00:09:33,987 --> 00:09:38,435 bottom to be the lower Y coordinate of the two. So maybe, for example, q is the one 134 00:09:38,435 --> 00:09:42,766 that has the smaller Y coordinate, in which case, is gonna be, somewhere, say, 135 00:09:42,766 --> 00:09:47,389 down here. P, you remember, is from the left half of the point sets. So p is maybe 136 00:09:47,389 --> 00:09:51,078 gonna be here or something. And we're gonna argue that both p and q have to be 137 00:09:51,078 --> 00:09:54,506 in these boxes. Moreover, we're gonna argue that these boxes are sparsely 138 00:09:54,506 --> 00:10:00,940 populated. Every one contains either zero or one point of the array S sub Y. So, 139 00:10:00,940 --> 00:10:04,738 what we're gonna see is that there's at most eight points in this picture, two of 140 00:10:04,738 --> 00:10:08,113 which are p and q, and therefore, if you look at these points sorted by Y 141 00:10:08,113 --> 00:10:11,817 coordinate, it has to be that they're within seven of each other, the difference 142 00:10:11,817 --> 00:10:17,460 of indices is no more than seven. So, we're gonna make those two statements 143 00:10:17,460 --> 00:10:23,835 precise one at a time by the following two lemmas. Let's start with lemma one. Lemma 144 00:10:23,835 --> 00:10:28,412 one is the easy one. And it states that all of the points of S sub Y, which show up 145 00:10:28,412 --> 00:10:33,102 in between the Y coordinates of the points we care about p and q have to appear in 146 00:10:33,102 --> 00:10:38,952 this picture, they have to lie in one of these eight boxes. So we're going to argue 147 00:10:38,952 --> 00:10:42,910 this in two steps. First, we're going to argue that all such points have to have Y 148 00:10:42,910 --> 00:10:47,014 coordinates within the relevant range of this picture between the minimum of Y1 and 149 00:10:47,014 --> 00:10:50,971 Y2 and delta more than that, and secondly that they have to have X coordinates in 150 00:10:50,971 --> 00:10:57,525 the range of this picture, namely between X bar minus delta and X bar plus delta. So 151 00:10:57,525 --> 00:11:01,247 let's start with Y coordinates. So again, remember this key hypothesis we have, 152 00:11:01,247 --> 00:11:05,163 okay. We're dealing with a split pair p-q that are close to each other. The distance 153 00:11:05,163 --> 00:11:08,934 between X and Y is strictly less than delta. So the very first thing we did at 154 00:11:08,934 --> 00:11:12,801 the beginning of this proof is we said well, if their Euclidean distance is less 155 00:11:12,801 --> 00:11:16,766 than delta then they have to differ by at most delta in both of their coordinates, 156 00:11:16,766 --> 00:11:23,294 in particular in their Y coordinate. Now remember whichever is lower of p and 157 00:11:23,294 --> 00:11:27,736 q, whichever one has a smaller y coordinate is precisely at the bottom of 158 00:11:27,736 --> 00:11:32,785 this diagram. For example, if q is the one with the smaller y coordinate, it might be 159 00:11:32,785 --> 00:11:37,732 on the black line right here. So that means in particular x has y coordinate no 160 00:11:37,732 --> 00:11:41,973 more than the top part of this diagram. No more than delta bigger than q. And of 161 00:11:41,973 --> 00:11:46,376 course all points with Y coordinates in between them are equally well wedged into 162 00:11:46,376 --> 00:11:50,886 this picture. So that's why all points of SY with a Y coordinate between those of p 163 00:11:50,886 --> 00:11:54,913 and q have to be in the range of this picture, between the minimum of the two Y 164 00:11:54,913 --> 00:11:59,315 coordinates and delta more than that. Now what about horizontally? What about the X 165 00:11:59,315 --> 00:12:05,438 coordinates? Well this just follows from the definition of S sub Y. So remember, 166 00:12:05,438 --> 00:12:08,775 S sub Y are the points that fall into this vertical strip. How did we define the 167 00:12:08,775 --> 00:12:12,486 vertical strip? Well it had center Xbar, and then we fattened it by delta on both 168 00:12:12,486 --> 00:12:17,402 sides. So just by definition, if you're an SY, you've gotta have X coordinates in the 169 00:12:17,402 --> 00:12:23,882 range of this picture. X delta plus minus, sorry, xbar plus minus delta. So that 170 00:12:23,882 --> 00:12:27,607 completes the proof of the lemma. So this is not. This is just a lemma. So I'll use a 171 00:12:27,607 --> 00:12:31,476 lower case qed. Remember this is just a step toward proving the overall 172 00:12:31,476 --> 00:12:35,106 correctness claim. But this is a good step. And again, the way you think about 173 00:12:35,106 --> 00:12:38,880 this is it says we draw this boxes. We know that either p or q is at the bottom. 174 00:12:38,880 --> 00:12:43,144 The other one is going to be on the other side of the black line x bar but will be 175 00:12:43,144 --> 00:12:47,356 in some other box so perhaps maybe p is here and the lemma is saying that all the 176 00:12:47,356 --> 00:12:51,415 relevant points of SY have to be somewhere in this picture. Now remember in our 177 00:12:51,415 --> 00:12:55,319 double for loop we only search seven positions away, so the concern is that 178 00:12:55,319 --> 00:12:59,378 this is a sorta super highly populated collection of eight boxes. That's the 179 00:12:59,378 --> 00:13:03,539 concern, but that's not going to be the case and that's exactly what lemma two is 180 00:13:03,539 --> 00:13:07,694 going to say. Not only do the points between p and q in Y coordinates show up 181 00:13:07,694 --> 00:13:11,965 in this diagram, but there have to be very few. In particular, every box has to be 182 00:13:11,965 --> 00:13:18,810 sparse, with population either zero or one. So, let's move on to lemma two. So formally 183 00:13:18,810 --> 00:13:24,776 the claim is [sound], we have at most one point of the point set in each of these 184 00:13:24,776 --> 00:13:30,572 eight boxes. And this is finally where we use, in a real way, the definition of 185 00:13:30,572 --> 00:13:35,697 delta. This is where we finally get the payoff from our realization long ago, that 186 00:13:35,697 --> 00:13:40,379 when defining the closest split pair subroutine, we only really need to be 187 00:13:40,379 --> 00:13:45,504 correct in the unlucky case. In the case we're not handed the right answer by one 188 00:13:45,504 --> 00:13:50,629 of our recursive calls. We're finally gonna use that fact in a fundamental way. 189 00:13:50,629 --> 00:13:56,756 So we're gonna proceed by contradiction. So we're going to think about what happens 190 00:13:56,756 --> 00:14:01,159 if there are two points in a single box and from that we'll be able to derive a 191 00:14:01,159 --> 00:14:08,202 contradiction. So, call the points that wind up in the same box A and B. So, to 192 00:14:08,202 --> 00:14:16,168 the contrary, suppose A and B lie in the same box. So, maybe this is A here, and 193 00:14:16,168 --> 00:14:23,627 this is B here, at antipodal corners of this particular box. So from this 194 00:14:23,627 --> 00:14:31,125 supposition, we have two consequences. First of all. I claim that A and B lie on 195 00:14:31,125 --> 00:14:36,057 the same side of the point set. They're either both in the left side, Q or they're 196 00:14:36,057 --> 00:14:41,595 both in the right side, R. So why is this true? Well it's because every box lies either 197 00:14:41,595 --> 00:14:46,116 entirely on the left half of the point set or on the right half of the point 198 00:14:46,116 --> 00:14:51,375 set. Recall how we define x bar. X bar is the x coordinate of the right most point 199 00:14:51,375 --> 00:14:56,693 amongst the left half of the point set capital Q. So therefore points with x 200 00:14:56,693 --> 00:15:01,354 coordinate at most x bar have to lie inside the left half Q. Points with x 201 00:15:01,354 --> 00:15:05,574 coordinates at least x bar have to lie inside the right half of the point 202 00:15:05,574 --> 00:15:09,513 set capital R. So that would be like in this example. A and b both lie in a box 203 00:15:09,513 --> 00:15:13,238 which is to the right of x bar. So they both have to come in the right half 204 00:15:13,238 --> 00:15:17,490 of the point set capital R. This is one place we are using that there are no ties 205 00:15:17,490 --> 00:15:21,375 in X coordinate, so if there's a point with X, X coordinate or X bar, we can 206 00:15:21,375 --> 00:15:25,491 count it as part of the left half. So every box, by virtue of being either to 207 00:15:25,491 --> 00:15:30,155 the left of xbar or to the right of xbar, can only contain points from a common half 208 00:15:30,155 --> 00:15:34,064 of the point set. So that's the first consequence of assuming that you have two 209 00:15:34,064 --> 00:15:37,468 points in the same box. The second consequence is, because the boxes are 210 00:15:37,468 --> 00:15:43,904 small, the points gotta be close. So, if A and B co-habitate a box, how far could 211 00:15:43,904 --> 00:15:47,651 they be from each other? Well, the farthest they could be is like I've drawn 212 00:15:47,651 --> 00:15:50,667 in the picture, with the points A and B. Where they're at opposite corners of a 213 00:15:50,667 --> 00:15:54,164 common box. And then you bust out Pythagorean's Theorem, and what do you 214 00:15:54,164 --> 00:15:58,661 get? You get that the distance between them is delta over two, the side of the 215 00:15:58,661 --> 00:16:03,790 box times root two. And what's relevant for us is this is strictly less than 216 00:16:03,790 --> 00:16:10,300 delta. Okay? But, now, here is where we use, finally, the definition of delta. 217 00:16:10,300 --> 00:16:14,983 Consequences one and two in tandem, contradict how we define delta. Remember 218 00:16:14,983 --> 00:16:19,953 what delta is. It's as close as two pair of, a pair of points can get if they both 219 00:16:19,953 --> 00:16:24,044 lie on the left side of the point set, or if they both lie on the right side of the 220 00:16:24,044 --> 00:16:27,313 point set. That is how we defined it. As small as a pair of points on a common half 221 00:16:27,313 --> 00:16:31,553 can get to each other. But what have we just done? We've exhibited a pair A and B 222 00:16:31,553 --> 00:16:35,687 that lie on the same half of the point set, and are strictly closer than delta. 223 00:16:35,687 --> 00:16:40,335 So that contradicts the definition of delta. So that completes the proof of lemma 224 00:16:40,335 --> 00:16:44,196 two. Let me just make sure we're all clear on why having proved lemma one and lemma two 225 00:16:44,196 --> 00:16:47,875 we're done with the proof part B of the claim and therefore the entire claim 226 00:16:47,875 --> 00:16:51,556 because we already proved part one, now a long time ago. 227 00:16:51,556 --> 00:16:56,558 So let's interpret the 2 lemmas in the context of our picture that we had all 228 00:16:56,558 --> 00:17:00,227 throughout. In terms of the eight boxes of side length delta over two by 229 00:17:00,227 --> 00:17:05,680 delta over two. So again, whichever is the lower of p and q, and again let's just for 230 00:17:05,680 --> 00:17:10,340 the sake of concreteness say it's q, is at the bottom of the picture. The other point 231 00:17:10,340 --> 00:17:15,053 is on the other half of the line Xbar, and is in one of the other boxes. So, for 232 00:17:15,053 --> 00:17:20,323 example, maybe p is right here. So lemma one says that every relevant point, every 233 00:17:20,323 --> 00:17:25,188 point that survives the filtering and makes it into SY, by virtue of being in 234 00:17:25,188 --> 00:17:29,856 the vertical strip, has to be in one of those boxes, okay? If it has Y coordinate 235 00:17:29,856 --> 00:17:36,102 in between p and q. Lemma two says that you can only have one point in each of these boxes 236 00:17:36,102 --> 00:17:40,821 from the point set, so that's gonna be at most eight total. [sound] So combining 237 00:17:40,821 --> 00:17:48,436 them. Lemmas one and two imply, that there are almost eight points in this picture 238 00:17:48,436 --> 00:17:55,638 and that includes p and q because they also occupy two of eight boxes. So in the 239 00:17:55,638 --> 00:18:00,390 worst case, if this is as densely populated as could possibly be, given 240 00:18:00,390 --> 00:18:05,576 lemmas one and two, every other box might have a point and perhaps every one of 241 00:18:05,576 --> 00:18:10,042 those points has a Y coordinate between p and q. But this is as bad as it gets. 242 00:18:10,042 --> 00:18:14,655 Any point of the strip with Y coordinate between p and q occupies a box. 243 00:18:14,655 --> 00:18:19,087 So, at most, there are these six wedged in between them. What does this mean? This 244 00:18:19,087 --> 00:18:22,703 means if from q you look seven positions ahead in the array, you are 245 00:18:22,703 --> 00:18:27,903 guaranteed to find this point p. So a split pair with distance less than delta 246 00:18:27,903 --> 00:18:33,242 is guaranteed to be identified by our double for loop. Looking seven positions 247 00:18:33,242 --> 00:18:39,288 ahead in the sorted array SY is sufficient to identify, to look at every conceivably 248 00:18:39,288 --> 00:18:45,442 interesting split pair. So that completes the assertion B of the correctness 249 00:18:45,442 --> 00:18:49,461 claim and we're done. That establishes that this supremely clever 250 00:18:49,461 --> 00:18:54,672 divide and conquer algorithm is indeed a correct O(nlog(n)) algorithm that computes the 251 00:18:54,672 --> 00:18:58,000 closest pair of a set of n points in the plane.