This video will prove the correctness of our greedy algorithm for clustering. We'll show that it maximizes the spacing over all possible K-clusterings. You might have hoped that we could deduce the correctness of this greedy clustering algorithm immediately from our correctness proofs for the various greedy minimum spanning tree algorithms. Unfortunately, that doesn't seem to be the case. In the minimum-cost spanning tree problem, we're focused on minimizing the sum of the edge costs; here we're looking at a different objective, maximizing the spacing. So we do need to do a proof from scratch. That said, the arguments we'll use should look familiar, not just from the exchange-type arguments we used to prove the cut property, but also, going back even further, from our greedy algorithms for scheduling.

So let's set up the notation for the proof. As usual, we're going to look at the output of our algorithm, which achieves some objective function value, some spacing. We're going to look at an arbitrary competitor, some other proposed clustering, and we're going to show that we're at least as good, that our spacing is at least as large. Specifically, we'll denote the clusters in the output of our algorithm by C1 up to CK. Our clustering has some spacing, some distance between the closest pair of separated points; call it capital S. We'll denote our competitor, some alternative K-clustering, by C-hat 1 up to C-hat K.

What is it that we're trying to show? We want to show that this arbitrary other clustering has spacing no larger than ours. If we can show that, then because this clustering was arbitrary, the greedy clustering has spacing as large as any other, so it maximizes the spacing; that's what we want to prove. Put differently, we want to exhibit a pair of points separated by this clustering C-hat 1 up to C-hat K such that the distance between those separated points is S or smaller.

Let me quickly dispose of a trivial case. If the C-hats are the same as the Cs, possibly up to a renaming, then exactly the same pairs of points are separated in each of the two clusterings, so the spacing is exactly the same. That's not a case we have to worry about. The interesting case, then, is when the C-hats differ fundamentally from the Cs, when they're not merely a permutation of the clusters in the greedy clustering.

The maneuver we're going to do here is similar in spirit to what we did in our scheduling correctness proof. Way back in that proof, we argued that any schedule that differs from the greedy one suffers, in some sense, from a local flaw: we identified an adjacent pair of jobs that was, in some sense, out of order with respect to the greedy ordering. The analog here is that, for any clustering which is not merely a permutation of the greedy clustering, there has to be a pair of points which is classified differently in the C-hats relative to the Cs. By differently, I mean they're clustered together in the greedy clustering: these points, p and q, belong to the same cluster, C-sub-i. Yet in this alternative clustering, they're placed in different clusters: maybe p in C-hat i, and q in some other C-hat j. So I want to now split the proof into an easy case and a tricky case.
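Before diving into the two cases, it may help to have the algorithm concretely in front of us. Here's a minimal Python sketch of the greedy clustering procedure; the interface (the names `points`, `dist`, and `K`) is an illustrative assumption of mine, not something fixed in the lecture, and this brute-force version favors clarity over efficiency.

```python
import math  # only needed for the usage example below

def greedy_cluster(points, dist, K):
    """Greedily merge clusters until only K remain (a sketch)."""
    # Initially, every point is its own cluster.
    clusters = [{p} for p in points]
    while len(clusters) > K:
        # Find the closest separated pair: the pair of points lying
        # in two different clusters whose distance is smallest.
        best = None  # (distance, cluster index i, cluster index j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                for p in clusters[i]:
                    for q in clusters[j]:
                        d = dist(p, q)
                        if best is None or d < best[0]:
                            best = (d, i, j)
        # Merge the two clusters containing the closest separated pair.
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so deleting j leaves index i valid
    return clusters

# Usage example: group five points in the plane into three clusters.
pts = [(0, 0), (0, 1), (5, 5), (5, 6), (9, 0)]
print(greedy_cluster(pts, math.dist, 3))
```

One aside on the design: this is exactly Kruskal's algorithm stopped early, a connection the lecture returns to below, so a real implementation would typically sort the point pairs once and use a union-find structure rather than rescanning all pairs in every iteration.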
To explain why the easy case is easy, let's observe a property that this greedy clustering algorithm has. The algorithm's philosophy is that the squeaky wheel should get the grease. That is, the separated pair of points that are closest to each other are the ones whose clusters should get merged. Because it's always the closest separated pair that gets merged, if you look at the sequence of point pairs that get merged together, the pairs that determine the spacing at each successive iteration, the distances between these worst separated points only go up over time. At the beginning of the algorithm, the closest pair of points in the entire point set are the ones that get directly merged. Then those are out of the picture, and now some further-away pair of points is separated, determines the spacing, and gets merged. Once they've been coalesced, there is still some further-away pair of points which is now the closest separated pair; they get merged, and so on. So if you look at the sequence of distances between the pairs of points that are directly merged by the greedy algorithm, it only goes up over time, and it culminates with the final spacing S of the greedy clustering. In a sense, the spacing of the output of the greedy algorithm is the distance between the point pair that would get merged if we ran the greedy algorithm for one more iteration, except that we're not allowed to do that. So the point is, every pair of points directly merged by the greedy algorithm is at distance at most S from each other.

The easy case, then, is when this pair of points p, q, which on the one hand lie in a common greedy cluster and on the other hand lie in different clusters of the C-hats, were at some point not merely in the same cluster but actually directly merged by the greedy algorithm. That is, at some iteration they determined the spacing and were picked by the greedy algorithm to have their clusters merged. Then we just argued that the distance between p and q is no more than the spacing capital S of the greedy clustering. And since p and q lie in different clusters of the C-hats, they are separated by the C-hats, and therefore their distance upper-bounds the spacing of the C-hats. Maybe some even closer pair is separated by the C-hats, but at the very least p and q are separated, so their distance upper-bounds the spacing of the C-hat clustering. And that's exactly what we wanted to prove: this alternative clustering doesn't have better spacing than ours; its spacing has to be at most capital S. So in this easy case, when p and q are directly merged by the greedy algorithm, we're done.
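As a compact summary of this property and of the easy case: writing $d_t$ for the distance between the pair of points directly merged at iteration $t$, and $n$ for the number of points (notation I'm introducing here, not from the lecture), the greedy rule guarantees

$$ d_1 \le d_2 \le \cdots \le d_{n-K} \le S, $$

so if $p$ and $q$ were directly merged at some iteration $t$, then

$$ \operatorname{spacing}(\hat{C}_1, \ldots, \hat{C}_K) \le d(p, q) = d_t \le S. $$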
So the tricky case is when p and q are only indirectly merged, and you may be wondering what that means. How did two points wind up in the same cluster if they weren't, at some point, directly merged? Let's draw a picture and see how that can happen. The issue is that two points p and q might wind up in a common greedy cluster not because the greedy algorithm ever explicitly considered that point pair, but rather because of a path, a cascade, of direct mergers of other point pairs. Imagine, for example, that at some iteration of the greedy algorithm the point p was considered explicitly along with a point a1, where a1 is different from q. That's a direct merger, and p and a1 wind up in the same cluster; their clusters are merged. Maybe the same thing happened to the point q with some point a-sub-l, which is different from p. Sooner or later, at some other time, some totally unrelated pair of points a2 and a3 are directly merged, and then at some point a1 and a2 are considered by the greedy algorithm, because they're the closest pair of separated points, and they get merged, and so on. The edges in this picture are meant to indicate direct mergers, pairs of points that are explicitly fused because they determine the spacing at some iteration of the greedy algorithm. But at the end of the day, the greedy clustering contains the results of all of these mergers.

In case you're feeling confused, let me point out that we saw this exact same thing going on when we were talking about minimum spanning trees and Kruskal's algorithm. At an intermediate point in Kruskal's algorithm, after it's added some edges but before it's constructed a spanning tree, the intermediate state, as we discussed, is a bunch of different connected components. Vertices that have a chosen edge between them are, of course, in the same connected component. But a connected component can also contain long paths, so you can have vertices in the same connected component at an intermediate state of Kruskal's algorithm despite the fact that we haven't chosen an edge directly between them; there's rather a path of chosen edges between them. It's exactly the same thing going on here.

Now, what we have going for us is that, if a pair of points was directly merged, we know, as discussed, that they're close: the distance between them is at most the spacing, capital S. Frankly, we don't know anything about the distance between pairs of points that were not directly merged; they just sort of accidentally wound up in a common cluster. But this turns out to be good enough. It's actually sufficient to argue that this competitor clustering, the C-hats, has spacing no more than S, no better than ours. Let's see why.

Given that p and q are in a common greedy cluster, there must have been a path of direct mergers that forced them into the same cluster. Let's denote the intermediate points on that path by a1 through a-sub-l. Here's the part of the proof where we basically reduce the tricky case to the easy case. We've got this pair of points p, q. Remember, not only are they in a common greedy cluster, but they're in different clusters of our competitor, the C-hats. The point p is in some cluster, call it C-hat i, and q is somewhere else; in particular, it's not in C-hat i. Now, imagine you go on a hike: you start at the point p and you hike along this path, traversing the direct mergers toward q. You're starting inside C-hat i and you end up outside, so at some point on your hike you will cross the boundary: you will, for the first time, escape from C-hat i and wind up in some other cluster. That has to happen. Let's call a-sub-j and a-sub-j+1 the consecutive pair of points at which you go from inside this cluster to outside it. And now we're back in the easy case: we're dealing with a separated pair that was directly merged by the greedy algorithm. Remember, we set this up as a path of direct mergers, so in particular a-sub-j and a-sub-j+1 were directly merged, and therefore their distance is at most S, the spacing of the greedy clustering. And yet a-sub-j and a-sub-j+1 are separated by the C-hats, so their distance is also an upper bound on the spacing of the C-hats.
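The tricky case can likewise be compressed into one chain of implications, using the endpoint conventions $a_0 = p$ and $a_{l+1} = q$ (again, notation I'm adding for compactness): since

$$ a_0 = p \in \hat{C}_i \quad \text{and} \quad a_{l+1} = q \notin \hat{C}_i, $$

there must be some consecutive index $j$ with $a_j \in \hat{C}_i$ and $a_{j+1} \notin \hat{C}_i$; and since $a_j$ and $a_{j+1}$ were directly merged by the greedy algorithm,

$$ \operatorname{spacing}(\hat{C}_1, \ldots, \hat{C}_K) \le d(a_j, a_{j+1}) \le S. $$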
This means the spacing S of our greedy clustering is at least as large as the spacing of the competitor. And since the competitor was arbitrary, the greedy clustering maximizes the spacing over all K-clusterings. That completes the proof.