Okay. So in this video we're going to begin our discussion of why Prim's algorithm is correct, that is, why, for every connected graph, it always outputs a minimum spanning tree of that graph. For this video, we're going to content ourselves with a much more modest goal. We're only going to prove, for now, that Prim's algorithm outputs a spanning tree. We're not going to make any claims yet about optimality. Even just this fact is not trivial, and proving it will give us a good opportunity to get our hands dirty with some basic properties of graphs, and specifically graph cuts. Graduates of part 1 of this online class are, of course, already familiar with graph cuts. We studied them at length via Karger's randomized algorithm for computing the minimum cut of a graph. The concept is the same here, but let me state it again to jog your memory. A cut of a graph is simply a partition of its vertex set into two groups, and each of those two groups should be non-empty. So pictorially, we envision some of the vertices of G, this blob A, being in one group, and the rest of the vertices, this blob B, being in a different group. Now, what's up with the edges? How can they be distributed in this picture? Well, for the two endpoints of an edge, there are three cases. Both of the endpoints can be in the set A, so there are various edges internal to A. Similarly, an edge might have both of its endpoints inside of B. But we're going to be most interested in the third case, edges that have exactly one endpoint in each of A and B. These are the edges that we say cross the cut (A, B). So hopefully the definition of a cut seems simple enough, but cuts, and in particular their relationship to edges, can be quite interesting and quite useful. As shown here in the picture, for a given cut, there can of course be many edges crossing it. By the same token, for a given edge of a graph, there will in general be many cuts of the graph that that edge crosses.
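To make the definition of a crossing edge concrete, here is a minimal sketch in Python. The function name `crossing_edges` and the example graph are assumptions made for illustration; they are not from the lecture. An edge crosses the cut (A, B) exactly when one endpoint is in A and the other is not.

```python
def crossing_edges(edges, A):
    """Return the edges with exactly one endpoint in A (the other lies in B = V - A)."""
    A = set(A)
    # (u in A) != (v in A) is true precisely when the endpoints are on opposite sides.
    return [(u, v) for (u, v) in edges if (u in A) != (v in A)]


# Hypothetical example graph: the 4-cycle 1-2-3-4-1 plus the chord 1-3.
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(crossing_edges(edges, {1, 2}))  # prints [(2, 3), (4, 1), (1, 3)]
```

Here the cut is A = {1, 2}, B = {3, 4}: the edge (1, 2) is internal to A, the edge (3, 4) is internal to B, and the other three edges cross the cut.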
So, to understand this a little bit better, let's just review a simple property of cuts in graphs. Let me just ask you how many there are. Specifically, for a graph that has n vertices, roughly how many cuts does it have? Roughly n, roughly n squared, roughly 2^n, or roughly n^n? Now, none of these four answers is exactly right, but one of the four is a lot closer to the exact expression than the other three, and I'm asking you which of them it is. Alright. So the correct answer is the third one, 2^n. A graph with n vertices has essentially 2^n cuts, so there's an exponential number of cuts, there's a lot of them. So why is this true? Well, in effect you can imagine making a binary decision for each of the n vertices: either it goes into A or it goes into B. So n binary decisions result in 2^n different outcomes. Now why is this slightly incorrect? Well, a cut has to have two non-empty sets. A is not allowed to be empty, B is not allowed to be empty, so that rules out two of the possibilities. So, strictly speaking, there are 2^n - 2 different cuts of a graph. What we're going to do next is state and prove three easy facts about cuts in graphs. Once we have these three easy facts, we will be able to prove the claim from the beginning of this video, namely that Prim's algorithm always outputs a spanning tree. The first of these three facts I'm going to call the empty cut lemma. The point of the empty cut lemma is to give us a characterization, that is, a new way of saying when a graph is connected. In particular, I'm going to phrase it in terms of a graph being not connected. The claim is that a graph is not connected if and only if we can find a cut of the graph that has no edges crossing it. So remember how we defined a graph being connected: it means that for any two vertices in the graph, we can find a path in the graph from one vertex to the other.
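Going back to the counting question for a moment, the 2^n - 2 figure can be checked by brute force for small n. The sketch below is an illustration, with `count_cuts` being a hypothetical helper name. It counts ordered cuts, so (A, B) and (B, A) are counted separately; if you treat those as the same cut, the count is halved to 2^(n-1) - 1.

```python
from itertools import product


def count_cuts(n):
    """Count assignments of n labeled vertices to groups A and B with both groups non-empty."""
    count = 0
    # One binary decision per vertex: it goes into A or into B.
    for assignment in product("AB", repeat=n):
        if "A" in assignment and "B" in assignment:
            count += 1
    return count


for n in range(2, 7):
    # The two numbers on each line should agree.
    print(n, count_cuts(n), 2**n - 2)
```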
So what we're saying is that being not connected, that is, there existing a pair of vertices with no path between them, is equivalent to there being a cut with no crossing edges. So let's go ahead and prove this real quick. As an if and only if statement, this proof really has to be done in two parts. First, we have to prove that assuming the first statement, we can derive the second. Then we have to show that assuming the second statement, we can derive the first. I think the easier direction is to assume the right-hand side and then derive the left-hand side, so let's start with that one. That is, consider a graph G such that there's a cut (A, B) with no edges of G crossing it. The plan is to exhibit a pair of vertices that do not have a path between them, thereby certifying that the graph is not connected. It's pretty easy to figure out which pair of vertices we should look at: just take one vertex from each side of the cut, say u in A and v in B. So why is it that there's no path from u to v in the graph G? Well, a path from u to v would surely have to cross the cut (A, B), but there are no edges available for crossing the cut. So therefore, this path from u to v cannot exist. That completes the first part of the proof: we assumed the right-hand side and derived the left-hand side. Now we start all over again, but we assume the left-hand side and have to prove the right-hand side. By the assumption that the graph is not connected, there has to exist a pair of vertices u and v that have no path between them. We are now responsible for exhibiting some cut (A, B) such that no edges of the graph G cross it. So where are we going to get these sets capital A and capital B from? Well, here is the trick, which is going to make the proof go really nicely. We define capital A to be the set of vertices reachable from u in the graph G.
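The set A just defined is exactly what a graph search from u computes. Here is a minimal sketch using breadth-first search; the adjacency-list representation, the helper name `reachable_from`, and the example graph are assumptions made for illustration. Taking B to be everything outside A, the sketch also confirms on this example that no edge crosses (A, B).

```python
from collections import deque


def reachable_from(adj, u):
    """Return the set of vertices reachable from u, found by breadth-first search."""
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


# Hypothetical disconnected graph: a triangle on {1, 2, 3} and a separate edge 4-5.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: [5], 5: [4]}
A = reachable_from(adj, 1)   # the vertices reachable from u = 1
B = set(adj) - A             # everything else
crossing = [(x, y) for x in A for y in adj[x] if y in B]
print(sorted(A), sorted(B), crossing)  # prints [1, 2, 3] [4, 5] []
```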
Another way to think about this is that capital A is simply u's connected component, in the sense that we discussed in part 1 of the course. Now, because we want a cut, and a cut is a partition, we had better put into the other group, capital B, all of the vertices that are not in A. If you like, this is all of the connected components other than the one that contains u. Note that by definition, u is in capital A; certainly u is reachable from itself. And by assumption, v is not reachable from u, so v is going to be in capital B. So neither of these sets is empty, and this is indeed a bona fide cut of the graph G. All that remains is to notice that there are no edges crossing this cut. And why is that true? Well, suppose there were an edge crossing the cut (A, B), with one endpoint in A and one endpoint in B. By definition, there are paths from u to everything in A, so an edge sticking out of A would give us a path from u to some vertex in B. But by definition, B is the set of vertices not reachable from u, so that's a contradiction. So again, the point is that if there were edges crossing this cut, then we could expand A and make it even bigger. Therefore, there aren't any edges crossing the cut. The cut is empty, and that's what we needed to prove: assuming the graph was disconnected, we have exhibited a cut (A, B) with no crossing edges. So that wraps up the first of our three facts, and in fact the most difficult of our three facts about cuts in graphs. And again, what did the empty cut lemma say? It gives us a new way of talking about whether or not a graph is connected. It's disconnected if and only if there's an empty cut. It's connected if and only if there are no empty cuts. So that's the key point from this slide. Let's now knock off the other two facts we're going to need. The next one I'm going to call the double crossing lemma.
In essence, what the double crossing lemma says is that if a cycle in a graph crosses a cut, then it has to cross it at least twice; it cannot cross it only once. So pictorially, we look at a cut of a graph, so there are the two vertex groups A and B. By hypothesis, there's some edge e with one endpoint on each side, and by assumption, this edge e participates in some cycle that we're calling capital C. And if you look at the picture, you realize that the claim in this lemma is obvious: because the cycle has to loop back on itself, if it has an edge with one endpoint on either side, there has to be a path connecting those two endpoints back to each other, and that path has to cross back over the cut (A, B). Indeed, the double crossing lemma is a special case of a stronger statement which is equally easy to see, which is that if you take any cut of a graph and you take any cycle, which, you know, starts and ends at the same point, then it has to cross this cut an even number of times. It might cross it 0 times, but it's not going to cross it once. It could cross it twice. It could cross it four times, if it crisscrosses back and forth. It could cross it six times, and so on. But if it crosses it strictly more than 0 times, then it has to cross it at least twice. That's the point of the double crossing lemma. So, we'll use this in its own right later on. But for the moment, I'm also interested in an easy corollary of the double crossing lemma. I will call this the lonely cut corollary. Let me tell you the point of the lonely cut corollary. In general, in these spanning tree algorithms, to ensure that we output a spanning tree, we have to, in particular, make sure we don't create any cycles. The point of this corollary is that it's a tool for arguing that we don't create cycles. So how can we be sure that an edge doesn't create cycles? Well, here is a way.
Suppose we're looking at an edge e, and suppose we can identify a cut (A, B) such that e is the only edge crossing it; it's the lonely edge crossing this cut. Well then, by the double crossing lemma, there is no way this edge is in any cycle. If it were in a cycle and crossed the cut, that cycle would have to cross the cut again, and the edge wouldn't be lonely, it would have company. So if you're lonely on a cut, it means you cannot be in a cycle. So now we've got all of our ducks lined up in a row, and we're ready to prove the first part of the correctness of Prim. That is, we're ready to argue that Prim's algorithm, given a connected graph, outputs a spanning tree. Again, for the moment, we're making no claims about optimality; that will be in the next video. So we're going to make this argument in three steps. For the first step, you might want to go look again at the pseudocode of Prim's algorithm, just to remember what the notation was. In the first step, we're just going to notice that the semantics of the algorithm are respected. The algorithm maintains two different sets throughout its evolution. On the one hand, it maintains a set capital X, intended to be the vertices spanned so far. On the other hand, it maintains a set of edges, capital T, the edges that have been picked so far. And the intent was that the current edge set capital T always spans the current vertex set capital X. So the first thing is just to verify that that is in fact true. This I'm not going to prove formally. In my experience, students find this kind of obvious, and the intuition is correct. If you want a rigorous proof, go ahead and fill in the details yourself. It's a straightforward induction with no nasty surprises. Now, we're trying to argue that the output of this algorithm is a spanning tree. So let's recall what that means. What is it that we have to check? There are two properties. First of all, there can't be any cycles, there can't be any loops.
Second of all, it has to span all of the vertices. There has to be a path inside the tree edges from any vertex to any other vertex. So let's go ahead and prove both of those things, in reverse order. The second step of the proof is going to be to argue that the algorithm outputs something which does span all of the vertices. So at the end of the day, we'll have a path from any vertex to any other vertex using only the edges in our chosen set, capital T. Now, by part one of this proof, all we need to prove is that the algorithm halts with capital X equal to capital V; then we know that capital T spans everything in V. So how could that not happen? How could Prim's algorithm somehow halt with the spanned vertices, capital X, not being all of capital V? Well, go back and check out the pseudocode and look at the main while loop. In every iteration of the while loop, we add one new vertex to capital X. What could go wrong? The only thing that could go wrong would be if, in some iteration, before we're spanning everything, we scan the frontier around capital X and there aren't any edges. That's the only way we can fail to increase the vertices in capital X in a given iteration. But what would that mean? What would it mean if in some iteration we couldn't find any edges with one endpoint in capital X and the other endpoint in V - X? Well then we would have exhibited an empty cut. The cut (X, V - X) would have no crossing edges. And now we can use the empty cut lemma, which says that if there's an empty cut, then the graph is disconnected. But by assumption, we're working with a connected input graph, so that can't happen. Okay? So the algorithm never gets stuck. We always increase capital X by one vertex, because the original graph was connected, and that means it halts with something spanning all of the vertices. For the final step, we need to argue that Prim's algorithm never creates any cycles in the set of edges that it's choosing, capital T. So, why are there no cycles?
Well, what we're going to do is talk about each edge in turn that Prim's algorithm adds, and argue that whenever a new edge gets added, there's no way that edge creates any cycles in the set capital T. To see why, take a snapshot of the algorithm at some given iteration. There's some current set capital T, and there's some set of vertices capital X that the edges in T span. V - X is the set of vertices not yet spanned by T, and of course we can think of (X, V - X) as a cut of the graph. At this moment in time, at this snapshot, the edges of capital T are all of one type. They all have both of their endpoints inside capital X; none of them have any endpoints inside V - X. So in particular, none of the edges chosen thus far cross the cut (X, V - X). That's by construction: they only span the vertices of X. Now what type of edge is going to get added in this iteration? Well, Prim's algorithm searches only over edges that have one endpoint inside X and one endpoint outside. That is, it searches only over edges that cross the cut (X, V - X). So the edge that gets added in this iteration is going to be a trailblazer for this cut. None of the edges chosen so far crosses the cut, but the edge chosen in this iteration definitely crosses it. So the moment edge e gets added to the tree capital T, it is going to be lonely across the cut (X, V - X). So by the lonely cut corollary, as the sole member of capital T crossing this cut, it cannot possibly participate in any cycles. Remember, if it participated in a cycle in capital T, that cycle would have to cross this cut somewhere else. But there aren't any other edges crossing this cut, this is the only one. So that's why, when we add a new edge, there's no way it can create any cycles. It's the sole member crossing this particular cut.
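The two facts used in this argument can be turned into run-time checks. Below is a minimal sketch of a straightforward version of Prim's algorithm; it is not the course's pseudocode, and the function name `prim`, the edge format `(u, v, cost)`, and the example graph are assumptions made for illustration. It asserts, in every iteration, that the frontier of the cut (X, V - X) is non-empty (the empty cut lemma) and that no edge already in T crosses that cut, so the newly added edge is lonely.

```python
def prim(vertices, weighted_edges, start):
    """Straightforward Prim: return the list T of chosen edges as (u, v, cost) tuples."""
    V = set(vertices)
    X = {start}   # vertices spanned so far
    T = []        # edges chosen so far
    while X != V:
        # Edges crossing the cut (X, V - X): exactly one endpoint in X.
        frontier = [(c, u, v) for (u, v, c) in weighted_edges
                    if (u in X) != (v in X)]
        # Empty cut lemma: in a connected graph the frontier is never empty here.
        assert frontier, "input graph is not connected"
        # Every edge chosen so far lies inside X, so none of them crosses the cut.
        assert all(u in X and v in X for (u, v, _) in T)
        c, u, v = min(frontier)        # cheapest crossing edge
        T.append((u, v, c))            # the new edge is lonely across (X, V - X)
        X.add(v if u in X else u)      # add whichever endpoint was outside X
    return T


# Hypothetical example: the 4-cycle 1-2-3-4-1 plus the chord 1-3, with made-up costs.
vertices = [1, 2, 3, 4]
weighted_edges = [(1, 2, 1), (2, 3, 2), (3, 4, 3), (4, 1, 4), (1, 3, 5)]
T = prim(vertices, weighted_edges, start=1)
print(T)  # prints [(1, 2, 1), (2, 3, 2), (3, 4, 3)]
```

The output has n - 1 = 3 edges, as a spanning tree on 4 vertices should. This version rescans every edge in each iteration, which keeps the invariants easy to see but is not the fastest way to implement the algorithm.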