Okay. So in this video we're going to 
begin our discussion about why Prim's 
algorithm is correct. 
That is, why, for every connected graph,
it always outputs a minimum spanning tree
of that graph.
For this video, we're going to content
ourselves with a much more modest goal.
We're only going to prove for now that
Prim's algorithm outputs a spanning tree.
We're not going to make any claims yet 
about optimality. 
Even just this fact is not trivial and 
proving it will give us a good 
opportunity to get our hands dirty with 
some basic properties of graphs and 
specifically graph cuts. 
Graduates of part 1 of this online class 
of course are already familiar with graph 
cuts. We studied them at length via 
Karger's randomized algorithm for 
computing the minimum cut of a graph. 
So, the concept is the same here, let me 
state it again to jog your memory. 
So a cut of a graph is simply a partition
of its vertex set into two groups, and
each of those two groups should be
non-empty.
So pictorially, we envision some of the 
vertices of G, this blob A being in one 
group, 
and the rest of the vertices, this blob
B, being in a different group.
Now, what's up with the edges? 
How can they be distributed in this 
picture? 
Well, for the two endpoints of an edge,
there are three cases.
Either both of the endpoints can be in
the set A,
so there are various edges internal to A.
Similarly, an edge might have both of its 
endpoints inside of B. 
But we're going to be most interested in 
the third case, 
edges that have exactly one endpoint in
each of A and B.
So these are edges that we say cross the 
cut, A, B. 
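To make the three cases concrete, here is a minimal Python sketch. The function name `classify_edges` and the small example graph are made up for illustration and are not from the lecture; the graph is assumed to be undirected and given as a list of vertex pairs.

```python
def classify_edges(edges, A):
    """Split edges into those inside A, inside B, and crossing the cut (A, B).

    `edges` is an iterable of (u, v) pairs; `A` is one side of the cut.
    Every vertex not in A is treated as belonging to B.
    """
    A = set(A)
    inside_a, inside_b, crossing = [], [], []
    for u, v in edges:
        if u in A and v in A:
            inside_a.append((u, v))      # case 1: both endpoints in A
        elif u not in A and v not in A:
            inside_b.append((u, v))      # case 2: both endpoints in B
        else:
            crossing.append((u, v))      # case 3: one endpoint on each side
    return inside_a, inside_b, crossing


# A hypothetical 5-vertex cycle, with the cut A = {1, 2}, B = {3, 4, 5}.
example_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]
inside_a, inside_b, crossing = classify_edges(example_edges, A={1, 2})
print(crossing)
```

Only one side of the cut is passed in, since B is determined by A once the vertex set is fixed.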
So hopefully the definition of a cut
seems simple enough, but cuts, and in
particular their relationship to edges,
can be quite interesting, quite useful.
So as shown here in the picture, of 
course for a given cut, there can be many 
edges crossing it. 
By the same token, for a given edge of a
graph, in general, there will be many
cuts of the graph
that that edge crosses.
So, to understand this a little bit
better, let's just review a simple
property of the cuts of a graph.
Let me just ask you how many there
are.
Specifically, for a graph that has n 
vertices, roughly how many cuts does it 
have? 
Roughly n, roughly n squared, roughly 
2^n, or roughly n^n? 
Now, none of these four answers is 
exactly right, but one of the four is a 
lot closer to the exact expression than 
the other three and I'm asking you, which 
of them is it? 
Alright. 
So the correct answer is the third one, 
2^n. 
A graph with n vertices has essentially
2^n cuts, so there's an exponential
number of cuts, there's a lot of them.
So why is this true? 
Well, in effect you can imagine making a 
binary decision for each of the n 
vertices. 
Each one either goes into A
or it goes into B.
So n binary decisions results in 2^n 
different outcomes. 
Now why is this slightly incorrect? 
Well, in fact, a cut has to have two 
non-empty sets. 
A is not allowed to be empty, 
B is not allowed to be empty, 
so that rules out two of the 
possibilities. 
So actually, strictly speaking, it's 2^n
- 2 different cuts of a graph.
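As a quick sanity check on this count, the following sketch enumerates the cuts of a small vertex set by brute force. The name `all_cuts` is hypothetical. It follows the binary-decision view above, so (A, B) and (B, A) are counted as two different outcomes, and only the two outcomes with an empty side are thrown away.

```python
from itertools import combinations


def all_cuts(vertices):
    """Yield every pair (A, B) with A and B non-empty and B = V - A."""
    vertices = list(vertices)
    n = len(vertices)
    # Sizes 1 .. n-1 only: size 0 would make A empty, size n would make B empty.
    for size in range(1, n):
        for subset in combinations(vertices, size):
            A = set(subset)
            yield A, set(vertices) - A


# Brute-force count versus the closed form 2^n - 2 for a few small n.
for n in range(2, 7):
    count = sum(1 for _ in all_cuts(range(n)))
    assert count == 2**n - 2
```

Already for modest n this enumeration becomes impractical, which is the point of the quiz: the number of cuts is exponential in n.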
So what we're going to do next is we're 
going to state and prove three easy facts 
about cuts in graphs. 
Once we have these three easy facts, we 
will be able to prove the claim at the 
beginning of this video, 
namely that Prim's algorithm always
outputs a spanning tree.
The first of these three properties about
cuts I'm going to call the empty cut
lemma.
The point of the empty cut lemma is to
give us a characterization, that is, a
new way of saying when a graph is
connected.
So in particular, I'm going to phrase it
in terms of a graph being not connected.
And the claim is that a graph is not 
connected if and only if we can find a 
cut of the graph that has no edges 
crossing it. 
So remember how we defined a graph being 
connected, 
that means for any two vertices in the 
graph we can find a path in the graph 
from one vertex to the other. 
So what we're saying is that being not 
connected, 
that is, there existing a pair of 
vertices with no path between them is 
equivalent to there being a cut with no 
crossing edges. 
So let's go ahead and prove this real 
quick. 
So as this is an if and only if
statement, we really have to do this
proof in two parts.
First, we have to prove that assuming the 
first statement, we can derive the 
second. 
Then we have to show that assuming the 
second statement, we can derive the 
first. 
I think the easier direction is to assume 
the right-hand side and then derive the 
left-hand side. 
So let's start with that one. 
That is, consider a graph G so that 
there's a cut, A, B with no edges of G 
crossing this cut. 
The plan is to exhibit a pair of vertices
that do not have a path between them,
thereby certifying that the graph
is not connected.
So, it's pretty easy to figure out which
pair of vertices we should look at:
just take one vertex from each side of
the cut which has no crossing edges,
say U in A and V in B.
So why is it that there's no path from U
to V in the graph G?
Well, a path from U to V would surely
have to cross the cut A, B, but there
are no edges available for crossing the
cut.
So therefore, this path from U to V
cannot exist.
So that completes the first part of the 
proof. We assume the right-hand side, we 
derive the left-hand side, 
now we start all over again, but we 
assume the left-hand side and we have to 
prove the right-hand side. 
So by the assumption that
the graph is not connected, there has to
exist a pair of vertices U and V that
have no path between them.
We are now responsible for exhibiting
some cut A, B such that no edges of the
graph G cross it.
So where are we going to get these sets 
capital A and capital B from? 
Well, here is the trick, which is going 
to make the proof go really nicely. 
We define the set of vertices capital
A to be those reachable from U in the
graph G.
Another way to think about this is that
capital A is simply U's connected
component, in the sense that we discussed
in part 1 of the course.
Now because we want a cut, and a cut is
a partition, we had better put in the
group capital B all of the vertices
that are not in A.
If you like, this is all of the connected 
components other than the one that 
contains U. 
Note that by definition, U is in capital 
A, 
certainly U is reachable from itself. 
And by assumption, V and U are not 
reachable from each other, 
so V is going to be in capital B. 
So neither of these sets is empty.
This is indeed a bona fide cut of the
graph G.
All that remains is to notice that there 
are no crossing edges across this cut. 
And why is that true? 
Well, suppose there was an edge crossing
the cut A, B, with one endpoint in A and
one endpoint in B. By definition, there
are paths from U to everything in A,
so if there is any edge sticking out of
A, that would give us a path from U to
some vertex in B.
But B is by definition the set of
vertices not reachable from U, so that's
a contradiction.
So again, the point is that if there were
edges crossing this cut, then we could
expand A and make it even bigger. So
therefore, there aren't any edges
crossing the cut.
The cut is empty, that's what we needed 
to prove. 
Assuming the graph was disconnected, we 
have exhibited a cut, A, B with no 
crossing edges. 
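The construction in this second half of the proof can be sketched in Python as follows. The function name `find_empty_cut` is made up for illustration, the graph is assumed to be undirected with at least one vertex, and breadth-first search is just one convenient way to compute "everything reachable from U".

```python
from collections import deque


def find_empty_cut(vertices, edges):
    """Return an empty cut (A, B) if the graph is disconnected, else None."""
    vertices = list(vertices)
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # A = all vertices reachable from an arbitrary start vertex U.
    start = vertices[0]
    A = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in neighbors[x]:
            if y not in A:
                A.add(y)
                queue.append(y)

    # B = everything else, i.e. all the other connected components.
    B = set(vertices) - A
    if not B:
        return None  # everything is reachable: the graph is connected

    # No edge can stick out of A, otherwise the search would have enlarged A.
    assert not any((u in A) != (v in A) for u, v in edges)
    return A, B


print(find_empty_cut([1, 2, 3, 4], [(1, 2), (3, 4)]))
print(find_empty_cut([1, 2, 3], [(1, 2), (2, 3)]))
```

Returning `None` for a connected graph mirrors the other direction of the lemma: a connected graph has no empty cut to exhibit.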
So that wraps up the first of our
three facts, and in fact, the most
difficult of our three facts about cuts
in graphs.
And again, what did the empty cut lemma
say?
It gives us a new way of talking about 
whether or not a graph is connected. 
So it's disconnected if and only if 
there's an empty cut. 
It's connected if and only if there are 
no empty cuts. 
So that's the key point from this slide.
Let's now knock off the other two facts 
we're going to need. 
The first one I'm going to call the 
double crossing lemma. 
In essence, what the double crossing 
lemma says, is that, if a cycle in a 
graph crosses a cut, then it has to cross 
it twice, 
it cannot cross it only once. 
So pictorially, we look at a cut of a 
graph, so there's the two vertex groups A 
and B. 
By hypothesis, there's some edge E with 
one endpoint in each side, 
and by assumption, this E, this edge E, 
participates in some cycle that we're 
calling capital C. 
And if you look at the picture, you 
realize that the claim in this lemma is 
obvious, 
because the cycle has to loop back
on itself. If it has an edge with one
endpoint on either side, there has to be
a path connecting the two dots,
connecting those two endpoints back to
each other, and that path has to cross
back over this cut A, B.
Indeed, the double crossing lemma is a
special case of a stronger statement
which is equally easy to see, which is
that if you take any cut of a graph and
you take any cycle, you know, something
that starts and ends at the same point,
then it has to cross this cut an even 
number of times. 
It might cross it 0 times, but it's not 
going to cross it once. 
It could cross it twice. 
It could cross it four times, if it 
crisscrosses back and forth. 
It could cross it six times, and so on. 
But if it crosses it strictly more than 0 
times, then it has to cross it at least 
twice. 
That's the point of the double crossing 
lemma. 
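The even-crossing claim is easy to check mechanically. The sketch below uses a hypothetical helper `crossings`, which represents a cycle as a list of vertices whose last entry is joined back to the first, and counts how many of its edges cross a given cut.

```python
from itertools import combinations


def crossings(cycle, A):
    """Count the edges of a closed walk that cross the cut (A, V - A).

    `cycle` lists the vertices in order; the last vertex is joined
    back to the first to close the loop.
    """
    A = set(A)
    count = 0
    for i, u in enumerate(cycle):
        v = cycle[(i + 1) % len(cycle)]
        if (u in A) != (v in A):  # exactly one endpoint lies in A
            count += 1
    return count


square = [1, 2, 3, 4]  # the cycle 1-2-3-4-1, made up for illustration
print(crossings(square, {1, 2}))        # crosses twice
print(crossings(square, {1, 3}))        # crisscrosses four times
print(crossings(square, {1, 2, 3, 4}))  # never crosses

# Whichever side A we pick, the number of crossings is even.
for size in range(len(square) + 1):
    for A in combinations(square, size):
        assert crossings(square, A) % 2 == 0
```

The parity argument behind this is that every crossing switches sides, and a walk that ends where it started must switch sides an even number of times.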
So, we'll use this in its own right
later on.
But I'm also, for the moment, interested
in an easy corollary of the double
crossing lemma.
I will call this the lonely cut 
corollary. 
Let me tell you the point of the lonely 
cut corollary. 
In general, in these spanning tree 
algorithms, 
to ensure that we output a spanning tree, 
then we have to, in particular, make sure 
we don't create any cycles. 
The point of this corollary is it's a 
tool to argue that we don't create 
cycles. 
So how can we be sure that an edge 
doesn't create cycles? Well, here is a 
way. 
Suppose we're looking at an edge E, and
suppose we can identify a cut A, B so
that edge E is the only edge crossing
it; it's the lonely edge crossing this
cut.
Well then, by the double crossing lemma, 
there is no way this thing is in any 
cycle. 
If it were in a cycle and it crossed the
cut, that cycle would have to cross it
again, and this edge wouldn't be lonely,
it would have company.
So if you're lonely on a cut, it means
you cannot be in a cycle.
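Here is a small sketch of the corollary in Python. Both helper names, `is_lonely` and `on_a_cycle`, are invented for this example, and the graph is assumed to be simple (no parallel edges) with each edge stored once as a pair. An edge lies on a cycle exactly when its endpoints are still connected after the edge is removed, and that is what `on_a_cycle` tests.

```python
def is_lonely(edge, edges, A):
    """True if `edge` is the only edge crossing the cut (A, V - A)."""
    A = set(A)
    crossing = [e for e in edges if (e[0] in A) != (e[1] in A)]
    return crossing == [edge]


def on_a_cycle(edge, edges):
    """True if the endpoints of `edge` stay connected without it."""
    u, v = edge
    rest = [e for e in edges if e != edge]
    reached, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for a, b in rest:
            # Edges are undirected, so try both orientations.
            for here, there in ((a, b), (b, a)):
                if here == x and there not in reached:
                    reached.add(there)
                    stack.append(there)
    return v in reached


# A triangle 1-2-3 with a pendant edge (3, 4), made up for illustration.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]

# (3, 4) is lonely across the cut ({1, 2, 3}, {4}), so it is in no cycle.
assert is_lonely((3, 4), edges, {1, 2, 3})
assert not on_a_cycle((3, 4), edges)

# (1, 2) is on the triangle, and across ({1}, {2, 3, 4}) it has company.
assert on_a_cycle((1, 2), edges)
assert not is_lonely((1, 2), edges, {1})
```

Note that the corollary only goes one way: finding one cut where an edge has company says nothing, but finding a single cut where it is lonely rules out every cycle.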
So now we've got all of our ducks lined 
up in a row and we're ready to prove the 
first part of the correctness of Prim. 
That is, we're ready to argue that Prim's 
algorithm, given a connected graph, 
outputs a spanning tree. 
Again, for the moment, we're making no 
claims about optimality, that will be in 
the next video. 
So we're going to make this argument in 
three steps. 
And for the first step, you might want to 
go look again at the pseudocode of Prim's 
algorithm just to remember what the 
notation was. 
The first step, we're just going to 
notice that the semantics of the 
algorithm are respected. 
So the algorithm maintains two different 
sets throughout its evolution. 
On the one hand, it maintains a set
capital X, intended to be the vertices
spanned so far.
On the other hand, it maintains a set of
edges, capital T, the edges that have
been picked so far.
And the intent was that the current edges
capital T always span the current vertex
set capital X.
So the first thing is just to verify that 
that is in fact true. 
This I'm not going to prove formally. 
In my experience, students find this kind 
of obvious and the intuition is correct. 
If you want a rigorous proof, go ahead
and fill in the details yourself.
It's a straightforward induction with no 
nasty surprises. 
Now, we're trying to argue the
output of this algorithm is a spanning 
tree. 
So let's recall what that means. 
What is it that we have to check? 
So there's two properties. 
First of all, there can't be any cycles, 
there can't be any loops. 
Second of all, it has to span all of the 
vertices. 
There has to be a path inside the tree
edges from any vertex to any other
vertex.
So let's go ahead and prove both those 
things in reverse order. 
So, the second step of the proof is going 
to be to argue that the algorithm outputs 
something which does span all of the 
vertices. 
So at the end of the day, we'll have a 
path from any vertex to any other vertex 
using only the edges in our chosen set, 
capital T. Now, by part one of this 
proof, all we need to prove is that the 
algorithm halts with capital X equal to 
capital V, then we know that capital T 
spans everything in V. 
So how could that not happen? 
How could Prim's algorithm somehow halt
with the spanned vertices, capital X, not
being all of capital V? Well, go back
and check out the pseudocode and look at
the main while loop.
So in every iteration of the while loop,
we add one new vertex to capital X. What
could go wrong? The only thing that could
go wrong would be if, in some iteration
before we're spanning everything, when we
scan the frontier around capital X, there
aren't any edges.
That's the only way we can fail to
increase the vertices in capital X in a
given iteration.
But what would that mean? 
What would it mean if in some iteration 
we couldn't find edges with one endpoint 
in capital X and the other endpoint in V 
- X? 
Well then we would have exhibited an 
empty cut. 
The cut X, V - X would have no crossing 
edges. 
And now we can use the empty cut lemma, 
which says if there's an empty cut, then 
the graph is disconnected. 
But by assumption, we're working with a 
connected input graph, so that can't 
happen. 
Okay? So the algorithm never gets stuck.
We always increase capital X by one
vertex because the original graph was
connected, and that means that it halts
with something spanning all of the
vertices.
For the final step, we need to argue that
Prim's algorithm never creates any cycles
in the edges that it's choosing,
capital T.
So, why are there no cycles?
Well, what we're going to do is we're
going to talk about each edge in turn
that Prim's algorithm adds,
and argue that whenever a new edge gets
added, there's no way that edge creates
any cycles in the set capital T.
And, to see why, take a snapshot of the
algorithm at some given iteration.
There's some current set capital T, and
there's some set of vertices capital X
that the edges in T span.
V - X is the set of vertices not yet
spanned by T, and of course we can think
of X, V - X as a cut of the graph.
And at this moment in time, at this 
snapshot, the edges of capital T, they're 
all of one type. 
They all have both of their endpoints 
inside capital X, none of them have any 
endpoints inside V - X. 
So in particular, none of the edges 
chosen thus far cross the cut X, V - X. 
That's by construction, they only span
the vertices of X.
Now what type of edge is going to get
added in this iteration?
Well, Prim's algorithm searches only over 
edges that have one endpoint inside X and 
one endpoint outside. 
That is, it searches only over edges that 
cross the cut X, V - X. 
So the edge that gets added in this 
iteration is going to be a trailblazer 
for this cut. 
None of the edges chosen so far cross the
cut,
but the edge chosen in this iteration
will definitely cross the cut.
So the moment edge E gets added to the
tree capital T, it is going to be lonely
across the cut X, V - X.
So by the lonely cut corollary as the 
sole member crossing this cut in capital 
T, it cannot possibly participate in any 
cycles. 
Remember, if it participated in a cycle 
in capital T, that cycle would have to 
cross this cut somewhere else. 
But there aren't any other edges crossing 
this cut, this is the only one. 
So that's why when we add a new edge, 
there's no way it can create any cycles. 
It's the sole member crossing this 
particular cut.
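The three steps of this argument can be traced in a short Python sketch of Prim's algorithm. This is only an illustration under assumptions: the pseudocode itself is not reproduced in this video, so the version below assumes the usual form (start with X containing one vertex, then repeatedly add the cheapest edge crossing the cut X, V - X), and the function name, the `(cost, u, v)` edge format and the example graph are all made up. It is the straightforward implementation that rescans every edge in each iteration, not a fast one.

```python
def prim_spanning_tree(vertices, weighted_edges, start):
    """Return the list of edges T chosen by a simple version of Prim.

    `weighted_edges` holds (cost, u, v) triples of an undirected graph.
    Raises ValueError if the graph turns out not to be connected.
    """
    V = set(vertices)
    X = {start}  # vertices spanned so far
    T = []       # edges chosen so far
    while X != V:
        # Snapshot: every edge of T has both endpoints inside X,
        # so no edge of T crosses the cut (X, V - X).
        assert all(u in X and v in X for _, u, v in T)

        # Scan the frontier: edges with exactly one endpoint in X.
        frontier = [e for e in weighted_edges if (e[1] in X) != (e[2] in X)]
        if not frontier:
            # (X, V - X) is an empty cut, so by the empty cut lemma
            # the input graph is disconnected.
            raise ValueError("input graph is not connected")

        cost, u, v = min(frontier, key=lambda e: e[0])
        # This edge is the only edge of T crossing (X, V - X), so by the
        # lonely cut corollary it cannot close a cycle in T.
        T.append((cost, u, v))
        X.add(v if u in X else u)
    return T


# A hypothetical 4-vertex example.
example = [(1, "a", "b"), (2, "b", "d"), (3, "a", "d"),
           (4, "a", "c"), (5, "c", "d")]
tree = prim_spanning_tree("abcd", example, "a")
print(tree)
```

The loop adds exactly one vertex to X per iteration, so on a connected graph with n vertices it halts after n - 1 iterations with n - 1 edges in T.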