So let me now introduce you to the union-find data structure. Let's not lose sight of our goal: we want to check for cycles in Kruskal's algorithm in constant time. The first and most basic idea behind this union-find implementation is that we're going to maintain a linked structure for each group, that is, for each connected component with respect to the edges Kruskal has chosen so far. By a linked structure, I just mean that each vertex of the graph gets an extra pointer field. In addition, for each group, for each connected component, we're going to designate one vertex, and we don't care which one, as the leader vertex of that component. So what is the point of these extra pointers at each vertex, and what is the point of having these leaders? Well, the key invariant we maintain is that a given vertex, with its extra pointer, points to the leader vertex of its connected component. So, for example, maybe we have two connected components with three vertices each, one containing the vertices U, V, and W, the other containing the vertices X, Y, and Z. Any of the three vertices could be the leader of its component; perhaps U happens to be the leader of the first component and X happens to be the leader of the second. The invariant then says that every vertex should be pointing to the leader of its component. So V and W should point to U, and U, with its own pointer, should point to itself. Similarly, in the second component, X points to itself, and Y and Z both point to X. So in blue here are the actual edges of the graph, and in green are these extra pointers that we've invented, where everybody points back to the leader of its component. One very simple thing in this setup, which turns out to be a good idea, is that each component in effect inherits the name of its leader vertex: we refer to a group, to a component, via the vertex that happens to be its representative, its leader. And what's kind of amazing is that even this very simple scaffolding on the connected components is enough to get constant-time cycle checks, provided the invariant is satisfied. How do we do that? Remember that checking whether adding the edge (u, v) would create a cycle boils down to checking whether there's already a path between u and v, and there's already a path between u and v if and only if they're in the same connected component. Given two vertices u and v, how do we know if they're in the same connected component? We just follow their respective leader pointers and see if we wind up at the same place. If they're in the same component, we get the same leader; if they're in different components, we get different leaders. So checking for a cycle just involves, for each of u and v, comparing their leader pointers for equality, and that is clearly constant time. More generally, the way you implement the Find operation for this flavor of disjoint-union data structure is to simply return the leader pointer of the object you're handed. So if you're given a vertex, you follow the leader pointer of that vertex and return wherever you wind up.
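To make this concrete, here is a minimal sketch of the leader-pointer idea in Python. The class and method names are my own, not from the lecture, and it assumes vertices are hashable labels; the point is just that Find and the cycle check are single dictionary lookups.

```python
# Minimal sketch of the "leader pointer" idea (hypothetical names, not the lecture's code).
class LeaderUnionFind:
    def __init__(self, vertices):
        # Initially every vertex is its own one-element component, hence its own leader.
        self.leader = {v: v for v in vertices}

    def find(self, v):
        # Constant time: just return the leader pointer of v.
        return self.leader[v]

    def creates_cycle(self, u, v):
        # Adding edge (u, v) would create a cycle iff u and v already share a leader.
        return self.find(u) == self.find(v)

uf = LeaderUnionFind(['u', 'v', 'w', 'x', 'y', 'z'])
print(uf.creates_cycle('u', 'v'))  # False: every vertex starts in its own component
```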
So that's pretty excellent: as long as the invariant is satisfied in this simple data structure, we have our desired implementation of constant-time cycle checks. But, and this is certainly a recurring theme in our data structure discussions, whenever you have a data structure that needs to maintain an invariant, and you do an operation that changes the data structure, like a union here, fusing two groups together, you have to worry: does the invariant get destroyed by that operation? And if it does, how are you going to restore it without doing undue work? In the present context of Kruskal's algorithm, here's how this plays out. We're happily doing our constant-time cycle checks. Whenever an edge creates a cycle, we don't do anything: we skip the edge, we don't change our data structure, and we move on. The issue is when we have a new edge that does not create a cycle. Then Kruskal's algorithm dictates that we add this edge to the set capital T that we're building, and that fuses two connected components into one. But remember, we have this invariant: every vertex should point to the leader of its component. If we had component A and component B, each vertex was pointing to the leader of its respective component. Now, when these components fuse into one, we've got to do some updating of leader pointers. In particular, there used to be two leaders, now there has to be just one, and we have to rewire the leader pointers to restore the invariant. So, to make sure you're clear on this important problem, let me ask you the following question. Consider the moment when, at some point in Kruskal's algorithm, a new edge is added and two connected components fuse into one. To restore the invariant, you have to do some leader pointer updates. In the worst case, asymptotically, how many leader pointer updates might be required to restore the invariant? The answer to this question, somewhat alarmingly, is the third answer: it might require a number of pointer updates linear in the number of vertices n to restore the invariant. One easy way to see that is to imagine the very last edge that Kruskal adds to the set T, the one that fuses the final two connected components into one. For all you know, those two components have exactly the same size, n/2 vertices each. You're going down from two leaders to one, and one of those sets of n/2 vertices is going to have to inherit the leader pointer from the other side. So one of the two sets will have n/2 vertices get their leader pointer updated. So that's a bummer: we were hoping for a near-linear time bound, but if each of our linear number of edge additions might trigger a linear number of leader pointer updates, that seems to give rise to a quadratic time bound. But remember, when I introduced the union-find data structure, I said that first idea was idea number one, so presumably there's an idea number two. And here it is; it's very natural, and if you were coding up an implementation of a union-find data structure, you would probably do this optimization yourself. Alright. So, consider the moment in time at which some component A merges with some component B. Each of these two components currently has its own leader vertex, and all vertices of each group point to that leader vertex.
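To make that worst case concrete before we get to the fix, here is what an unoptimized merge might look like; this is my own sketch, not the lecture's code. When two components fuse, every vertex that followed one leader gets rewired to the other, which in the worst case, two components of n/2 vertices each, is already Theta(n) work for a single merge.

```python
# Unoptimized merge sketch: 'leader' maps each vertex to the leader of its component.
def naive_union(leader, u, v):
    lu, lv = leader[u], leader[v]
    if lu == lv:
        return                      # already the same component; nothing to do
    for w in leader:                # rewire every vertex that pointed to lv...
        if leader[w] == lv:
            leader[w] = lu          # ...so it now points to lu
```

(This version even scans all vertices just to locate the absorbed group; but even with extra bookkeeping to avoid that scan, the rewiring itself can be forced to touch n/2 vertices, which is the point of the question above.)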
Now, when they fuse, what are you going to do? The first obvious thing is to say: let's not bother computing some totally new leader; let's reuse either the leader from group A or the leader from group B. That way, if, for example, we retain the leader from group A, the only leader pointers that need to be rewired are the ones coming from component B; the vertices in component A keep the same leader vertex and the same leader pointers as before. So that's the first point: let's have the new union of two components inherit the leader of one of its constituent parts. Now, if you're going to retain one of the two leaders, which one are you going to retain? Maybe one of the components has 1,000 vertices and the other component only has 100 vertices. Given the choice, you're certainly going to keep the leader of the bigger component. That way, you only have to rewire the 100 leader pointers of the second group; if you kept the leader of the second group, you'd have to rewire the 1,000 pointers of the first group, and that seems silly and wasteful. So the obvious way to implement a merge is to keep the leader of the bigger group and rewire everybody from the smaller group. You should notice that, in order to actually implement this optimization, where you always retain the leader of the larger group, you have to be able to quickly determine which of the two groups is larger, but you can augment the data structure we've discussed to facilitate that. With each group, you just keep a count of how many vertices are in that group. So you maintain a size field for each group, which lets you check in constant time the populations of two groups and figure out, again in constant time, which one is bigger. And notice that when you fuse two groups together, it's easy to maintain the size field: it's just the sum of the sizes of the two constituent parts (there's a sketch of this merge below). So let me now revisit the question from the previous slide. In the worst case, given this optimization, how many leader pointers, asymptotically, might you have to rewire when you merge two components together? Well, unfortunately, the song remains the same. The answer is still the third one, and for exactly the same reason as on the previous slide: it still might be the case that, say, in the final iteration of Kruskal's algorithm, you're merging two components that both have size n/2. So no matter which leader you choose, you're going to be stuck updating the leader pointers of n/2, that is, Theta(n), vertices. So this is clearly a smart practical optimization, but it doesn't seem to be buying us anything in our asymptotic analysis of the running time. However, what if we think about the work done in all of these leader pointer updates in the following different way? Rather than asking how many updates a merging of two components might trigger, let's adopt a vertex-centric view. Suppose you're one of the vertices of this graph. Initially, at the beginning of Kruskal's algorithm, you're in your own isolated connected component: you point to yourself, you're your own leader. Then, as Kruskal's algorithm runs its course, your leader pointer will periodically get updated. At some point, you're no longer pointing to yourself, you're pointing to some other vertex. Then at some point your pointer gets updated again, you're pointing to yet another vertex, and so on.
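Before we run with that vertex-centric view, here is a sketch of the size-aware merge just described, extending the hypothetical LeaderUnionFind class from before. Keeping an explicit member list per leader doubles as the size field and lets the merge touch only the smaller group's vertices; again, the names are mine, not the lecture's.

```python
# Leader-pointer union-find with "union by size" (a sketch under the assumptions above).
class LeaderUnionFind:
    def __init__(self, vertices):
        self.leader = {v: v for v in vertices}
        # Member list per current leader; its length serves as the component's size field.
        self.members = {v: [v] for v in vertices}

    def find(self, v):
        return self.leader[v]

    def union(self, u, v):
        lu, lv = self.find(u), self.find(v)
        if lu == lv:
            return                                    # already one component
        if len(self.members[lu]) < len(self.members[lv]):
            lu, lv = lv, lu                           # retain the leader of the larger group
        for w in self.members[lv]:                    # rewire only the smaller group's pointers
            self.leader[w] = lu
        self.members[lu].extend(self.members[lv])     # new size = sum of the two old sizes
        del self.members[lv]
```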
How many times, over the entire trajectory of Kruskal's algorithm and in light of our new optimization, might you, as some vertex of this graph, have your leader pointer updated? Well, here's the very cool answer: the answer is the second one. So while it remains true that, even if you always have the union of two groups inherit the leader pointer of the larger one, a given fusion might trigger a linear number of leader pointer updates, each vertex will only see its leader pointer updated a logarithmic number of times over the course of Kruskal's algorithm. What is the reason for this? Well, suppose you're a vertex in some group that has, say, twenty vertices, so you're one of twenty. Now suppose at some point your leader pointer gets updated. Why did that happen? It means your group of twenty vertices merged with some other group that has to be at least as big; remember, your leader pointer only gets rewired in a fusion if you were in the smaller group. So you're joining a group at least as big as yours, and therefore the size of the union, the size of your new connected component, is at least double the size of your previous one. So the bottom line is: every time you, as a vertex, have your leader pointer updated, the population of the component to which you belong is at least twice as large as before. Now, you start in a connected component of size one, and a connected component is never going to have more than n vertices, so the number of doublings you might have to endure is at most log base two of n. And that bounds how many leader pointer updates you will see as a vertex in this graph. So, in light of that very cool observation, we can now give a good running time analysis of Kruskal's algorithm using the union-find data structure, using this scaffolding on the component structure to do cycle checks. We have not, of course, changed the preprocessing step: we sort the edges from cheapest to most expensive at the beginning of the algorithm, and that still takes O(m log n) time. So what work do we have to do beyond this sorting preprocessing step? Well, fundamentally, Kruskal's algorithm is just about these cycle checks. In each iteration of the for loop, we have to check whether adding a given edge would create a cycle with the edges we've already added. And the whole point of the union-find scaffolding, these linked structures, is that we can check for cycles in constant time: given an edge, we look at the leader pointers of its endpoints, and a cycle would be created if and only if those leader pointers are identical. So for the cycle checking, we only do constant work in each of the O(m) iterations. The final source of work is maintaining the union-find data structure, restoring the invariant each time we add a new edge to the set capital T. And here, the good idea is that we're not going to bound the worst-case number of leader pointer updates in each iteration, because that could be too expensive; it could be up to linear in just a single iteration. Rather, we're going to do a global analysis, thinking about the total number of leader pointer updates that ever occur. On the previous slide, we observed that a single vertex will only endure a logarithmic number of leader pointer updates. So, summing over all n vertices, the total work done for leader pointer updates is only O(n log n).
So even though we might do a linear number of pointer updates in just one iteration of this for loop, we still have this global upper bound of O(n log n) on the total number of leader pointer updates, which is very cool. So, looking over this tally, we observe the stunning fact that the bottleneck for this implementation of Kruskal's algorithm is actually the sorting: we do more work in the preprocessing step, O(m log n), than we do in the entire for loop, which is only O(m + n log n). So that gives us an overall running time bound of O(m log n), matching the theoretical performance that we achieved with Prim's algorithm implemented with heaps. So, just like with our heap-based implementation of Prim's algorithm, we get a running time which is almost linear, only a logarithmic factor slower than reading the input. And just like with Prim's algorithm, you should find Kruskal's algorithm to be very competitive in practice.
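Finally, to tie the pieces together, here is a self-contained sketch of Kruskal's algorithm with these leader pointers; the edge format and function name are my own assumptions, not from the lecture. Sorting dominates at O(m log n), each cycle check is a constant-time leader comparison, and the rewiring work summed over all merges is O(n log n).

```python
def kruskal_mst(vertices, edges):
    """Sketch of Kruskal's MST; 'edges' is assumed to be a list of (cost, u, v) tuples."""
    leader = {v: v for v in vertices}            # leader pointer per vertex
    members = {v: [v] for v in vertices}         # member list per leader (its length is the size)
    mst = []
    for cost, u, v in sorted(edges):             # sorting pre-processing: O(m log n)
        lu, lv = leader[u], leader[v]            # constant-time cycle check
        if lu == lv:
            continue                             # edge (u, v) would create a cycle; skip it
        mst.append((u, v, cost))
        if len(members[lu]) < len(members[lv]):
            lu, lv = lv, lu                      # retain the leader of the larger group
        for w in members[lv]:                    # rewire only the smaller group's pointers
            leader[w] = lu
        members[lu].extend(members[lv])
        del members[lv]
    return mst

# Tiny example graph: the cost-3 edge (a, c) is skipped because it would create a cycle.
edges = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'a', 'c'), (4, 'c', 'd')]
print(kruskal_mst(['a', 'b', 'c', 'd'], edges))  # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 4)]
```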