So now we understand why Kruskal's algorithm is correct, why it always computes a minimum-cost spanning tree. In this video, we'll turn our attention to implementation issues. We'll begin with a straightforward implementation of Kruskal's algorithm. That will give us a polynomial running time bound, which is good, but we'd like to do better. So then we'll show how deploying a suitable data structure, something you haven't seen before, the Union-Find data structure, allows us to speed up Kruskal's algorithm to be competitive with Prim's algorithm. That is, we'll get a near-linear running time bound of O of M log N.

So let's just briefly review the very elegant pseudocode for Kruskal's algorithm. It's a greedy algorithm: it considers the cheapest edges first, all the way up to the most expensive. So we begin with a sorting pre-processing step to put the edges in sorted order. For notational convenience, let's just rename the edges, so that edge 1 is the cheapest edge, all the way up to edge M being the most expensive. We then have our single linear scan in this for loop, and we just grab edges whenever we can, okay? So we maintain this evolving set capital T, which at the end of the algorithm will be our spanning tree. Now, what forces us to exclude an edge from this set capital T? Well, if it creates a cycle with the edges already chosen, obviously that's a no-go. We can't have cycles in our final output. But as long as including an edge doesn't create a cycle, we go ahead and optimistically include it. And as we've seen, this is a correct algorithm; it always outputs the minimum-cost spanning tree.

So, what would be the running time of this algorithm, if we just straightforwardly implement the pseudocode on this slide? Well, let's just consider the algorithm step by step. In the first step, we sort the edges, and that's going to take M log N time. Now don't forget, whenever we're speaking about graphs, we have the convention that M denotes the number of edges and N denotes the number of vertices. So, you might justifiably wonder why I wrote M log N for the running time of the sorting step instead of M log M, since, after all, what we're sorting are the edges and there are M of them. Well, what I'm using here is that, in this context, we can switch log N and log M interchangeably with each other inside big-O notation. Why is that true? Well, recall that when we first discussed graphs in part one, we noticed that there can't be too many edges. The number of edges M is at most quadratic in the number of vertices; it's at most big-O of N squared. So if M is at most N squared, then log M is at most 2 log N, and the two is suppressed in the big-O notation. So log M and log N are interchangeable in this context. Notice that for the minimum-cost spanning tree problem, you may as well assume that there are no parallel edges, that the graph is simple. If you have a bunch of parallel edges between a given pair of vertices, you can just throw out all but the cheapest one; that's the only one you'll ever need.

So, moving on to the main loop, it's pretty obvious how many iterations we have there: we have M iterations. So all we need to figure out is how much work we have to do in each iteration. So what is it each iteration needs to accomplish? It needs to check whether or not adding the current edge to the edges we've already chosen creates a cycle. So I claim that can be done in time linear in the number of vertices. That is, it can be done in big-O of N time.
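To make this running-time discussion concrete, here is a minimal Python sketch of the straightforward implementation. It assumes the input arrives as (u, v, cost) triples; the helper name creates_cycle is hypothetical, standing in for the graph-search check explained next (a sketch of it appears a bit further below).

```python
def kruskal_naive(n, edges):
    """Straightforward Kruskal: sort the edges by cost, then greedily
    keep each edge that doesn't create a cycle with those chosen so far.
    n is the number of vertices; edges is a list of (u, v, cost) triples."""
    T = []                                                 # the evolving tree T
    for u, v, cost in sorted(edges, key=lambda e: e[2]):   # O(m log n) sort
        if not creates_cycle(T, u, v):                     # O(n) per check (sketched below)
            T.append((u, v, cost))
    return T
```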
So how do we accomplish this? Well, we need two quick observations. First of all, and this is something we've seen in arguments in the previous videos, checking whether or not this new edge, say with endpoints U and V, is going to create a cycle boils down to checking whether or not there's already a path between U and V in the edges capital T chosen so far. If there is already a U-V path, adding this edge will close the loop and create a cycle. If there currently is no U-V path, then adding this edge will not create a cycle. The second observation is: well, how do we check if there's a path from U to V in the edges we've already chosen? Well, we already know how to do that, just using graph search. You can use breadth-first search, you can use depth-first search, it doesn't matter. You just start at the vertex U and you see whether or not you reach V. If you reach it, there's a path; if you don't reach it, there's not a path. Breadth-first search, depth-first search, whatever: it takes time linear in the size of the graph that you're searching. And since we only need to search the edges that are in capital T, and there are going to be at most N minus 1 of them, linear time in this context means O of N, O of the number of vertices, because that also bounds the number of edges of capital T, the edges in which we're searching for a path.

So, adding up all of this work, what do we have? We have the sorting pre-processing step, which takes big-O of M log N time, and then we have these M iterations of the for loop, each taking O of N time. The latter term dominates, so the overall running time is big-O of M times N. This, coincidentally, is the same running time we got from the straightforward implementation of Prim's algorithm, and I'll make the same comments here. This is a reasonable running time; it's polynomial in the input size. It's way better than checking all of the exponentially many spanning trees that the graph might have. But we certainly would like to do better. We'd love to have an implementation of Kruskal's algorithm that gets us to a near-linear running time bound, and that's the plan.

How are we going to do it? Well, really, the work that we're doing here over and over again, which is kind of a bummer, is these cycle checks. In every single iteration, we're spending time linear in the number of vertices to check for a cycle. And the question is, can we speed that up? And the Union-Find data structure will actually, believe it or not, allow us to check for a cycle in constant time. So if we had a data structure that could implement constant-time cycle checks, then we'd have to spend only constant time in each iteration of this for loop. So the loop overall would take only time linear in the number of edges, O of M. If we got that then, believe it or not, the sorting pre-processing step would become the bottleneck in the running time of Kruskal's algorithm. Our running time would drop from M times N down to near-linear, down to O of M log N.

So let me now tell you a little bit about this magical data structure that's going to give us constant-time cycle checks. I'm just going to give you the high-level picture, and how it connects to Kruskal's algorithm, on this slide. We'll look at the details of the data structure in the next video. I also want to warn you that I'm not going to discuss, in this pair of videos, the state of the art for Union-Find data structures.
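Before going further with Union-Find, here is what the O of N cycle check from the earlier skeleton might look like, a sketch under the same (u, v, cost) edge-triple assumption: a breadth-first search over just the edges of T, which has at most N minus 1 edges, so both building the adjacency lists and searching cost O of N.

```python
from collections import defaultdict, deque

def creates_cycle(T, u, v):
    """True iff T already contains a u-v path, i.e. adding edge (u, v)
    would close a cycle. BFS over T's at-most-(n-1) edges: O(n) time."""
    adj = defaultdict(list)
    for a, b, _ in T:                 # adjacency lists for T's edges only
        adj[a].append(b)
        adj[b].append(a)
    seen, frontier = {u}, deque([u])
    while frontier:                   # standard BFS starting from u
        x = frontier.popleft()
        if x == v:
            return True               # reached v: a u-v path already exists
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return False                      # v unreachable from u: no cycle created
```

With a Union-Find data structure, this entire linear-time search collapses to a constant-time comparison of two find results, which is exactly where the promised speedup comes from.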
For Union-Find, I'm going to give you a fairly primitive version, but one that is nevertheless sufficient to give us our desired M log N running time for Kruskal's algorithm. If you're interested, there is some optional material about different implementations of Union-Find that use some super cool ideas, like union by rank and path compression, and that give you different, and in some senses better, operation times. But the quick-and-dirty version of Union-Find that I'm going to discuss here is sufficient for our present needs.

The raison d'ĂȘtre of a Union-Find data structure is to maintain a partition of a set of objects. So in this picture, the overall rectangle is meant to denote a set of objects, and C1, C2, C3, and C4 are disjoint subsets whose union is the entire set. That's what I mean by a partition of a group of objects. We're not going to ask too much of this data structure; we're only going to ask it to support two operations. No prizes for guessing what those two operations are called. In the find operation, we give it an object from this universe, and we ask the data structure to return to us the name of the group to which that object belongs. So for example, if we handed it an object in the middle of this rectangle, we'd expect it to return to us the name C3. The union operation, by contrast, takes as input the names of two groups, and what we want the data structure to do is to fuse those two groups together. That is, the objects in the first group and the objects in the second group should all coalesce and now be in one single group.

So why might such a data structure be useful for speeding up Kruskal's algorithm? To see the connection, think of Kruskal's algorithm as working conceptually in the following way. Initially, when the algorithm starts and the set capital T is empty, each vertex is on its own, in its own isolated connected component. And then, each time Kruskal's algorithm adds a new edge to the set capital T, it takes two current connected components and fuses them into a single connected component. So for example, toward the end of Kruskal's algorithm, maybe it's included enough edges that the tree capital T constructed so far has only four different connected components. And maybe it's about to add a new edge U comma V, where of course U and V must be in different connected components with respect to the edges chosen so far. So this new edge addition, at this iteration of Kruskal's algorithm, is going to fuse the connected components of U and V into a single one. That corresponds to taking the union of the groups to which U and V respectively belong.

So, to be a little more precise about it: what are going to be the objects contained in the Union-Find data structure in Kruskal's algorithm? They're going to correspond to vertices; it's the vertices coalescing that we want to keep track of. And what are going to be the groups in the partition that we maintain? They're just going to correspond to the connected components with respect to the edges that Kruskal's algorithm has already committed to. With these semantics, it's clear that every time Kruskal's algorithm adds a new edge to its set capital T, we have to invoke the union operation to fuse two connected components into one.
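To show these semantics in code, here is a minimal sketch of one simple Union-Find variant, eager leader pointers where each object stores the name of its group's leader directly. This is an illustrative stand-in, not necessarily the exact version detailed in the next video, and it assumes vertices are labeled 0 through n-1.

```python
class UnionFind:
    """Eager-leader Union-Find: each object points directly at its
    group's leader, so find is O(1); union relabels the smaller group."""
    def __init__(self, objects):
        self.leader = {x: x for x in objects}     # each object starts as its own group
        self.members = {x: [x] for x in objects}  # leader -> list of group members

    def find(self, x):
        return self.leader[x]                     # O(1): name of x's group

    def union(self, a, b):
        """Fuse the groups whose leaders are a and b, relabeling the smaller one."""
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a
        for x in self.members[b]:
            self.leader[x] = a
        self.members[a] += self.members.pop(b)

def kruskal(n, edges):
    """Kruskal with Union-Find: cycle checks are two O(1) find calls,
    so the O(m log n) sort dominates the running time."""
    uf = UnionFind(range(n))
    T = []
    for u, v, cost in sorted(edges, key=lambda e: e[2]):
        ru, rv = uf.find(u), uf.find(v)
        if ru != rv:                              # different components: no cycle
            T.append((u, v, cost))
            uf.union(ru, rv)                      # fuse the two components
    return T
```

Notice how the cycle check is now just find(u) != find(v): the edge is safe to add exactly when its endpoints currently belong to different groups, matching the connected-component semantics described above.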