So in this sequence of videos, we're 
going to apply the greedy algorithm 
design paradigm to a fundamental graph 
problem, the problem of computing minimum 
spanning trees. 
The MST problem is a really fun 
playground for greedy algorithm design, 
because it's the singular problem in 
which pretty much any greedy algorithm 
you come up with seems to work. 
So we'll talk about a couple of the 
famous ones, show why they're correct, 
and show how they can be implemented 
using suitable data structures to be 
blazingly fast. 
So, I'll give you the formal problem 
definition on the next slide but first 
let me just say informally what it is 
we're trying to accomplish. 
Essentially, what we want do is connect a 
bunch of points together as cheaply as 
possible. 
And, as usual with an abstract problem 
the objects can mean something very 
literal. 
So maybe the points we're trying to 
connect are servers in some computer 
network, or it could represent something 
more abstract. 
Like maybe we have a model of documents 
like Web Pages where we represent them as 
points in space. 
And we want to somehow connect those 
together. 
Now the main reason I'm going to spend 
time on the minimum expenditure problem 
is pedagogical. 
It's just a great problem for sharpening 
your skills with greedy algoritum design 
and proof of correctness. 
It'll also give us another opportunity to 
see the beautiful interplay between data 
structures and fast limitation of graph 
algorithms. 
That said that minimum expenditure 
problem does have applications. 
One very cool one is in clustering, and 
that I'll talk about in detail in a later 
video, it also comes up in networking. 
So if you do a web search on spanning 
tree protocol you'll also find some 
information about that. 
So as I said at the beginning the minimum 
spanning tree problem is remarkable in 
that it doesn't just admit one greedy 
algorithm that's correct, but in fact it 
admits multiple greedy algorithms that 
are correct. 
we're going to talk about two of them, 
the two most well known ones. 
But there are even some others believe it 
or not. 
So the first one we're going to discuss 
beginning in the next video is Prim's MST 
algorithm. 
This dates back over 50 years to 1957. 
in fact as you'll see Prim's algorithm 
shows a remarkable number of similarities 
with Dijkstra's shortest path algorithm. 
So you might not be surprised to know 
that Dijkstra also independently had 
discovered this algorithm a couple of 
years later. 
But in fact it was only noticed much 
later that this exact same algorithm had 
been first discovered over 25 years 
earlier by a mathematician named Jarnick. 
For that reason you'll sometimes hear 
this called Jarnick's algorithm or the 
Prim-Jarnick algorithm. 
for gravity and to be consistent with 
some of the main text books in the area 
I'm just going to call this Prim's 
algorithm throughout the lectures. 
The other algorithm we're going to cover 
which is also rightfully famous is 
Kruskal's MST algorithm. 
As far as I know this was indeed first 
discovered by Kruskal roughly the same 
time as Prim was doing his algorithm in 
the mid 50s. 
And in what sense do I say these 
algorithms are blazingly fast? 
Well, they run in almost linear time, 
linear in the number of edges of the 
graph. 
Specifically we'll see how using 
appropriate data structures will get each 
of them to run in time big O of M log N, 
where M is the number of edges in the 
graph, and N is the number of vertices in 
the graph. 
We'll employ data structures to speed up 
Prim's algorithm in exactly the same way 
we did for Dijkstra's algorithm, that is 
we'll be using the heap data structure, 
One thing that's cool about Crystal's 
algorithm is it'll give us an opportunity 
to study a new data structure, mainly the 
union fine data structure and that's a 
lot of fun to think about, in its own 
right, as you'll see. 
So to put this amazing running time on 
perspective I want to emphasize that only 
is it awesome in the sense it's you know, 
barely, it's almost linear. 
It takes almost barely more time to 
compute the spanning tree than it does to 
read the input graph. 
Reading the input graph alone, remember 
would take linear time. 
O of M time. 
But more over, graphs can have an 
enormous number of spanning trees. 
An exponential number. 
So some of these algorithms are honing in 
really quickly on a needle in a haystack. 
There's no way they have time to look at 
all these spanning tees, and yet they 
find the one which is the best which is 
optimal amongst all of them. 
How do these seemingly magical algorithms 
do it? 
Well, to discuss the details let's start 
by formalizing the Minimum Spanning Tree, 
or MST problem on the next slot. 
So in the MSD problem this is a graph 
problem so the main part of the input is 
a graph comprising verticies and edges. 
I do want to emphasize for the MST 
problem we are be considering only 
undirected graphs. 
This is different notice, than when we 
discussed shortest-path problems in Part 
one of the course. 
There we worked with directed graphs. 
There is an analogous problem to the 
[INAUDIBLE] signature problem for 
directed graphs. 
It's often called the optimal branching 
problem. 
And there are fast algorithms for it, but 
those algorithms are just slightly beyond 
the scope of this course. 
So we're not going to cover it. 
We're going to discuss only undirected 
graphs, and then minimum spanning trees 
for them. 
Now, whenever you talk about graph 
problems, you need to talk about, how is 
the graph actually represented. 
So that's something we discussed at 
length in part one. 
If you don't remember, I suggest going 
back and reviewing the video on graph 
representations. 
For the MST problem, we're going to 
assume that the graph is given as an 
adjacency list. 
That means, we're given an array of 
vertices, an array of edges. 
And we have pointers, wiring vertices to 
their incident edges and wiring edges 
back to their two endpoints. 
In addition to the graph of self the 
input includes a cost, for each of the 
edges, we're going to use the notation C 
sebies of note the cost of a edge, E. 
And in another contrast, to are 
discussion of shortest path problems, 
we're actually not going to care if the 
edge cost are positive or negative, they 
can be any number whatsoever. 
So no prizes for guessing what the 
outputs supposed to be, it's right there 
in the problem definition, the output is 
supposed to be a minimum cost spanning 
tree of the graph, but let's drill down 
and explain exactly what we mean by that. 
So first of all what do we mean by the 
cost of a tree or generally the cost of a 
sub graph, as a subset of the edges. 
Well we're just going to be looking at 
summing up the edges in the tree that we 
output. 
Now the other question is what do I mean 
by a tree that spans all vertices? 
So let me tell you exactly what this 
means, the sub graph T should have two 
properties, first of all there can not be 
any cycles, there can not be any loops in 
this tree. 
And by spanning all vertices, what I mean 
is that this sub graph is what's called 
connected. 
That is, there's a path, using the edges 
and t, from any vertex of the graph to 
any other vertex. 
That's what it means to span all of the 
vertices. 
So for example, consider the following 
graph with four vertices and five edges. 
I've labeled each of the five edges with 
a cost, which in this case, is just an 
integer between one and five. 
So, let's look at some example subgraphs, 
let's start with the three edges, A, B, 
B, D and CD. 
This sub-graph satisfies properties one 
and two. 
That is, it has no cycles, there's no 
loops and it spans all of the vertices. 
If you start at any one of these four 
vertices, you can get to any of the other 
four vertices by using only red edges. 
So in that sense, this red sub-graph is a 
spanning tree. 
However, it is not the minimum cost 
spanning tree. 
There is another spanning tree which is 
even cheaper, has a smaller sum of edge 
costs, namely the edges AC, AB, and BD. 
This also has no cycles and it's also 
connected but the sum of the edge cost is 
only seven, smaller than the eight of the 
previous spanning tree. 
In fact, this pixograph is the unique 
minimum spanning tree of this graph. 
There is a sub graph that has three edges 
which has an even smaller sum, of edge 
costs, namely the triangle AB, BD and AD. 
But this light blue sub graph, this 
triangle, is not a spanning tree. 
In fact, it fails on both counts. 
It does obviously have a cycle. 
It has a loop. 
That's, what it is by definition. 
It's also not connected, so there's no 
way to get from C, the vertex, to any of 
the other three vertices by following 
only light blue edges. 
It's disconnected, and so it fails 
property one as well. 
So the MST problem in general is you're 
given it under a graph, like, for 
example, this four note, five edge graph, 
or presumably. 
something much larger and an interesting 
problem and your suppose to quickly 
identify the minimum spanding tree like 
in this example the pink subgraph. 
So what I want to do next is something 
you're probably quite accustomed to me 
doing by this point, is I want to make a 
couple of mild simplifying assumptions 
just among friends. 
So these assumptions are not important in 
the sense that all of the conclusions of 
these lectures will remain true, will 
remain valid even if these assumptions 
are violated but it'll make the lectures 
a little bit easier. 
It'll allow us to focus on the main 
points and not get distracted by less 
relevant details so here are the two 
assumptions that we're going to make 
throughout all of the lectures on minimum 
spanning trees. 
The first assumption we're going to make 
is that the input graph G is itself 
connected. 
That is G contains a path from any vertex 
to any other vertex. 
So why am I making this assumption? 
Well if this assumptions violated then 
the problem isn't even well defined. 
If the graph isn't connected then 
certainly none of it's subgraphs are 
connected so it has no spanning trees and 
it's not clear what we're trying to do. 
So, those of you who still remember the 
stuff we covered in part one in 
particular, graph search. 
Should recognize that this condition's 
easy to check in a pre-processing step. 
Just run something like breadth first 
search or depth first search. 
Remember, we know how to implement those 
in linear time. 
And those will, in particular, tell you 
whether or not the input graph is 
connected. 
Now, another thing you might be wondering 
is, suppose it was disconnected. 
Then what? 
Should be really just sort of throw up 
our hands and give up? 
You can define a version of the minimum 
spanning tree problem. 
A more general one called minimum 
spanning forest. 
Where, basically you want the minimum 
cost sub graph that spans as much stuff 
as possible. 
Essentially, it's responsible for 
computing a spanning tree within each of 
the connected components of the original 
graph. 
And using the algorithms I'll show you 
here, Prim's algorithm, Kruskal's 
algorithm, they're easily modified to 
solve the more general problem with 
disconnected input graphs as well. 
But again, for simplicity among friends, 
let's just focus on the connected graph 
case that contains all of the main ideas. 
Our second standing assumption throughout 
all of the minimum of spanning tree 
lectures will be that in the input graph 
the edge costs are distinct. 
So you're already use to this sort of no 
ties kind of assumption from our foray 
into scheduling algorithms, and we're 
going to do something similar here. 
Now again this assumption is not 
important in the sense that the 
algorithms that we cover prims algorithm 
crustgrals algorithm. 
They remain correct even if the input has 
equal cost edges, irrespective of how 
ties are broken. 
So the algorithms are correct as widely 
as you would want. 
That's it. 
I'm not going to actually prove for you 
that they are correct with ties. 
Remember we had our scheduling, 
application it was a little bit easier to 
get a proof of correctness without ties, 
I gave you that, and then optionally 
there was a slightly more complicated 
argument that handled ties. 
You can do the same thing here, but I'm 
just not going to give it to you. 
I'll leave that for the keen viewer to 
work out for themselves.