In this video we'll discuss how we actually implement Dijkstra's shortest path algorithm. And in particular, using the heap data structure, we'll give a blazingly fast implementation, almost linear time. Let me just briefly remind you of the problem we're solving. It's the single-source shortest path problem. So we're given a directed graph and a source vertex, S. We're assuming that there's a path from S to every other vertex, V. If that's not true, we can detect it with an easy preprocessing step, so our task then is just to find the shortest path from the source vertex S to each possible destination V. Moreover, every edge of the graph has a non-negative edge length, which we're denoting ℓ_e for an edge e. So recall that Dijkstra's algorithm is driven by a single while loop. We're going to add one additional vertex to an evolving set capital X in each iteration as the algorithm proceeds. So X is the set of vertices that have been processed so far. We maintain the invariant that for every processed vertex, we've computed what we think the shortest path distance is to that vertex. Initially, X is just the source vertex S; of course, the shortest path distance from S to itself is zero. And then the cleverness of Dijkstra's algorithm is in how we figure out which vertex to add to the set capital X each iteration. So the first thing we do is we focus only on edges that cross the frontier, that is, edges that have their tail in capital X and their head outside of capital X. Now, of course, there may be many such edges crossing this frontier, and we use Dijkstra's greedy criterion to select one of them. So for each crossing edge, each edge with tail in X and head outside X, we compute the Dijkstra greedy score, which is defined as the previously computed shortest path distance to the tail of the arc plus the length of the arc. So we compute that for each crossing edge, and then the edge with the minimum score, call it (v*, w*), determines how we proceed.
So we add the head of that arc, w*, to the set capital X, and then we compute the shortest path distance to w* to be the previously computed shortest path distance to v* plus the length of this edge (v*, w*). Now, back when I explained this algorithm, I did it using two arrays, array capital A and array capital B. A is what stores the shortest path distances, and remember, that's what the problem asks us to compute. And for clarity, I also filled in this array capital B, just to keep track of the shortest paths themselves. Now, if you look at the code of this algorithm, we don't actually need the array capital B for anything. When we fill in the array capital A, we don't actually refer to the B array. And so, now that we're going to talk about real implementations of Dijkstra, I'm actually going to cross out all of the instructions that correspond to the B array, okay? Because you would not, as I told you earlier, use this in a real implementation of Dijkstra. You would just fill in the shortest path distances themselves. So in the next quiz, what I want you to think about is the running time of this algorithm if we implemented it more or less as is, according to the pseudocode on this slide, without any special data structures. And in the answers to the quiz, we're going to use the usual notation, where M denotes the number of edges in the graph and N denotes the number of vertices of the graph. So the correct answer to this quiz is the fourth one: the straightforward implementation of Dijkstra's algorithm gives you a running time proportional to the product of the number of edges and the number of vertices. And the way to see that is to just look at the main while loop, look at how many times it executes, and then look at how much work we do per iteration of the while loop if we implement it in a straightforward way. So there are going to be N minus one iterations of the while loop.
And the reason is that the algorithm terminates once every single vertex has been added to capital X. There are N vertices, and initially there's one vertex in X, so after N minus one iterations, we'll have sucked up all of the vertices. Now, what's the work done in each while loop iteration? Well, naively, we do a linear scan through all of the edges. We go through the edges, and we check whether each one is an eligible edge, that is, whether its tail is in X and its head is outside of X. We can keep track of that just by having an auxiliary Boolean variable for each vertex, remembering whether it's in X or not. And then, amongst all of the eligible edges, those crossing the frontier, we just, by exhaustive search, remember which edge has the smallest Dijkstra score. And we can compute the Dijkstra score in constant time for each of the edges. So that's a reasonable algorithm. We might be able to get away with graphs that have, say, hundreds or thousands of vertices using the straightforward implementation, but of course, we'd like to do better. We'd like the algorithm to scale up to much larger graphs, even graphs with, say, a million vertices. So the answer is yes, we can do better. Not by changing the algorithm but, rather, by changing how we organize the data as the algorithm proceeds. So this will be the first time in the course where we use a data structure to get an algorithmic speed-up. We're going to see a really lovely interplay between, on the one hand, algorithm design and, on the other hand, data structure design in this implementation of Dijkstra's algorithm. So you might well ask: what's the clue that indicates that a data structure might be useful in speeding up Dijkstra's shortest path algorithm? And the way you'd figure this out is you'd say, well, where is all this work coming from? Why are we doing a linear amount of work in the edges for a linear-in-the-vertices number of iterations?
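To make the source of all this work concrete, here is a sketch of the straightforward implementation in Python. The lecture gives only pseudocode on a slide, so the function and variable names here are my own, but the logic is exactly the exhaustive scan just described:

```python
import math

def dijkstra_naive(n, edges, s):
    """Straightforward Dijkstra. Vertices are 0..n-1, edges is a list of
    (tail, head, length) tuples with non-negative lengths, s is the source.
    Returns the array A of shortest-path distances from s."""
    A = [math.inf] * n
    A[s] = 0
    in_X = [False] * n        # the auxiliary Boolean "already processed?" flags
    in_X[s] = True
    for _ in range(n - 1):    # n - 1 iterations of the while loop
        best_score, best_head = math.inf, None
        for v, w, length in edges:        # exhaustive O(m) scan per iteration
            if in_X[v] and not in_X[w]:   # eligible edge: crosses the frontier
                score = A[v] + length     # Dijkstra's greedy score
                if score < best_score:
                    best_score, best_head = score, w
        if best_head is None:             # no crossing edge: nothing reachable remains
            break
        in_X[best_head] = True            # add w* to X ...
        A[best_head] = best_score         # ... and record its shortest-path distance
    return A
```

Each of the n − 1 iterations scans all m edges, which is exactly where the O(mn) running time comes from.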
Well, at each iteration of this while loop, what we're doing is just an exhaustive search to compute a minimum. We look at every edge, we look at those that cross the frontier, and we compute the one with the minimum Dijkstra score. So we could ask ourselves: if we're doing minimum computations over and over and over again, is there some data structure whose raison d'être, whose reason for being, is in fact to perform fast minimum computations? And in fact, there is such a data structure. It's the heap data structure. So in the following description of a fast implementation of Dijkstra's algorithm, I'm going to assume you're familiar with the heap data structure, for example, that you watched the review video elsewhere on the course site that explains it. So let me just remind you with a lightning-quick review of what we learned in that video. Heaps are generally, logically, thought of as a complete binary tree, even though they are usually implemented as a flat array laid out in memory. And the key property, which you get to leverage but which you also have to maintain, is the heap property: at every node, the key at that node has to be at least as small as the keys of both of its children. This property ensures that the smallest key of them all has to be at the root of the tree. To implement extract-min, you just pluck off the root. That's what you return; that's the minimum element. And then you swap up the bottommost, rightmost leaf, the last element, make that the new root, and then you bubble it down as necessary to restore the heap property. When you do an insertion, you just make the new element the new last leaf, bottommost and rightmost, and then you bubble it up as needed to restore the heap property. When we use heaps in Dijkstra's algorithm, we're also going to need the ability to delete an element from the middle of the heap. But again, you can do that just by swapping things and bubbling up or down as needed.
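As a quick concrete reference point, Python's standard library exposes exactly these two operations through the `heapq` module (note that `heapq` provides no built-in delete-from-the-middle; the usual workarounds are tracking positions or lazy deletion, and we'll lean on the latter when we get to the implementation):

```python
import heapq

h = []
for key in [7, 3, 5, 1, 9]:
    heapq.heappush(h, key)      # insert: new last leaf, bubble up -- O(log n)

smallest = heapq.heappop(h)     # extract-min: remove the root, move the last
                                # leaf to the top, bubble down -- O(log n)
```

After these operations, `smallest` holds the minimum key, 1, and the heap property still holds for the remaining elements.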
I'll leave it as an exercise for you to think through carefully how to delete elements from the middle of a heap. Because you're maintaining the heap as an essentially perfectly balanced binary tree, the height of the tree is roughly log base two of N, where N is the number of elements in the heap. And because every operation is implemented by doing a constant amount of work at each level of the tree, all of these operations run in O(log N) time, where N is the number of items being stored in the heap. As for the intuitive connection between the heap data structure and Dijkstra's algorithm: in the main while loop of Dijkstra's algorithm, we're responsible for finding a minimum every single iteration. What are heaps good for? They're good for finding minimums in logarithmic time. That sounds a lot better than the linear time we're spending in the naive implementation of Dijkstra's algorithm. So let's now see how to use heaps to speed up Dijkstra's shortest path algorithm. Now, because every iteration of the while loop is responsible for picking an edge, you might expect that we're going to store edges in the heap. So the first subtle but really good idea is to actually use the heap to store vertices rather than edges. Going back to the pseudocode of the algorithm, remember that the only reason we focused on an edge was so that we could then deduce which vertex, namely the head of that edge, to add to our set capital X. So we're just going to cut to the chase: we're going to keep the vertices not yet in X in the heap, and then when we extract the min from the heap, it'll tell us which is the next vertex to add to the set capital X. So the picture we're going to want to have in mind is Dijkstra's shortest path algorithm at some intermediate iteration. So there'll be a bunch of vertices in the set capital X, the source vertex plus a bunch of other stuff that we've sucked into the set so far.
And then there'll be all the vertices we haven't processed yet, a big group, V minus X. Then there are going to be edges crossing this cut in both directions, from X to V minus X and vice versa. Now, before I explain the second invariant, let's just recall what the straightforward implementation of Dijkstra's algorithm needs to do. What it would do is search through all the edges and look for the eligible edges, those with tail in X and head in V minus X. So in this picture, there would be three such edges. I've drawn the example so that two of the edges, the top two, share a common head vertex, whereas the third edge has its own head vertex. The straightforward implementation of Dijkstra's algorithm would compute Dijkstra's greedy score for each of these three edges. And remember, by definition, that's the previously computed shortest path distance to the tail of the arc, v, plus the length of the arc (v, w). So the straightforward implementation just computes this; in this case, it would compute it for three edges. And whichever of the three edges had the smallest score, the head of that edge would be the next vertex that gets added to X. So let me specify the second invariant, and then I'll tell you how to think about it. Because we're storing vertices rather than edges in the heap, we're going to have to be fairly clever with the way we define the key of a vertex in this heap. So we're going to maintain the property that the key of a vertex v is the smallest Dijkstra greedy score of any edge which has that vertex as its head. So let me show you what I mean in terms of our example, where we have three crossing edges. Suppose these three edges in the upper right happen to have Dijkstra greedy scores of seven, three, and five. Let's look at what the key should be for each of the three vertices I've drawn in V minus X. Now, for the topmost vertex, this is pretty interesting.
There are two different edges whose tails are in X and which have this vertex as their head. So what should the key of this vertex be? Well, it should be the smallest Dijkstra greedy score of any of the edges whose tail lies on the left-hand side and which terminate at this vertex. So there are two candidate edges. One has Dijkstra greedy score three; one has Dijkstra greedy score seven. So the key value should be three, the smaller of the two. Now, for the second vertex, there's only a single edge that has its tail in X and that terminates at this vertex. So the key for this vertex should just be the score of that unique edge; in this case, that's going to be five. And then this poor third vertex: there are actually no edges at all that start in X and terminate at this vertex. There's only one arc going the wrong direction. So for any vertex outside of X that doesn't have any eligible edges terminating at it, we think of the key as being plus infinity. So the way I recommend thinking about these heap keys is that we've taken what used to be a one-round, winner-takes-all tournament and turned it into a two-round knockout tournament. So in our straightforward implementation of Dijkstra's algorithm, we did a single linear search through all the edges, we computed the Dijkstra greedy score for each, and we picked the best. So in this example, we would have discovered these three edges in some order; their scores are three, five, and seven; and we would have remembered the edge with score three as being the best. That would have been the winner of this winner-takes-all tournament. Now, when we use the heap, we're factoring it into two rounds. So first, each vertex in V minus X runs a local tournament to elect a local winner. Each of these vertices in V minus X says, well, let me look at all of the edges for which I'm the head and whose tail is in X.
And amongst all of those edges that start in X and terminate at me, I'm going to remember the best of those. So those are the winners of the local tournaments of the first round. And now the heap is only going to remember this set of first-round winners. Right? There's no point in remembering the existence of edges that don't even have the smallest score among those terminating at a given vertex, because we only care about the smallest score overall. Now, when you extract the min from the heap, that is, in effect, executing the second and final round of this knockout tournament. So each of the vertices of V minus X has proposed its local winner, and then the heap, in an extract-min, just chooses the best of all of those local winners. So that's the final proposed vertex that comes out of the heap. So the point is that if we can successfully maintain these two invariants, then, when we extract the min from this heap, we'll get exactly the correct vertex, w*, that we're supposed to add to the set capital X next. That is, the heap will just hand to us on a silver platter exactly the same choice of vertex that our previous exhaustive search through the edges would have computed. The exhaustive search was just computing the minimum in a brute-force way, in a single winner-takes-all tournament. The heap, implemented in this way, chooses exactly the same winner; it just does it in this two-round process. Now, in Dijkstra's algorithm, we weren't supposed to merely find the vertex w* to add to X; we also had to compute its shortest path distance. But remember, we computed the shortest path distance as simply the Dijkstra greedy score. And here, the Dijkstra greedy score is just going to be the key of w* in the heap; that's immediate from invariant number two. So we're using the fact here that our keys are, by definition, just the smallest greedy scores of the edges that stick into that vertex w*.
So, again, we're exactly replicating the computation that we would have done in the straightforward implementation, just in a much slicker way. Okay? We're adding exactly the same vertices, in exactly the same order, and we're computing exactly the same shortest path distances in this heap-based implementation, provided, of course, that we do successfully maintain these two invariants throughout the course of the algorithm. So that is what I now owe you. We have to pay the piper. We've shown that if we can maintain a data structure with these properties, then we can simulate the straightforward implementation; now I have to show you how we maintain these invariants without doing too much work. All right. So maintaining invariant number one will really take care of itself, really sort of by definition: the vertices which remain in the heap are those that we haven't processed yet, and those are exactly the ones outside of capital X. So really, the trick is: how do we maintain invariant number two? Now, before I explain this, let me point out that this is a tricky problem. There is something subtle going on. So, as usual, I want you to think about this shortest path algorithm at some intermediate iteration. Okay? So take a snapshot. A bunch of vertices have already been added to X. A bunch of vertices are still hanging out in the heap; they haven't been added to X. There's some frontier, with edges crossing it, possibly in both directions. And suppose at the end of the current iteration we identify the vertex W, which we're going to extract from the heap and conceptually add to the set X. Now, the reason things get complicated is that when we move a vertex from outside X to inside X, the frontier between X and V minus X changes. So in this picture, the old black X becomes this new blue X. And what's really interesting about the frontier changing is that the edges which cross the frontier change. Now, there are some edges which used to cross the frontier and now don't.
Those are the ones that are coming into W. Those we're not so concerned with; those don't really play any role. What makes things tricky is that there are edges which did not used to cross the frontier but now do. And those are precisely the edges sticking out of W. So in this picture, there are three such edges, which I'll highlight here in pink. To see why it's tricky when new edges are all of a sudden crossing the frontier, let's remember what invariant number two says. It says that for every vertex which is still in the heap, which is not yet in X, the key for that vertex had better be the smallest Dijkstra greedy score of any edge which comes from capital X and sticks into this vertex v. Now, in moving one vertex into X, namely this vertex W, there can be new edges sticking into vertices which are still in the heap. As a result, the appropriate key value for vertices in the heap might be smaller now that W has been moved into X. And the candidates for the vertices in the heap whose keys might have dropped are precisely those vertices at the other ends of edges sticking out of W. So, summarizing: the fact that we've added a new vertex to capital X, by extracting something from the heap, has potentially increased the number of edges crossing the frontier, because the frontier has changed. And therefore, for vertices that remain in the heap, the smallest greedy score of an edge that sticks into them from the set X might have dropped. So we need to update those keys to maintain invariant number two. Now, that's the hard part. Here's what we have going for us. We may have damaged the keys by changing the frontier, but the damage is local. We can understand exactly whose keys might have dropped: as suggested by the picture, the vertices whose keys we need to update are precisely those at the heads of edges that stick out of W.
So for each outgoing edge from W, the vertex we just extracted from the heap, we need to go to the other end of the edge and check whether that vertex needs its key decreased. So here's the pseudocode to do this. When we extract the vertex W from the heap, that is, when we conceptually add the new vertex W to the set X, thereby changing the frontier, we say: well, we know the only vertices that might have to have their keys changed are the ones on the other side of these outgoing arcs from W. So we just have a simple iteration over the outgoing edges (W, V) of the vertex W. Now, I haven't shown you any edges like this in the picture, but there might well be some edges where the head of the arc, V, is also in the set X, has also already been processed. But anything in X is not in the heap. Remember, the heap only contains the stuff outside of X, so we couldn't care less about maintaining keys for vertices that are no longer in the heap. So we do an extra check: is the head of this edge in fact still in the heap, that is, is it not in X? In the picture, for example, this would be true for all three of the vertices at the other ends of arcs pointing out of W. And for each such vertex V, we update its key. And the way we're going to update its key is: we're just going to rip this vertex out of the heap, we're going to recompute its key in constant time, and then we're going to reinsert it into the heap. And since all heap operations take logarithmic time, this key update takes logarithmic time. As an additional optimization, I want to point out that if one of these vertices V's key does change, it can only change in one way. So remember, what is the key? The key is the smallest Dijkstra greedy score of all of the edges that start in X and stick into this vertex. So that's the local tournament, the first-round tournament, happening at this vertex V. Now, what is the only thing that has changed?
Before and after we added this vertex W to X, the only difference is that now one new edge is sticking into this vertex V. All of the old edges sticking into it from X are still sticking into it, and now there's one extra candidate in its local tournament, namely this edge (W, V). So either (W, V) is the new local winner, that is, it has the smallest Dijkstra greedy score of all the edges that terminate at this vertex, or it doesn't, in which case the previous winner is still the winner. That is, the new key value can only be one of two things. Either it's the old key value; that's the case where this extra entrant, the edge from W to V, is irrelevant. Or, if the key has changed, it has to have changed to the Dijkstra greedy score of this edge (W, V). And the formula for that is the shortest path distance that we just computed for W, which has been fully processed at this point, plus the length of the direct arc from W to V. And again, conceptually, this formula is just the Dijkstra greedy score for the arc (W, V), the new entrant in V's local first-round tournament. So now, having updated V's key appropriately, invariant number two is restored, and once again the key of every vertex does reflect the smallest Dijkstra greedy score of any edge sticking into it from the set X. We can safely reinsert this node back into the heap with its new key value. And these three lines together, a deletion, a key recomputation, and a reinsertion, are just a key update in logarithmic time for one of these vertices at the other end of an arc sticking out of the vertex W. So let's tally up the running time of this new implementation. One thing I want you to check, and this will definitely help you understand this refined implementation of Dijkstra's algorithm, is that essentially all the work done is through the heap API. That is, all of the running time that we have to account for is in heap operations. We don't really do nontrivial work outside of heap operations.
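Putting the two invariants and the key-update step together, here is one way to sketch the full heap-based implementation in Python (names are my own). One caveat: Python's `heapq` cannot delete from the middle of the heap, so this sketch replaces the lecture's delete-recompute-reinsert key update with the common lazy-deletion trick of pushing a fresh entry and skipping stale ones on extraction; the effect and the O(log n) cost per key update are the same.

```python
import heapq
import math

def dijkstra_heap(n, adj, s):
    """Heap-based Dijkstra. adj[v] is a list of (head, length) pairs for the
    edges out of v; lengths are non-negative; s is the source vertex."""
    A = [math.inf] * n        # shortest-path distances, filled in as vertices join X
    key = [math.inf] * n      # invariant 2: smallest greedy score of an edge into v
    key[s] = 0
    heap = [(0, s)]           # invariant 1: heap holds (key, vertex) for V - X
    in_X = [False] * n
    while heap:
        k, w = heapq.heappop(heap)        # extract-min: the second-round winner
        if in_X[w]:
            continue                      # stale entry left over from a key decrease
        in_X[w] = True                    # add w to X; the frontier changes
        A[w] = k                          # its key is exactly its greedy score
        for v, length in adj[w]:          # edges out of w now cross the frontier
            if not in_X[v] and k + length < key[v]:
                key[v] = k + length       # edge (w, v) wins v's local tournament
                heapq.heappush(heap, (key[v], v))   # the "decrease-key"
    return A
```

The `continue` line plays the role of the deletion: the superseded heap entry is simply ignored when it eventually surfaces.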
And again, recall that the running time of any heap operation is logarithmic in the number of elements in the heap. Our heap is storing vertices; it's never going to have more than N things in it. So the running time of every heap operation is big-O of log N. So what are the heap operations that we do? Well, we do an extract-min once per iteration of the while loop. So there are N minus one iterations of the while loop, just like before, but now, instead of doing an exhaustive search through the edges, we just do a single extract-min from the heap, and it gives us on a silver platter the vertex we should add next. So what do we do besides extract-mins? Well, we have to do the work of paying the piper; we have to maintain invariant number two. And every time we extract a min, that triggers some subsequent key updates. And remember, each of these key updates is a deletion of an element from the heap followed by an insertion. So how many deletions and insertions do we do? Well, at first this might seem a little bit scary, right? Because we do roughly a linear number of extract-mins, and a vertex might have as many as N minus one outgoing arcs. So it seems like a single vertex could trigger as many as N minus one key updates, which is theta of N heap operations. And if we sum that up over the N iterations of the while loop, that would give us N squared heap operations. And indeed, in dense graphs, that can be the case. It is true that a single vertex might trigger a linear-in-N number of heap operations. But that's the wrong way to think about it. Rather than have this vertex-centric perspective on who's responsible for heap operations, let's take an edge-centric view. So for each edge of the graph, let's think about when it can be responsible for some heap operations, in particular a key decrease and the resulting deletion and insertion. So suppose you have an edge that points from the vertex V to the vertex W.
There's actually only one situation in which this edge is going to be responsible for a key decrease. And that's the case where the tail of the edge, V, gets sucked into the set X before the head, W, of this edge gets sucked into the set X. If that happens, if V gets sucked into X while W is still outside of X, then indeed we're going to have to decrease the key of W, just like we did in the examples. But that's all that's going to happen: V can only get sucked into X once, and it's never going to leave, so this edge is only responsible for this single decrease of the key of its head W. And that's one insertion and one deletion. And in fact, if the endpoints of this edge get sucked into X in the opposite order, that is, if the head of this edge, W, gets sucked into X first, that doesn't even trigger a key decrease, and W will never have its key decreased because of this particular arc from V to W. So the upshot is that each edge (V, W) of the graph triggers at most one insert/delete combo. So what does this mean? It means that the number of heap operations is big-O of N, for the extract-mins, plus big-O of M, for the insert/delete combos triggered by edges during the key decreases. Now, I'm going to write this in a simplified way: this is just O(M) heap operations, where M is the number of edges. And this is because of our assumption that there's a path from S to every other vertex. If you think about it, that means that the graph is at least weakly connected; if you picked it up, it would stay together in one piece. So it at least contains a spanning tree, at least in an undirected sense, which means it contains at least N minus one edges. So we're in the case of weakly connected graphs, where M dominates N: M is always at least as big as N, up to a minus one. So what that means is that the running time of Dijkstra's algorithm, with this heap implementation, is just a log factor larger than linear. Remember, every heap operation takes logarithmic time.
So we do a linear-in-M number of heap operations, and each takes time logarithmic in N. So the running time is O(M log N), with, I should say, quite small constants. So this is a really, really impressively fast algorithm for such a useful problem as shortest paths. Now, we got a little bit spoiled in our discussion of graph search and connectivity, where it seemed that any problem we cared about could be solved in linear time, O(M plus N). So here we're picking up this extra logarithmic factor, but, I mean, come on, this is still awesome. A running time of O(M log N) is unbelievably faster than a running time of O(M times N), which is what we had in the straightforward implementation. So this deft use of the heap data structure has given us a truly blazingly fast algorithm for an extremely well-motivated problem: computing shortest paths.
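To summarize the accounting from this last argument as a single formula (just restating what we derived, with m edges and n vertices):

```latex
\underbrace{(n-1)\cdot O(\log n)}_{\text{extract-mins}}
\;+\;
\underbrace{m\cdot O(\log n)}_{\text{delete/insert pairs from key updates}}
\;=\; O\big((m+n)\log n\big) \;=\; O(m\log n),
```

where the final simplification uses the fact that m ≥ n − 1 in a weakly connected graph.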