We've arrived at another one of computer 
science's greatest hits, namely 
Dijkstra's shortest path algorithm. 
So let me tell you about the problem. 
It's a problem called single-source 
shortest paths. 
Basically what we want to do is compute 
something like driving directions. 
So we're given as input a graph, in this 
lecture I'm going to work with directed 
graphs. 
Although the same algorithm would work 
for undirected graphs with cosmetic 
changes. 
As usual we'll use N to denote the number 
of edges and N to denote the number of 
vertices. 
The input also includes two extra 
ingredients. 
First of all for each edge E, we're 
giving as input a non negative length 
which also denotes by else of V and the 
context of a driving direction 
application else of V could also denote 
to the mileage. 
How long, this particular road is or it 
could also denote, the expected travel 
time along the edge. 
The second ingredient is a vertex from 
which we, are looking for paths. 
This is exactly the same as we had in 
breadth first search and depth first 
search. 
We have an originating vertex, which 
we'll call here, the source. 
Our responsibility then is to, given this 
input, compute for every other vertex V, 
in this network. 
The length of the shortest path from the 
source vertex S, to that destination 
vertex, V. 
And so just to be clear, what is the 
length of a path that has, say, three 
edges in it? 
Well, it's just the sum of the length of 
the first edge in the path plus the 
length of the second edge in the path 
plus the length of the third edge in the 
path. 
So if you had a path like this with three 
edges and lengths one, two and three, 
then the length of the path would just be 
six. 
And then we define the short SV path in 
the natural way. 
So, amongst all the path directed from S 
to V, each one has it's own respected 
path lengths, and then the minimum over 
all SV paths is the shortest path 
distance in the graph G. 
So, I am going to make two assumption for 
these lectures one is really just for 
convenience the other is really 
important. 
The other assumption without which 
Dijkstra's algorithm is not correct, as 
we'll see. 
So, for convenience, we'll assume that 
there is a directed path from S to every 
other vertex V in the graph. 
Otherwise, the shortest path distance is 
something we'd define to be plus 
infinity. 
And the reason this is not a big 
assumption is, if you think about it, you 
could detect which vertices are not 
reachable from S just in a preprocessing 
step using say, breadth-first or 
depth-first search. 
And then you could delete the irrelevant 
part of the graph and run Dijkstra's 
algortihm as we'll describe it on what 
remains. 
Alternatively, Dijkstra's Algortihm will 
quite naturally figure out which vertices 
there are paths to from S, and which ones 
there are not. 
So this won't really come up. 
So, to keep it simple, just think about 
we have an input graph, where you can get 
from S to V, for every different vertex 
V. 
And the challenge then is to amongst all 
the ways to get from S to V, what is the 
shortest way to do it. 
So the second assumption already appears 
in the problem statement, but I want to 
reiterate it just so it's really clear. 
When we analyze Dijkstra's algorithm we 
always focus on graphs where every length 
is non negative. 
No negative edge links are allowed and 
we'll see why a little bit later in the, 
in the video. 
Now in the context of a driving 
directions application, it's natural to 
ask the question, why would you ever care 
about negative edge lengths?" Until we 
invent a time machine, it doesn't seem 
like negative edge lengths are going to 
be relevant when you're computing literal 
paths, through literal networks, but 
again remember that paths can be thought 
of as more abstractly as a just sequence 
of decisions. 
And some of the most powerful 
applications of shortest paths are. 
Coming up with optimal ways such 
sequences. 
So, for example, maybe you're engaging in 
financial transactions and you have the 
option of both buying and selling assets 
at different times. 
If you sell, then you get some kind of 
profit and that would correspond to a 
negative edge link. 
So there are quite interesting 
applications in which negative edge links 
are relevant. 
If you are dealing with such an 
application, Dykstra's Algorithm is not 
the algorithm to use. 
There's a different shortest path 
algorithm. 
A couple other ones but the most well 
known one is called Bellman-Ford. 
That's something based on dynamic 
programming, which we may well cover in a 
sequel course. 
Okay? 
So, for Dijkstra's Algorithm, we always 
focus on graphs that have only 
non-negative edge lengths. 
So with the next quiz I just want to make 
sure that you understand the single shore 
shortest path problem. 
Here let me draw for you here a simple 
four note network and ask you for what 
are the four shortest path lengths so 
from the source vertex X to each of the 
verticies in the network. 
Alright, 
so the answer to this quiz is the final 
option 0136. 
to see why that's true. 
Well, 
All of the options had zero as the 
shortest path distance from S to itself 
so that just seemed kind of obvious. 
So the empty path will get you from S to 
itself and have zero length. 
Now, suppose you wanted to get from S to 
V. 
Well, there's actually only one way to do 
that. 
You have to go along this one hop path. 
So the only path has length one. 
So the shortest path distance from SW is 
one. 
Now W is more interesting. 
There is a direct one hop path SW that 
has link four but that is not the 
shortest path from S to W. 
In fact the two hop path that goes 
through V as an intermediary has total 
path link three which is less than the 
link of the direct art from S to W. 
So therefore the shortest path distance 
from S to W is going to be three. 
And finally for the vertex T there's 
three different paths going from S to T, 
there's the two hop path that goes 
through V, there's the two hop path which 
goes through W, both of those have path 
link seven. 
And then there's the three hop path which 
goes through both V and W and that 
actually has a path length of 1 + 2 + 3 = 
6. 
So despite having the largest number of 
edges the zigzag path is in fact the 
shortest path from S to T and it has 
length six. 
Alright so before I tell you how 
Dijkstra's algorithm works I feel like I 
should justify the existence of this 
video a little bit alright because this 
is not the first time we've seen shortest 
paths you might be thinking rightfully so 
we already know how to compute shortest 
path's that was one of the applications 
of Breadth-first search. 
So the answer to this question is both 
yes and no. 
Breadth-first search does indeed compute 
shortest paths we had an entire video 
about that but it works only in the 
special case where the length of every 
edge of the graph is one. 
At the moment we're trying to solve a 
more general problem. 
We're trying to solve shortest paths. 
When edges can have arbitrary non 
negative edge lengths. 
So for example, in the graph that we 
explored in the previous quiz, if we ran 
Breadth-first search, starting from the 
vertex S. 
It would say that the shortest path 
distance from S to T is 2 and that's 
because there's a path with two hops 
going from S to T. 
Put differently, T is in the second 
layer, emanating from S but as we saw in 
the quiz, there is not in fact, a 
shortest two hop path from S to T if you 
care about the edge lengths. 
Rather, the minimum length path, the 
shortest path, with respect to the edge 
weights, is this three hop path which has 
a total length of six, so Breadth-first 
search is not going to give us what we 
want, when the edge lengths are not all 
the same. 
And if you think about an application 
like driving directions then needless to 
say it's not the case that every edge in 
the network is the same. 
Some roads are much longer than others. 
Some roads will have much larger travel 
times than others. 
So, we really do need to solve this more 
general shortest path problem. 
Similarly, if you're thinking more 
abstractly about a sequence of decisions 
like financial transactions in general, 
different transactions will have 
different values. 
So, you really want to solve general 
shortest paths and you're not in the 
special case that Breadth-first search 
solves. 
Now, if you're feeling particularly sharp 
today, you might have the following 
objection to what I just said. 
You might say big deal. 
General edge weights, unit edge weights, 
it's basically the same. 
Say you have an edge that has length 
three. 
How is that fundamentally different than 
having a path with three edges, each of 
which has length one? 
So why not just replace all the edges 
with a path of edges of the appropriate 
length? 
Now we have a network in which every edge 
has unit lengths and now we can just run 
Breadth-first search. 
So, put succinctly, isn't it the case 
that computing shortest path with general 
edge weights reduces to computing 
shortest paths with unit edge weights? 
Well, the first comment I want to make is 
I think this would be an excellent 
objection to raise and indeed as 
programmers, as computer scientists, this 
is the way you should be thinking. 
If you see a problem that seems 
superficially harder than another one you 
always want to ask well, maybe just with 
a, with a clever trick I can reduce it to 
a problem I already know how to solve. 
That's a great attitude in general for 
problem solving and indeed, you know if 
all of the edge links were just small 
numbers like one, two and three and so 
on, this trick would work fine. 
The issue is when you have a network 
where the different edges can have very 
different lengths. 
And that's certainly the case in many 
applications. 
definitely road networks would be one, 
where you have both, sort of long 
highways, and you have neighborhood 
streets. 
And potentially, in financial transaction 
based networks you would also have a wide 
variance between the value of different 
transactions. 
And the problem, then, is some of these 
edge links might be really big. 
There might be 100, 
there might be 1,000. 
It's very hard to put bounds on how large 
these edge weights could be. 
So if you start once in we're replacing 
single edges with these really long paths 
of with the thousand, 
you've blown up the size of your graph 
way too much. 
So you do have a faithful representation 
of your old network but it's too 
wasteful. 
So your going to end up Breadth-first 
search in a linear time it's now on this 
much larger graph and we much prefer 
something which is linear time or almost 
linear time that works directly on the 
original graph. 
And that's exactly... 
That Dijkstra's shortest path algorithm 
is going to accomplish. 
Let's now move onto the pseudo code for 
Dijkstra's shortest path algorithm. 
So this is another one of those 
algorithms, where, no matter how many 
times I explain it, it's always just 
super fun to teach. 
And the main reason is, because it 
exposes the beauty that pops up, in good 
algorithm design. 
So, the pseudo code, as you'll see in a 
second is, itself, very elegant. 
We're just goin got have one loop and in 
each iteration of the loop, we will 
compute the shortest path distance to one 
additional vertex. 
And by the end of the loop, we'll have 
computed shortest path distances to 
everybody. 
The proof of correctness which we'll do 
in the next video is a little bit subtle 
but also quite natural, quite pretty. 
And then, finally, Dijkstra's Algorithm 
will give us our first opportunity. 
See the inner play between good algorithm 
design and good data structure design. 
So with a suitable application of the 
heap data structure we'll be able to 
implement Dijkstra's algorithm so it runs 
blazingly fast, almost linear time, 
namely N times log N. 
But I'm getting a little ahead of myself. 
Let me actually show you this pseudo 
code. 
At a high level you really should think 
of Dijkstra's algorithm as being a close 
cousin as Breadthfirst search and indeed 
if all the edge links are equal to one 
Dijkstra's algorithm becomes 
Breadth-first search. 
So this is sort of a slip generalization 
Breadth-first search when edges can have 
different lengths. 
So like our generic graph search 
procedures we're going to start at the 
source vertex S and each iteration we're 
going to conquer one new vertex. 
And we'll do that once each iteration 
after N - 1 iterations will be done. 
And each iteration will correctly compute 
the shortest path distance to one new 
possible destination vertex V. 
So let me just start by initializing some 
notation so capital X is going to denote 
the vertices we've dealt with so far and 
by dealt with I mean we've correctly 
computed shortest path distance from the 
source vertex to every vertex in X. 
We're going to augment X by one new 
vertex in each generation of the main 
loop. 
Remember, that we're responsible for 
outputting in numbers one for each 
vertex, we're not just computing one 
thing, we're computing the shortest path 
distance from the source vertex S to 
every other vertex. 
So I'm going to frame the output in terms 
of this array capital A. 
So for each vertex we're going to have an 
entry in the array A and the goal is at 
the end of the. 
A will be populated with the correct 
shortest path distances. 
Now, to help you understand Dijkstra's 
algorithm, 
I'm going to do some additional 
bookkeeping which you would not do in a 
real implementation of Dijkstra's 
algorithm. 
Specifically, in addition to this array, 
capital A in which we compute shortest 
path distances from the source vertex to 
every other destination. 
There's going to be an array, capital B 
in which we'll keep track of the actual 
shortest path itself from the source 
vertex S to each destination V. 
So, the arrays A and B will be indexed in 
the same way. 
There'll be one entry for each possible 
destination vertex V. 
Capital A will store just a number for 
each destination's shortest path 
distance. 
The array B in each position will store 
an actual path, to the path, the shortest 
path from S to V. 
But again, you would not include this in 
an actual implementation. 
I just find, it's, in my experience, it's 
easier for students to understand this 
algorithm if we think of the paths being 
carried along as well. 
So now that I've told you the semantics 
of these two arrays, I hope it's no 
surprise how we initialize them for the 
source vertex itself, S. 
the shortest path distance from S to 
itself is zero. 
The empty path gets you from S to S with 
length zero. 
There's no negative edges by assumptions. 
There's no way you can get from S back to 
S with a non-positive length, so this is 
definitely the shortest path distance for 
S. 
By the same reasoning, the shortest path 
from S to S is just the empty path, the 
path with no edges in it. 
So now let's proceed to the main while 
loop. 
So then plan is we want to grow this set 
capital X like a mold until it covers the 
entire graph so with each iteration its 
going to grow and cover up one new vertex 
and that vertex will then be processed 
and at the time of processing we're 
responsible for computing the shortest 
path distance from S to this vertex and 
also figuring out what the actual 
shortest path from S to this vertex is. 
So in each iteration, we need to grow x 
by one node to ensure that we make 
progress. 
So the obvious question is, which node 
should we pick? 
Which one do we add to x next? 
So there's going to be two ideas here. 
The first one we've already seen in terms 
of all of these generic graph search 
procedures. 
Which is, we're going to look at the 
edges and the vertices that which on the 
frontier. 
So we're going to look at the vertices 
that are just one hope away from vertices 
we've already put into x. 
So that motivates it, in given ideration 
of the wild loop, so look at the stuff 
we've already processed, that's X. 
And the stuff we haven't already 
processed, that's V - X. 
S of course, starts in X and we never 
take any thing out of X. 
So S is still there. 
You know, in some generic iteration of 
the while loop, we might have some other 
vertices that are in X. 
And in the generic iteration of this 
while loop, there might be multiple 
vertices which are not in X. 
And now, as we've seen in our graph 
search procedures, there are general 
edges crossing this cut. So, there are 
edges which have one endpoint in each 
side, one endpoint in X and one endpoint 
outside of X. 
This is a directed graph so they can 
cross in two directions. 
They can cross from left to right or they 
can cross from right to left. 
So you might have some edges internal to 
X those are things we don't care about at 
this point. 
You might have edges which are internal 
to V - X we also don't care about those 
at least not quite yet. 
And then you have edges which can cross 
from X to V - X. 
As well as edges that can cross in the 
reverse direction from V - X back to X. 
And the ones we're going to be interested 
in, just like when we did graph search 
and directed graphs, are the edges 
crossing from left to right. 
The edges whose tail is amongst the 
vertices we've already seen. 
And whose head is some not yet explored 
vertex. 
So the first idea is that at each 
iteration of the while loop we scan or we 
examine all of the edges with tail and X 
and head outside of X one of those is 
going to lead us to the vertex that we 
pick next. 
So that's the first idea but now we need 
a second idea because this is again quite 
undetermined. 
There could be multiple such vertices 
which meet this criterion, so for example 
in the cartoon in the bottom left part of 
this slide you'll notice that there's one 
vertex here which is the head of an arc 
that crosses from left to right, and 
there's yet another vertex down here, in 
D minus X, which again is the head of an 
arc, which crosses from left to right. 
So there are two options, of which of 
those two does suck into are set X, and 
we might want some guidance about which 
one to pick next. 
So the key ideal in dijkstra, is to give 
each vertex a score corresponding to how 
close that vertex seems to the source 
vertex S and then to pick, among all 
candidat vertices, the one that has the 
minimum score. 
Let me be more precise. 
So among all crossing edges, with tail on 
the left side and head on the right side, 
we pick the edge. 
The minimizes the following criterion, 
the shortest path distance that we 
previously computed from S to the vertex 
V, plus the length of the edge that 
connects V to W. 
So this is quite an important expression 
so I will call this Dijkstra's greedy 
criterion. 
This is a very good idea to use this 
method to choose which vertex to add to 
the set X, as we'll see. 
I need to give a name to this edge, which 
minimizes this quantity over all crossing 
edges, so it's call B star, W star. 
So for example in the cartoon in the 
bottom left maybe of the two edges 
crossing from left to right maybe the top 
one is the one that has a smaller value 
of Dijkstra's greedy criterion so in that 
case this would be the vertex V star and 
at the other end of the edge would be the 
vertex W star. 
So this edge, V star, W star is going to 
do wonders for us. 
It will both guide us to the vertex that 
we should add to X next. 
That's going to be W star. 
It's going to tell us how we should 
compute the shortest path distance to W 
star. 
As well as what the actual shortest path 
from S to W star is. 
So specifically, in this iteration of the 
while loop. 
After we've chosen this edge, V star, W 
star. 
We add W star to X. 
Remember, by definition W Star was 
previously not in capital X, so we're 
making progress by adding it to X, that's 
one more vertex in X. 
Now, X is supposed to represent all of 
the nodes that we've already processed. 
So, an invariant of this algorithm is 
that we've computed shortest path 
distances for everybody in X as well as 
the actual shortest paths. 
So, now that we're putting W star in X, 
we're responsible for all of this 
information, shortest path information. 
So, what we're going to do is we're going 
to set the, our estimate. 
Of, W stars shortest path distance from 
S, to be equal to the value of this 
die-strict reading criteria for this 
edge. 
That is, whatever our previously computed 
shortest path distance from S to B star 
was, plus, the length of the direct edge 
from B star to W star. 
Now, a key point is to realize that this 
code does make sense. 
By which I mean, if you think about this 
quantity AV, this has been previously 
computed. 
And that's because an invariant in this 
algorithm is we've always computed 
shortest path distances to everything 
that's in capital X. 
And of course, the same thing holds when 
we need to assign W star shortest path 
distance. 
Because V star was a member of capital X. 
We had already computed its shortest path 
distance. 
So we can just look up the V star entry 
position in the array A. 
So over in our picture in our left, we 
would just say well what did we compute 
the shortest path distance to these star 
previously maybe it's something like 
seventeen, then we'd say what is the 
length of this direct edge from V star to 
W star, maybe that's six. 
Then we would add seventeen and six and 
we would put 23 as our estimate of the 
shortest path distance from S to W star. 
So we'll do something analogous with the 
shortest paths itself, in the array B, 
that is again, we're responsible since we 
just added W star to capital X where 
responsible for suggesting a path from S 
to W star in the B array. 
So what we're going to do, is we're just 
going to, inherit the previously computed 
path to V star and we're just going to 
tack on the end one extra hop, namely the 
direct edge from V-star to W-star, 
that'll give us a path from S all the way 
to W-star, via, V-star as an intermediate 
pit stop. 
And that is the entirety of Dijkstra's 
algorithm. 
I've explained all of the ingredients 
about how it works at a conceptual level. 
So the two things I owe you is, you know 
why is it correct? 
Why does it actually compute shortest 
paths correctly to all of the different 
vertices, and then secondly, how fast can 
we implement it? 
So the next two videos are going to 
answer both of those questions, but 
before we do that, let's go through an 
example, to get a better feel for how 
this algorithm actually works, and I also 
want to go through a non-example. 
So that you can appreciate how it breaks 
down when there are negative edges. 
And that'll make it clear why we do need 
a proof of correctness. 
Because it's not correct without any 
assumptions about the edge lengths.