In this video we'll discuss how we actually implement Dijkstra's shortest path algorithm. And in particular, using the heap data structure, we'll give a blazingly fast implementation, almost linear time. Let me just briefly remind you of the problem we're solving. It's the single-source shortest path problem. So we're given a directed graph and a source vertex, S. We're assuming that there's a path from S to every other vertex, V. If that's not true, we can detect it with an easy preprocessing step, so our task then is just to find the shortest path from the source vertex S to each possible destination V. Moreover, every edge of the graph has a non-negative edge length, which we're denoting ℓ_e for an edge e. So recall that Dijkstra's algorithm is driven by a single while loop. We're going to add one additional vertex to an evolving set capital X in each iteration as the algorithm proceeds. So X is the set of vertices that have been processed so far. We maintain the invariant that for every processed vertex, we've computed what we think the shortest path distance is to that vertex. Initially, X is just the source vertex S; of course, the shortest path distance from S to itself is zero. And then the cleverness of Dijkstra's algorithm is in how we figure out which vertex to add to the set capital X each iteration. So the first thing we do is we focus only on edges that cross the frontier, that is, edges that have their tail in capital X and their head outside of capital X. Now, of course, there may be many such edges crossing this frontier, and we use Dijkstra's greedy criterion to select one of them. So for each crossing edge, each edge with tail in X and head outside X, we compute the Dijkstra greedy score, which is defined as the previously computed shortest path distance to the tail of the arc plus the length of the arc. So we compute that for each crossing edge, and then the edge with the minimum score, call it (v*, w*), determines how we proceed.
So we add the head of that arc, w*, to the set capital X, and then we compute the shortest path distance to w* to be the previously computed shortest path distance to v* plus the length of this edge (v*, w*). Now, back when I explained this algorithm, I did it using two arrays, array capital A and array capital B. A is what stores the shortest path distances, and remember, that's what the problem asks us to compute. And for clarity, I also filled in this array capital B, just to keep track of the shortest paths themselves. Now, if you look at the code of this algorithm, we don't actually need the array capital B for anything. When we fill in the array capital A, we don't actually refer to the B array. And so, now that we're going to talk about real implementations of Dijkstra, I'm actually going to cross out all of the instructions that correspond to the B array, okay? Because you would not, as I told you earlier, use this in a real implementation of Dijkstra. You would just fill in the shortest path distances themselves. So in the next quiz, what I want you to think about is the running time of this algorithm if we implemented it more or less as is, according to the pseudocode on this slide, without any special data structures. And in the answers to the quiz, we're going to use the usual notation, where M denotes the number of edges in the graph and N denotes the number of vertices of the graph. So the correct answer to this quiz is the fourth one: the straightforward implementation of Dijkstra's algorithm gives you a running time proportional to the product of the number of edges and the number of vertices. And the way to see that is to just look at the main while loop, look at how many times it executes, and then look at how much work we do per iteration of the while loop if we implement it in a straightforward way. So there are going to be N minus one iterations of the while loop.
And the reason is that the algorithm terminates once every single vertex has been added to capital X. There are N vertices, and initially there's one vertex in X, so after N minus one iterations, we'll have sucked up all of the vertices. Now, what's the work done in each while loop iteration? Well, naively, we do a linear scan through all of the edges. We go through the edges, and we check whether each one is an eligible edge, that is, whether its tail is in X and its head is outside of X. We can keep track of that just by having an auxiliary Boolean variable for each vertex, remembering whether it's in X or not. And then, amongst all of the eligible edges, those crossing the frontier, we just, by exhaustive search, remember which edge has the smallest Dijkstra score. And we can compute the Dijkstra score in constant time for each of the edges. So that's a reasonable algorithm. We might be able to get away with graphs that have, say, hundreds or thousands of vertices using the straightforward implementation, but of course, we'd like to do better. We'd like the algorithm to scale up to much larger graphs, even graphs with, say, a million vertices. So the answer is yes, we can do better. Not by changing the algorithm but, rather, by changing how we organize the data as the algorithm proceeds. So this will be the first time in the course where we use a data structure to get an algorithmic speed-up. We're going to see a really lovely interplay between, on the one hand, algorithm design and, on the other hand, data structure design in this implementation of Dijkstra's algorithm. So you might well ask: what's the clue that indicates that a data structure might be useful in speeding up Dijkstra's shortest path algorithm? And the way you'd figure this out is you'd say, well, where is all this work coming from? Why are we doing a linear amount of work in the edges for a linear-in-the-vertices number of iterations?
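To make the source of all this work concrete, here is a sketch of the straightforward implementation in Python. The lecture gives only pseudocode on a slide, so the function and variable names here are my own, but the logic is exactly the exhaustive scan just described:

```python
import math

def dijkstra_naive(n, edges, s):
    """Straightforward Dijkstra. Vertices are 0..n-1, edges is a list of
    (tail, head, length) tuples with non-negative lengths, s is the source.
    Returns the array A of shortest-path distances from s."""
    A = [math.inf] * n
    A[s] = 0
    in_X = [False] * n        # the auxiliary Boolean "already processed?" flags
    in_X[s] = True
    for _ in range(n - 1):    # n - 1 iterations of the while loop
        best_score, best_head = math.inf, None
        for v, w, length in edges:        # exhaustive O(m) scan per iteration
            if in_X[v] and not in_X[w]:   # eligible edge: crosses the frontier
                score = A[v] + length     # Dijkstra's greedy score
                if score < best_score:
                    best_score, best_head = score, w
        if best_head is None:             # no crossing edge: nothing reachable remains
            break
        in_X[best_head] = True            # add w* to X ...
        A[best_head] = best_score         # ... and record its shortest-path distance
    return A
```

Each of the n − 1 iterations scans all m edges, which is exactly where the O(mn) running time comes from.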
Well, at each iteration of this while loop, what we're doing is just an exhaustive search to compute a minimum. We look at every edge, we look at those that cross the frontier, and we compute the one with the minimum Dijkstra score. So we could ask ourselves: if we're doing minimum computations over and over and over again, is there some data structure whose raison d'être, whose reason for being, is in fact to perform fast minimum computations? And in fact, there is such a data structure. It's the heap data structure. So in the following description of a fast implementation of Dijkstra's algorithm, I'm going to assume you're familiar with the heap data structure, for example, that you watched the review video elsewhere on the course site that explains it. So let me just remind you with a lightning-quick review of what we learned in that video. Heaps are generally, logically, thought of as a complete binary tree, even though they are usually implemented as a flat array laid out in memory. And the key property, which you get to leverage but which you also have to maintain, is the heap property: at every node, the key at that node has to be at least as small as the keys of both of its children. This property ensures that the smallest key of them all has to be at the root of the tree. To implement extract-min, you just pluck off the root. That's what you return; that's the minimum element. And then you swap up the bottommost, rightmost leaf, the last element, make that the new root, and then you bubble it down as necessary to restore the heap property. When you do an insertion, you just make the new element the new last leaf, bottommost and rightmost, and then you bubble it up as needed to restore the heap property. When we use heaps in Dijkstra's algorithm, we're also going to need the ability to delete an element from the middle of the heap. But again, you can do that just by swapping things and bubbling up or down as needed.
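As a quick concrete reference point, Python's standard library exposes exactly these two operations through the `heapq` module (note that `heapq` provides no built-in delete-from-the-middle; the usual workarounds are tracking positions or lazy deletion, and we'll lean on the latter when we get to the implementation):

```python
import heapq

h = []
for key in [7, 3, 5, 1, 9]:
    heapq.heappush(h, key)      # insert: new last leaf, bubble up -- O(log n)

smallest = heapq.heappop(h)     # extract-min: remove the root, move the last
                                # leaf to the top, bubble down -- O(log n)
```

After these operations, `smallest` holds the minimum key, 1, and the heap property still holds for the remaining elements.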
I'll leave it as an exercise for you to think through carefully how to delete elements from the middle of a heap. Because you're maintaining the heap as an essentially perfectly balanced binary tree, the height of the tree is roughly log base two of N, where N is the number of elements in the heap. And because every operation is implemented by doing a constant amount of work at each level of the tree, all of these operations run in O(log N) time, where N is the number of items being stored in the heap. As for the intuitive connection between the heap data structure and Dijkstra's algorithm: in the main while loop of Dijkstra's algorithm, we're responsible for finding a minimum every single iteration. What are heaps good for? They're good for finding minimums in logarithmic time. That sounds a lot better than the linear time we're spending in the naive implementation of Dijkstra's algorithm. So let's now see how to use heaps to speed up Dijkstra's shortest path algorithm. Now, because every iteration of the while loop is responsible for picking an edge, you might expect that we're going to store edges in the heap. So the first subtle but really good idea is to actually use the heap to store vertices rather than edges. Going back to the pseudocode of the algorithm, remember that the only reason we focused on an edge was so that we could then deduce which vertex, namely the head of that edge, to add to our set capital X. So we're just going to cut to the chase: we're going to keep the vertices not yet in X in the heap, and then when we extract the min from the heap, it'll tell us which is the next vertex to add to the set capital X. So the picture we're going to want to have in mind is Dijkstra's shortest path algorithm at some intermediate iteration. So there'll be a bunch of vertices in the set capital X, the source vertex plus a bunch of other stuff that we've sucked into the set so far.
And then there'll be all the vertices we haven't processed yet, a big group, V minus X. Then there are going to be edges crossing this cut in both directions, from X to V minus X and vice versa. Now, before I explain the second invariant, let's just recall what the straightforward implementation of Dijkstra's algorithm needs to do. What it would do is search through all the edges and look for the eligible edges, those with tail in X and head in V minus X. So in this picture, there would be three such edges. I've drawn the example so that two of the edges, the top two, share a common head vertex, whereas the third edge has its own head vertex. The straightforward implementation of Dijkstra's algorithm would compute Dijkstra's greedy score for each of these three edges. And remember, by definition, that's the previously computed shortest path distance to the tail of the arc, v, plus the length of the arc (v, w). So the straightforward implementation just computes this; in this case, it would compute it for three edges. And whichever of the three edges had the smallest score, the head of that edge would be the next vertex that gets added to X. So let me specify the second invariant, and then I'll tell you how to think about it. Because we're storing vertices rather than edges in the heap, we're going to have to be fairly clever with the way we define the key of a vertex in this heap. So we're going to maintain the property that the key of a vertex v is the smallest Dijkstra greedy score of any edge which has that vertex as its head. So let me show you what I mean in terms of our example, where we have three crossing edges. Suppose these three edges in the upper right happen to have Dijkstra greedy scores of seven, three, and five. Let's look at what the key should be for each of the three vertices I've drawn in V minus X. Now, for the topmost vertex, this is pretty interesting.
There are two different edges whose tails are in X and which have this vertex as their head. So what should the key of this vertex be? Well, it should be the smallest Dijkstra greedy score of any of the edges whose tail lies on the left-hand side and which terminate at this vertex. So there are two candidate edges. One has Dijkstra greedy score three; one has Dijkstra greedy score seven. So the key value should be three, the smaller of the two. Now, for the second vertex, there's only a single edge that has its tail in X and that terminates at this vertex. So the key for this vertex should just be the score of that unique edge; in this case, that's going to be five. And then this poor third vertex: there are actually no edges at all that start in X and terminate at this vertex. There's only one arc going the wrong direction. So for any vertex outside of X that doesn't have any eligible edges terminating at it, we think of the key as being plus infinity. So the way I recommend thinking about these heap keys is that we've taken what used to be a one-round, winner-takes-all tournament and turned it into a two-round knockout tournament. So in our straightforward implementation of Dijkstra's algorithm, we did a single linear search through all the edges, we computed the Dijkstra greedy score for each, and we picked the best. So in this example, we would have discovered these three edges in some order; their scores are three, five, and seven; and we would have remembered the edge with score three as being the best. That would have been the winner of this winner-takes-all tournament. Now, when we use the heap, we're factoring it into two rounds. So first, each vertex in V minus X runs a local tournament to elect a local winner. Each of these vertices in V minus X says, well, let me look at all of the edges for which I'm the head and whose tail is in X.
And amongst all of those edges that start in X and terminate at me, I'm going to remember the best of those. So those are the winners of the local tournaments of the first round. And now the heap is only going to remember this set of first-round winners. Right? There's no point in remembering the existence of edges that don't even have the smallest score among those terminating at a given vertex, because we only care about the smallest score overall. Now, when you extract the min from the heap, that is, in effect, executing the second and final round of this knockout tournament. So each of the vertices of V minus X has proposed its local winner, and then the heap, in an extract-min, just chooses the best of all of those local winners. So that's the final proposed vertex that comes out of the heap. So the point is that if we can successfully maintain these two invariants, then, when we extract the min from this heap, we'll get exactly the correct vertex, w*, that we're supposed to add to the set capital X next. That is, the heap will just hand to us on a silver platter exactly the same choice of vertex that our previous exhaustive search through the edges would have computed. The exhaustive search was just computing the minimum in a brute-force way, in a single winner-takes-all tournament. The heap, implemented in this way, chooses exactly the same winner; it just does it in this two-round process. Now, in Dijkstra's algorithm, we weren't supposed to merely find the vertex w* to add to X; we also had to compute its shortest path distance. But remember, we computed the shortest path distance as simply the Dijkstra greedy score. And here, the Dijkstra greedy score is just going to be the key of w* in the heap; that's immediate from invariant number two. So we're using the fact here that our keys are, by definition, just the smallest greedy scores of the edges that stick into that vertex w*.
So, again, we're exactly replicating the computation that we would have done in the straightforward implementation, just in a much slicker way. Okay? We're adding exactly the same vertices, in exactly the same order, and we're computing exactly the same shortest path distances in this heap-based implementation, provided, of course, that we do successfully maintain these two invariants throughout the course of the algorithm. So that is what I now owe you. We have to pay the piper. We've shown that if we can maintain a data structure with these properties, then we can simulate the straightforward implementation; now I have to show you how we maintain these invariants without doing too much work. All right. So maintaining invariant number one will really take care of itself, really sort of by definition: the vertices which remain in the heap are those that we haven't processed yet, and those are exactly the ones outside of capital X. So really, the trick is: how do we maintain invariant number two? Now, before I explain this, let me point out that this is a tricky problem. There is something subtle going on. So, as usual, I want you to think about this shortest path algorithm at some intermediate iteration. Okay? So take a snapshot. A bunch of vertices have already been added to X. A bunch of vertices are still hanging out in the heap; they haven't been added to X. There's some frontier, with edges crossing it, possibly in both directions. And suppose at the end of the current iteration we identify the vertex W, which we're going to extract from the heap and conceptually add to the set X. Now, the reason things get complicated is that when we move a vertex from outside X to inside X, the frontier between X and V minus X changes. So in this picture, the old black X becomes this new blue X. And what's really interesting about the frontier changing is that the edges which cross the frontier change. Now, there are some edges which used to cross the frontier and now don't.
Those are the ones that are coming into W. Those we're not so concerned with; those don't really play any role. What makes things tricky is that there are edges which did not used to cross the frontier but now do. And those are precisely the edges sticking out of W. So in this picture, there are three such edges, which I'll highlight here in pink. To see why it's tricky when new edges are all of a sudden crossing the frontier, let's remember what invariant number two says. It says that for every vertex which is still in the heap, which is not yet in X, the key for that vertex had better be the smallest Dijkstra greedy score of any edge which comes from capital X and sticks into this vertex v. Now, in moving one vertex into X, namely this vertex W, there can be new edges sticking into vertices which are still in the heap. As a result, the appropriate key value for vertices in the heap might be smaller now that W has been moved into X. And the candidates for the vertices in the heap whose keys might have dropped are precisely those vertices at the other ends of edges sticking out of W. So, summarizing: the fact that we've added a new vertex to capital X, by extracting something from the heap, has potentially increased the number of edges crossing the frontier, because the frontier has changed. And therefore, for vertices that remain in the heap, the smallest greedy score of an edge that sticks into them from the set X might have dropped. So we need to update those keys to maintain invariant number two. Now, that's the hard part. Here's what we have going for us. We may have damaged the keys by changing the frontier, but the damage is local. We can understand exactly whose keys might have dropped: as suggested by the picture, the vertices whose keys we need to update are precisely those at the heads of edges that stick out of W.
So for each outgoing edge from W, the vertex we just extracted from the heap, we need to go to the other end of the edge and check whether that vertex needs its key decreased. So here's the pseudocode to do this. When we extract the vertex W from the heap, that is, when we conceptually add the new vertex W to the set X, thereby changing the frontier, we say: well, we know the only vertices that might have to have their keys changed are the ones on the other side of these outgoing arcs from W. So we just have a simple iteration over the outgoing edges (W, V) of the vertex W. Now, I haven't shown you any edges like this in the picture, but there might well be some edges where the head of the arc, V, is also in the set X, has also already been processed. But anything in X is not in the heap. Remember, the heap only contains the stuff outside of X, so we couldn't care less about maintaining keys for vertices that are no longer in the heap. So we do an extra check: is the head of this edge in fact still in the heap, that is, is it not in X? In the picture, for example, this would be true for all three of the vertices at the other ends of arcs pointing out of W. And for each such vertex V, we update its key. And the way we're going to update its key is: we're just going to rip this vertex out of the heap, we're going to recompute its key in constant time, and then we're going to reinsert it into the heap. And since all heap operations take logarithmic time, this key update takes logarithmic time. As an additional optimization, I want to point out that if one of these vertices V's key does change, it can only change in one way. So remember, what is the key? The key is the smallest Dijkstra greedy score of all of the edges that start in X and stick into this vertex. So that's the local tournament, the first-round tournament, happening at this vertex V. Now, what is the only thing that has changed?
Before and after we added this vertex W to X, the only difference is that now one new edge is sticking into this vertex V. All of the old edges sticking into it from X are still sticking into it, and now there's one extra candidate in its local tournament, namely this edge (W, V). So either (W, V) is the new local winner, that is, it has the smallest Dijkstra greedy score of all the edges that terminate at this vertex, or it doesn't, in which case the previous winner is still the winner. That is, the new key value can only be one of two things. Either it's the old key value; that's the case where this extra entrant, the edge from W to V, is irrelevant. Or, if the key has changed, it has to have changed to the Dijkstra greedy score of this edge (W, V). And the formula for that is the shortest path distance that we just computed for W, which has been fully processed at this point, plus the length of the direct arc from W to V. And again, conceptually, this formula is just the Dijkstra greedy score for the arc (W, V), the new entrant in V's local first-round tournament. So now, having updated V's key appropriately, invariant number two is restored, and once again the key of every vertex does reflect the smallest Dijkstra greedy score of any edge sticking into it from the set X. We can safely reinsert this node back into the heap with its new key value. And these three lines together, a deletion, a key recomputation, and a reinsertion, are just a key update in logarithmic time for one of these vertices at the other end of an arc sticking out of the vertex W. So let's tally up the running time of this new implementation. One thing I want you to check, and this will definitely help you understand this refined implementation of Dijkstra's algorithm, is that essentially all the work done is through the heap API. That is, all of the running time that we have to account for is in heap operations. We don't really do nontrivial work outside of heap operations.
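Putting the two invariants and the key-update step together, here is one way to sketch the full heap-based implementation in Python (names are my own). One caveat: Python's `heapq` cannot delete from the middle of the heap, so this sketch replaces the lecture's delete-recompute-reinsert key update with the common lazy-deletion trick of pushing a fresh entry and skipping stale ones on extraction; the effect and the O(log n) cost per key update are the same.

```python
import heapq
import math

def dijkstra_heap(n, adj, s):
    """Heap-based Dijkstra. adj[v] is a list of (head, length) pairs for the
    edges out of v; lengths are non-negative; s is the source vertex."""
    A = [math.inf] * n        # shortest-path distances, filled in as vertices join X
    key = [math.inf] * n      # invariant 2: smallest greedy score of an edge into v
    key[s] = 0
    heap = [(0, s)]           # invariant 1: heap holds (key, vertex) for V - X
    in_X = [False] * n
    while heap:
        k, w = heapq.heappop(heap)        # extract-min: the second-round winner
        if in_X[w]:
            continue                      # stale entry left over from a key decrease
        in_X[w] = True                    # add w to X; the frontier changes
        A[w] = k                          # its key is exactly its greedy score
        for v, length in adj[w]:          # edges out of w now cross the frontier
            if not in_X[v] and k + length < key[v]:
                key[v] = k + length       # edge (w, v) wins v's local tournament
                heapq.heappush(heap, (key[v], v))   # the "decrease-key"
    return A
```

The `continue` line plays the role of the deletion: the superseded heap entry is simply ignored when it eventually surfaces.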
And again, recall that the running time of any heap operation is logarithmic in the number of elements in the heap. Our heap is storing vertices; it's never going to have more than N things in it. So the running time of every heap operation is big-O of log N. So what are the heap operations that we do? Well, we do an extract-min once per iteration of the while loop. So there are N minus one iterations of the while loop, just like before, but now, instead of doing an exhaustive search through the edges, we just do a single extract-min from the heap, and it gives us on a silver platter the vertex we should add next. So what do we do besides extract-mins? Well, we have to do the work of paying the piper; we have to maintain invariant number two. And every time we extract a min, that triggers some subsequent key updates. And remember, each of these key updates is a deletion of an element from the heap followed by an insertion. So how many deletions and insertions do we do? Well, at first this might seem a little bit scary, right? Because we do roughly a linear number of extract-mins, and a vertex might have as many as N minus one outgoing arcs. So it seems like a single vertex could trigger as many as N minus one key updates, which is theta of N heap operations. And if we sum that up over the N iterations of the while loop, that would give us N squared heap operations. And indeed, in dense graphs, that can be the case. It is true that a single vertex might trigger a linear-in-N number of heap operations. But that's the wrong way to think about it. Rather than have this vertex-centric perspective on who's responsible for heap operations, let's take an edge-centric view. So for each edge of the graph, let's think about when it can be responsible for some heap operations, in particular a key decrease and the resulting deletion and insertion. So suppose you have an edge that points from the vertex V to the vertex W.
There's actually only one situation in which this edge is going to be responsible for a key decrease. And that's the case where the tail of the edge, V, gets sucked into the set X before the head, W, of this edge gets sucked into the set X. If that happens, if V gets sucked into X while W is still outside of X, then indeed we're going to have to decrease the key of W, just like we did in the examples. But that's all that's going to happen: V can only get sucked into X once, and it's never going to leave, so this edge is only responsible for this single decrease of the key of its head W. And that's one insertion and one deletion. And in fact, if the endpoints of this edge get sucked into X in the opposite order, that is, if the head of this edge, W, gets sucked into X first, that doesn't even trigger a key decrease, and W will never have its key decreased because of this particular arc from V to W. So the upshot is that each edge (V, W) of the graph triggers at most one insert/delete combo. So what does this mean? It means that the number of heap operations is big-O of N, for the extract-mins, plus big-O of M, for the insert/delete combos triggered by edges during the key decreases. Now, I'm going to write this in a simplified way: this is just O(M) heap operations, where M is the number of edges. And this is because of our assumption that there's a path from S to every other vertex. If you think about it, that means that the graph is at least weakly connected; if you picked it up, it would stay together in one piece. So it at least contains a spanning tree, at least in an undirected sense, which means it contains at least N minus one edges. So we're in the case of weakly connected graphs, where M dominates N: M is always at least as big as N, up to a minus one. So what that means is that the running time of Dijkstra's algorithm, with this heap implementation, is just a log factor larger than linear. Remember, every heap operation takes logarithmic time.
So we do a linear-in-M number of heap operations, and each takes time logarithmic in N. So the running time is O(M log N), with, I should say, quite small constants. So this is a really, really impressively fast algorithm for such a useful problem as shortest paths. Now, we got a little bit spoiled in our discussion of graph search and connectivity, where it seemed that any problem we cared about could be solved in linear time, O(M plus N). So here we're picking up this extra logarithmic factor, but, I mean, come on, this is still awesome. A running time of O(M log N) is unbelievably faster than a running time of O(M times N), which is what we had in the straightforward implementation. So this deft use of the heap data structure has given us a truly blazingly fast algorithm for an extremely well-motivated problem: computing shortest paths.
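To summarize the accounting from this last argument as a single formula (just restating what we derived, with m edges and n vertices):

```latex
\underbrace{(n-1)\cdot O(\log n)}_{\text{extract-mins}}
\;+\;
\underbrace{m\cdot O(\log n)}_{\text{delete/insert pairs from key updates}}
\;=\; O\big((m+n)\log n\big) \;=\; O(m\log n),
```

where the final simplification uses the fact that m ≥ n − 1 in a weakly connected graph.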