In this video, and then in the next, we're going to develop, from scratch, an algorithm for the all-pairs shortest path problem, rather than relying on the aforementioned reduction to the single-source version of the problem. In this video, we're going to develop the relevant optimal substructure lemma, and in the next video we'll culminate in the dynamic programming algorithm. It is a famous one: the Floyd-Warshall algorithm. Before developing the optimal substructure, let me just tell you a little bit about where we're going. The optimal substructure we're about to identify will lead to a dynamic programming solution via the usual recipe, which we've now seen over, and over, and over again. It's going to solve the all-pairs shortest paths problem in O(n^3) time, so we get a bound independent of the sparsity of the graph: n cubed, even if the graph has negative edge lengths. This algorithm is known as the Floyd-Warshall algorithm. Last video we discussed the performance you get by running a single-source shortest path subroutine n times, and the Floyd-Warshall algorithm is a better solution in many situations. First of all, for graphs with general edge lengths, that is, edge lengths that are allowed to be negative, Floyd-Warshall is looking pretty good. Remember, the previous solution was to run the Bellman-Ford algorithm n times; we can't use Dijkstra's algorithm because it's not generally correct with negative edge lengths. And if you run the Bellman-Ford algorithm n times, you get a running time of O(n^2 m). So even in the best case for Bellman-Ford, sparse graphs, we're doing just as well with Floyd-Warshall: we've got a cubic running time. And we're doing quite a bit better for dense graphs, where n invocations of Bellman-Ford would give us a quartic running time, whereas here we're getting cubic.
Now, for graphs with nonnegative edge lengths, reducing to the single-source problem is actually a pretty good solution, because Dijkstra's algorithm is so fast. If you run Dijkstra n times, your running time is going to be O(n m log n). So for sparse graphs you'd really want to run Dijkstra n times rather than use Floyd-Warshall, because you get a running time roughly quadratic in n, as opposed to the cubic running time we're going to get here. For dense graphs, it's not so clear. If you run Dijkstra n times, you're going to get a running time which is ballpark cubic, so that's going to be roughly the same as Floyd-Warshall, and you'd expect them to be comparable; it's not clear which would be better in practice. If you had to choose between those two solutions for dense graphs, you'd really want to code them both up and pick whichever one seemed to do better in your domain. One place you see the Floyd-Warshall algorithm used quite a bit is for computing the transitive closure of a binary relation. In graph-theoretic language, you can think of the transitive closure problem as computing all-pairs reachability. That's a special case of all-pairs shortest paths where you just want to know whether the shortest path distance is finite or infinite: for each pair of vertices, does there exist a path from the one vertex to the other or not? And if all you care about is the transitive closure problem, that is, one bit of information for each pair of nodes, connected or not, as opposed to more generally what the shortest path length is, then you can do some optimizations to the Floyd-Warshall algorithm to speed up the constant factors. But I'll leave it up to you to read up on the web about that if you're interested. Now, I realize that this cubic running time is maybe not super impressive.
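To make the transitive-closure connection concrete, here is a minimal Python sketch of the boolean specialization. Since we only care about reachability, the min/plus operations of the shortest-path computation become or/and. (The 1-through-n vertex labeling and the `edges` set-of-pairs representation are just illustrative conventions, not anything fixed by the lecture.)

```python
def transitive_closure(n, edges):
    """All-pairs reachability: a boolean variant of Floyd-Warshall.

    n: number of vertices, labeled 1..n.
    edges: a set of directed (i, j) pairs.
    Returns reach, where reach[i][j] is True iff a path i -> j exists.
    """
    # reach[i][j] starts True only for self-loops and direct edges.
    reach = [[False] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        reach[i][i] = True
    for i, j in edges:
        reach[i][j] = True
    # Grow the set of permitted intermediate vertices one at a time:
    # i can reach j via vertices 1..k if it already could via 1..k-1,
    # or if it can reach k and k can reach j.
    for k in range(1, n + 1):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach
```

One constant-factor optimization of the kind the lecture alludes to: each row of `reach` is a bit vector, so packing rows into machine words (or Python integers) lets the inner loop over j become a single bitwise-or.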
In general, as we've been talking about dynamic programming algorithms, we've been starting to see, for the first time in these courses, a proliferation of algorithms which, while polynomial time, and way better than exponential time, you wouldn't really call blazingly fast. We've seen a lot of quadratic running times, and now we're seeing some cubic running times. So that's kind of a bummer, but I am still teaching you the state of the art: as we mentioned in the last video, it remains an open question to this day whether there are significantly subcubic algorithms that solve the all-pairs shortest paths problem. So now let's formalize the optimal substructure in all-pairs shortest paths which is exploited by the Floyd-Warshall algorithm. A quick digression first, though. At the beginning of the class, I hope that if I had shown you this Floyd-Warshall algorithm, you would have said, "Wow, that's a really elegant algorithm. How could I have ever come up with this?" Whereas now that you're approaching the black-belt level in dynamic programming, and you see just how mechanically this famous algorithm is going to fall out of the really quite easy optimal substructure, I hope you say, "How could I not have come up with this algorithm?" Now, I don't mean to take any credit away from the solutions of Floyd and Warshall; there is one really great idea in how they phrase this optimal substructure. Remember, we talked about this before the Bellman-Ford algorithm: it can be tricky to apply dynamic programming to graph problems because there isn't really an ordering in the input. You just get this bag of vertices and this bag of edges, and it's often unclear how to define bigger and smaller subproblems. The really nice idea in the Bellman-Ford algorithm was to introduce this extra parameter i, a budget on the number of edges, or roughly equivalently the number of vertices, that you are allowed to use in a path between the origin and destination for a given subproblem. This naturally induces an ordering on subproblems: the bigger the edge budget, the bigger the subproblem. We're going to do something similar, but even a little bit more stringent, in the Floyd-Warshall solution. We're again going to restrict the vertices that can appear in the middle of the path between a given origin and destination in a subproblem. But more than just the number of vertices we're allowed to use, we're going to restrict exactly which vertices, the identities of the vertices, that the algorithm is permitted to use to get from the given origin to the given destination. In many ways, the restriction we're going to impose is totally analogous to, say, the knapsack problem, where there wasn't an intrinsic ordering on the items but we imposed one anyway and then looked at prefixes of the items, with bigger subproblems being ones with bigger prefixes of items. Here we're going to impose an arbitrary ordering on the vertices, and in a given subproblem you're only allowed to use some prefix of the vertices as internal nodes in a path. And then, of course, the bigger the prefix you're permitted to use, the bigger the subproblem. To be a little bit more formal about it, let's just impose some arbitrary ordering on the vertex set V, naming the vertices 1, 2, 3, all the way up to n; I don't care how. And I'm going to use the notation V^(k) to denote the prefix of the first k vertices, that is, vertices 1 through k. So let me start now setting up the optimal substructure lemma. In contrast to the Bellman-Ford case, I'm only going to prove the optimal substructure lemma for input graphs that have no negative cycle; we'll address the case of negative cycles after we're done with the algorithm. Suppose for now there isn't one. So what, then, are the subproblems going to be? Well, it's going to be quite analogous to Bellman-Ford.
In Bellman-Ford we were solving a single-source shortest path problem, so we needed to compute something for each destination. That gave us a linear number of subproblems, and then for a given destination we had this parameter i controlling the edge budget, so that was another linear factor, giving us a quadratic number of subproblems. Here it's going to be the same, except we also have to range over the origin. So we're going to get a cubic number of subproblems: a choice for the origin, n choices; a choice for the destination, n more choices; plus which prefix of vertices we're allowed to use, which is still another n choices. So cubic in total. Now we focus on an optimal solution to this subproblem, by which I mean: amongst all paths which begin at the vertex i, end at the vertex j, and, strictly in between i and j, contain as internal nodes only vertices from 1 through k, look at the one with the shortest length. And because we're looking at the case of no negative cycles, we can assume that this path has no cycles; it's a cycle-free path. Rather than state the conclusion of the optimal substructure lemma right now, let me just make sure you understand what one of these subproblems is. Let's suppose that the origin i is the vertex that we've arbitrarily labeled 17, the destination j is the vertex we've labeled 10, and let's suppose k, at the moment, is only 5. Consider the following graph, and imagine it's a little piece of some much bigger graph. Now, I hope it's clear what the shortest path is between 17 and 10: it's the bottom two-hop path, which has total length -20. On the other hand, for the subproblem we're dealing with, k equals 5, and what that means is we have this extra constraint on which vertices we can use in the middle of our paths. Now, to be clear, this constraint, internal vertices labeled at most 5, can't apply to the origin or the destination, right?
i is 17, j is 10, both of those are bigger than 5, but we don't care: obviously, any path from i to j has to include both the vertices i and j. So the constraint only applies to vertices in the middle of the path. But this bottom two-hop path unfortunately makes use of the node 7. That path does not obey the constraint, since 7 is not at most 5, so that path is disallowed; we cannot use it. Therefore, the shortest path from 17 to 10 that makes use of only the first five labeled vertices as intermediate ones would be the top three-hop path, which has the bigger length of 3. Okay, so I hope it's clear what a subproblem corresponds to. It corresponds to a choice of i, the origin, again just some vertex labeled something from 1 to n. Similarly, there's a choice of the destination, some other vertex j, which is labeled something from 1 to n. And then there's a choice of the bound k, which governs which vertices you're permitted to use internal to your path. Again, the constraint doesn't apply to the endpoints, just the vertices in between the endpoints, and in the subproblem the choice of k says you can only use vertices 1 through k internal to your paths. Hopefully that makes sense; let's move on to the full statement of the optimal substructure lemma. All right, so this is actually going to be one of the simpler optimal substructure lemmas that we've seen in a while. We're back to the glory days of only two cases. The first is trivial: if the shortest path from i to j using only vertices 1 through k as intermediaries doesn't even bother to use the vertex k, well, then it had better be a shortest path from i to j that uses internal nodes only from 1 through k-1. It's in case two that it's going to be apparent what a great idea ordering the vertices and looking at prefixes of the vertices is when you're considering the all-pairs version of the shortest path problem.
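The figure itself isn't reproduced here, but we can sanity-check the example with a brute-force routine that only permits internal nodes labeled at most k. The graph below is hypothetical: only the path totals (-20 for the bottom two-hop path through vertex 7, and 3 for the top three-hop path through low-labeled vertices) match the lecture's numbers; the individual edge lengths and the intermediate labels 2 and 4 are made up for illustration.

```python
from itertools import permutations

# Hypothetical edge lengths consistent with the lecture's path totals.
edges = {
    (17, 7): -10, (7, 10): -10,          # bottom path: total -20, internal node 7
    (17, 2): 1, (2, 4): 1, (4, 10): 1,   # top path: total 3, internal nodes 2 and 4
}
nodes = {2, 4, 7, 10, 17}

def restricted_shortest(i, j, k):
    """Shortest i -> j path length using only internal nodes labeled <= k.

    Brute force: try every sequence of permitted internal nodes. Fine for
    a five-node example; the point of Floyd-Warshall is to avoid this.
    """
    allowed = [v for v in nodes if v <= k and v not in (i, j)]
    best = float("inf")
    for r in range(len(allowed) + 1):
        for mid in permutations(allowed, r):
            path = (i, *mid, j)
            length = 0
            for u, v in zip(path, path[1:]):
                if (u, v) not in edges:
                    break
                length += edges[(u, v)]
            else:  # every hop existed, so the path is valid
                best = min(best, length)
    return best

print(restricted_shortest(17, 10, 5))  # vertex 7 forbidden: top path, length 3
print(restricted_shortest(17, 10, 7))  # vertex 7 allowed: bottom path, length -20
```

With k = 5 the only feasible route is the top path (length 3); raising the bound to admit vertex 7 recovers the true shortest path of length -20, exactly as in the lecture's example.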
Suppose this path P from i to j does indeed use the vertex k in the middle. Well, then we can think of the path P as comprising two subpaths: a first path P1, which starts at i and goes to the vertex k, and then a path P2, which starts at k and goes to the destination j. Now, here's what's cool. On this path P, the internal nodes between i and j all lie in 1 through k. Moreover, this path P, remember, is cycle-free, so the vertex k appears exactly once, not more than once. Therefore, if we split the path P into these two parts, P1 and P2, then internal to P1, strictly in between i and k, there are only vertices between 1 and k-1. Similarly, strictly between k and j in P2, there are only vertices between 1 and k-1. Thus, both of these paths, P1 and P2, can be thought of as feasible solutions to smaller subproblems, subproblems with an even tighter budget of k-1 on the internal nodes that are allowed. And these aren't just feasible solutions for smaller subproblems; they're optimal solutions as well. So, how slick is that? The maximum-indexed internal node just splits this shortest path into two shortest paths for smaller subproblems. And, you know what? At this point, I am so confident in your facility with dynamic programming, I am not even going to bother to prove this statement. You are totally capable of proving this yourself; the argument is really quite similar to Bellman-Ford. I encourage you to do it as an exercise.
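Although the algorithm proper is the next video's topic, the lemma's two cases already dictate a recurrence, so here is a minimal Python sketch of where this is headed, under the lecture's assumption of no negative cycles. The dict-of-edge-lengths input representation is just one convenient convention, not something fixed by the lecture.

```python
def floyd_warshall(n, edge_len):
    """All-pairs shortest paths via the recurrence implied by the lemma.

    n: number of vertices, labeled 1..n.
    edge_len: dict mapping (i, j) to the length of directed edge i -> j.
    Assumes the graph has no negative cycles.
    """
    INF = float("inf")
    # Base case (empty prefix, k = 0): only direct edges are usable.
    dist = [[INF] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][i] = 0
    for (i, j), length in edge_len.items():
        dist[i][j] = min(dist[i][j], length)
    # Grow the permitted prefix of internal vertices one at a time.
    for k in range(1, n + 1):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                # Case 1 of the lemma: skip vertex k, keep dist[i][j].
                # Case 2: use k exactly once, splitting into i->k and k->j.
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    return dist
```

The three nested loops over k, i, and j are exactly the cubic number of subproblems counted earlier, with constant work per subproblem, which is where the O(n^3) bound comes from.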