1 00:00:00,000 --> 00:00:03,479 In this video, and then in the next. We're going to develop, from scratch, a 2 00:00:03,479 --> 00:00:05,915 algorithm for the all pair shortest path problem. 3 00:00:05,915 --> 00:00:09,345 Rather than by relying on the aforementioned reduction to the single 4 00:00:09,345 --> 00:00:12,676 source version of the problem. So, in this video, we're going to develop 5 00:00:12,676 --> 00:00:16,702 the relevant optimal substructure lemma. In the next video, we'll culminate in the 6 00:00:16,702 --> 00:00:19,138 dynamic programming algorithm. It is a famous one. 7 00:00:19,138 --> 00:00:23,001 The Floyd Warshall algorithm. Before developing the optimal 8 00:00:23,001 --> 00:00:27,200 substructure let me just tell you a little bit about where we're going. 9 00:00:27,200 --> 00:00:31,135 The optimal sub-structure we're about to identify will lead to a dynamic 10 00:00:31,135 --> 00:00:35,017 programming solution via the usual recipe, which we've now seen over, and 11 00:00:35,017 --> 00:00:37,767 over, and over again. It's going to solve the all-pairs 12 00:00:37,767 --> 00:00:40,139 shortest paths problem in O of N cubed time. 13 00:00:40,139 --> 00:00:42,619 So we get a bound independent of the sparsity. 14 00:00:42,619 --> 00:00:45,854 [SOUND] N cubed, even if the graph has negative edge lengths. 15 00:00:45,854 --> 00:00:48,820 This algorithm is know as the Floyd-Warshall algorithm. 16 00:00:50,040 --> 00:00:54,805 Last video we discussed the performance that you get by running a single source, 17 00:00:54,805 --> 00:00:58,915 shortest path sub routine N times. And the Floyd-Warshall algorithm is a 18 00:00:58,915 --> 00:01:03,242 better solution in many situations. So first of all, for graphs that have 19 00:01:03,242 --> 00:01:06,116 general edge links. That is edge links that are allowed to be 20 00:01:06,116 --> 00:01:08,424 negative then Floyd-Warshall is looking pretty good. 21 00:01:08,424 --> 00:01:12,099 Remember with the previous solution was to run the Bellman-Ford algorithm end 22 00:01:12,099 --> 00:01:14,265 times. We can't use Dijkstra's algorithm cause 23 00:01:14,265 --> 00:01:16,809 that's not generally correct with negative edge costs. 24 00:01:16,809 --> 00:01:20,531 And if you run the Bellman-Ford algorithm in times you get a running time of N 25 00:01:20,531 --> 00:01:22,321 squared M. So even in the best case of 26 00:01:22,321 --> 00:01:25,665 Bellman-Ford's sparse graphs, we're doing just as well with Ford Warshall. 27 00:01:25,665 --> 00:01:28,963 We've got a cubic running time. And we're doing quite a bit better for 28 00:01:28,963 --> 00:01:31,554 dense graphs. There N times Bellman-Ford will give us N 29 00:01:31,554 --> 00:01:34,380 to the fourth. A cubic running time whereas here we're 30 00:01:34,380 --> 00:01:37,808 getting cubic. Now for graphs with non negative edge 31 00:01:37,808 --> 00:01:41,213 costs, reducing to the single source problem is actually a pretty good 32 00:01:41,213 --> 00:01:44,764 solution because Dijkstra is so fast. So if you're on Dijkstra N times, you're 33 00:01:44,764 --> 00:01:47,196 running time is going to be N times M times log N. 34 00:01:47,196 --> 00:01:50,892 So for sparse graphs you'd really want to use Dijkstra N times, rather than ford 35 00:01:50,892 --> 00:01:53,081 warshaw. Cause you get a running time roughly 36 00:01:53,081 --> 00:01:56,778 quadratic of N, as opposed to the cubic running time we're going to get here. 37 00:01:56,778 --> 00:02:00,523 For dense graphs, it's not so clear. If you run dyxtra N times you're going to 38 00:02:00,523 --> 00:02:04,317 get a running time which is ball park cubic, so that's going to be roughly the 39 00:02:04,317 --> 00:02:07,236 same as Floyd-Warshall. And you'd expect them to be comparable. 40 00:02:07,236 --> 00:02:09,619 It's not clear which would be better in practice. 41 00:02:09,619 --> 00:02:11,898 If you. Had to choose between those two solutions 42 00:02:11,898 --> 00:02:15,544 for dense graphs, you'd really want to code them both up and pick whichever one 43 00:02:15,544 --> 00:02:20,977 seemed to do better in your domain. One place you see the Floyd-Warshall 44 00:02:20,977 --> 00:02:25,021 algorithm used quite a bit is for computing the transitive closure of a 45 00:02:25,021 --> 00:02:28,404 binary relation. In a graph theoretic language you can 46 00:02:28,404 --> 00:02:31,957 think of a transitive closure problem as computing all pairs reachability. 47 00:02:31,957 --> 00:02:35,462 So that's a special case of all pair shortest paths where you just want to 48 00:02:35,462 --> 00:02:38,438 know whether the shortest path distance is finite or infinite. 49 00:02:38,438 --> 00:02:42,183 For each pair of vertices you want to know does there exist a path from the one 50 00:02:42,183 --> 00:02:45,804 vertex to the other or not. And if all you care about is the 51 00:02:45,804 --> 00:02:49,357 transitive closure problem, if all you care about is one bit of information for 52 00:02:49,357 --> 00:02:52,685 each pair of nodes, connected or not, as opposed to more generally what the 53 00:02:52,685 --> 00:02:56,238 shortest path length is, then you can do some optimizations on the Floyd-Warshal 54 00:02:56,238 --> 00:02:58,172 algorithm to speed up the constant factors. 55 00:02:58,172 --> 00:03:01,860 But I'll leave it up to you to read up on the web about that if you're interested. 56 00:03:02,960 --> 00:03:06,332 Now, I realize that this cubic running time is, [SOUND] maybe, not super 57 00:03:06,332 --> 00:03:08,740 impressive. In general, as we've been talking about 58 00:03:08,740 --> 00:03:12,450 dynamic programming algorithms, we've been starting to see, for the first time 59 00:03:12,450 --> 00:03:14,714 in these courses, a proliferation of algorithms. 60 00:03:14,714 --> 00:03:18,375 Which, you know, while polynomial time, while way better than exponential time, 61 00:03:18,375 --> 00:03:22,133 you wouldn't really call blazingly fast. We've seen a lot of quadratic running 62 00:03:22,133 --> 00:03:24,156 times. Now we're seeing some cubic running 63 00:03:24,156 --> 00:03:26,180 times. So, that's kind of a bummer, but, I am 64 00:03:26,180 --> 00:03:29,986 still teaching you the state-of-the-art. As we mentioned this in the last video, 65 00:03:29,986 --> 00:03:33,454 it remains an open question to this day, whether there are significantly. 66 00:03:33,454 --> 00:03:38,620 [SOUND] Of cubic algorithms that solve the all pairs shortest paths problem. 67 00:03:38,620 --> 00:03:42,488 So now, let's formalize the optimal substructure in all pair shortest path 68 00:03:42,488 --> 00:03:45,207 which is exploited by the Floyd-Warshall algorithm. 69 00:03:45,207 --> 00:03:49,076 A quick digression first though. At the beginning of the class I hope that 70 00:03:49,076 --> 00:03:52,893 you know, if I had shown you this Floyd-Warshall algorithm, you would have 71 00:03:52,893 --> 00:03:55,141 said wow, that's a really elegant algorithm. 72 00:03:55,141 --> 00:03:59,376 How could IU have ever come up with this? Whereas now that you're approaching the 73 00:03:59,376 --> 00:04:03,454 black belt level in dynamic programming and you see just how mechanically the 74 00:04:03,454 --> 00:04:07,114 algorithm, this famous algorithm is going to fall out of the really quite 75 00:04:07,114 --> 00:04:10,617 easy optimal substructure. I hope you say, how could I not have come 76 00:04:10,617 --> 00:04:14,834 up with this algorithm? Now I don't mean to take any credit away 77 00:04:14,834 --> 00:04:19,264 from the solutions of Floyd and Warshall. There is one really great idea in how 78 00:04:19,264 --> 00:04:23,532 they phrase this optimal substructure. Remember we talked about this before the 79 00:04:23,532 --> 00:04:27,746 Bellman-Ford algorithm, it can be tricky to apply dynamic programming to graph 80 00:04:27,746 --> 00:04:31,041 problems because there isn't really an ordering in the input. 81 00:04:31,041 --> 00:04:34,066 You just get this bag of vertices and this bag of edges. 82 00:04:34,066 --> 00:04:37,740 And it's often unclear how to define bigger and smaller subproblems. 83 00:04:39,040 --> 00:04:43,095 So the really nice solution in the Bellman-Ford algorithm was to introduce 84 00:04:43,095 --> 00:04:46,556 this extra parameter I. I was a budget on the number of edges or 85 00:04:46,556 --> 00:04:50,719 roughly equivalently the number of vertices that you are allowed to use in a 86 00:04:50,719 --> 00:04:53,132 path. Between the origin and destination for a 87 00:04:53,132 --> 00:04:56,005 given subproblem. This naturally induces an ordering on 88 00:04:56,005 --> 00:04:59,401 subproblems, the bigger the edge budget, the bigger the subproblem. 89 00:04:59,401 --> 00:05:03,424 We're going to do something similar but even a little bit more stringent in the 90 00:05:03,424 --> 00:05:06,820 Floyd-Warshall solution. We're again going to restrict the number 91 00:05:06,820 --> 00:05:11,051 of vertices that can appear in the middle of the path between a given origin and 92 00:05:11,051 --> 00:05:14,500 destination and a subproblem. But more than just the number of the 93 00:05:14,500 --> 00:05:18,366 vertices we're allowed to use, we're going to restrict exactly which vertices, 94 00:05:18,366 --> 00:05:22,545 the identities of the vertices that the algorithms permitted to use to get from 95 00:05:22,545 --> 00:05:24,740 the given origin to the given destination. 96 00:05:26,160 --> 00:05:30,317 in many ways, the restriction we're going to impose is totally analogist to, say, 97 00:05:30,317 --> 00:05:34,474 the knapsack problem where there wasn't an intrinsic ordering on the items but we 98 00:05:34,474 --> 00:05:37,870 imposed one anyways and then we just look at prefixes of the items. 99 00:05:37,870 --> 00:05:41,064 And bigger sub-problems are ones with bigger prefixes of items. 100 00:05:41,064 --> 00:05:44,359 Here we're going to impose an arbitrary ordering on the vertices. 101 00:05:44,359 --> 00:05:48,060 And in a given sub-problem, you're only allowed to use some prefix of the 102 00:05:48,060 --> 00:05:51,659 vertices as internal nodes in a path. And then, of course, the bigger the 103 00:05:51,659 --> 00:05:54,600 prefix you're permitted to use, the bigger the sub-problem. 104 00:05:55,900 --> 00:05:59,762 So to be a little bit more formal about it, let's just impose some arbitrary 105 00:05:59,762 --> 00:06:03,777 ordering on the vertices capital V, name the vertices one, two, three, all the way 106 00:06:03,777 --> 00:06:05,048 up to N. I don't care how. 107 00:06:05,048 --> 00:06:09,063 And I'm going to use the notation capital V superscript K to denote the prefix of 108 00:06:09,063 --> 00:06:11,300 the first K vertices, vertices one through K. 109 00:06:12,420 --> 00:06:17,180 So let me start now setting up the optimal substructure limit. 110 00:06:17,180 --> 00:06:20,465 In contrast to the Bellman-Ford case, I'm only going to prove the optimal 111 00:06:20,465 --> 00:06:23,657 substructure limit of four input graphs that have no negative cycle. 112 00:06:23,657 --> 00:06:27,318 We'll address the case of negative cycles after we're done with the algorithm. 113 00:06:27,318 --> 00:06:31,965 Suppose for now there isn't one. So what then are the subproblems going to 114 00:06:31,965 --> 00:06:34,002 be? Well it's going to be quite analogous to 115 00:06:34,002 --> 00:06:36,503 Bellman-Ford. In Bellman-Ford we were solving a single 116 00:06:36,503 --> 00:06:39,976 source shortest path problem so we needed to compute something for each 117 00:06:39,976 --> 00:06:42,153 destination. So that gave us a linear number of 118 00:06:42,153 --> 00:06:45,348 sub-problems and then for a given destination we had this parameter I 119 00:06:45,348 --> 00:06:48,497 controlling the edge budget. So that was another linear factor, so we 120 00:06:48,497 --> 00:06:52,155 had a quadratic number of subproblems. Here it's going to be the same except we 121 00:06:52,155 --> 00:06:55,814 also have to range over our own origin. So we're going to get a cubic number of 122 00:06:55,814 --> 00:06:59,333 subproblems, specifically a choice for the origin, N choices, a choice for the 123 00:06:59,333 --> 00:07:03,774 destination, that's N more choices plus. Which prefix of vertices we're allowed to 124 00:07:03,774 --> 00:07:06,158 use. So that's still another N choices. 125 00:07:06,158 --> 00:07:10,409 So cubic total. So now we focus on an optimal solution to 126 00:07:10,409 --> 00:07:14,384 this sub problem. By which I mean amongst all paths which 127 00:07:14,384 --> 00:07:19,754 begin at the vertex I, end at the vertex J, and strictly in between I and J, as 128 00:07:19,754 --> 00:07:23,659 internal nodes contain only vertices from one through K. 129 00:07:23,659 --> 00:07:28,192 Amongst all those paths look at the one with the shortest length. 130 00:07:28,192 --> 00:07:33,702 And because we're looking at the case of no negative cycles, we can assume that 131 00:07:33,702 --> 00:07:37,900 this path has no cycles. It's a cycle free path. 132 00:07:37,900 --> 00:07:41,713 So rather than state the conclusion of the optimal substructure lemma right now, 133 00:07:41,713 --> 00:07:45,878 let me just make sure you understand what one of these sub-problems is. 134 00:07:45,878 --> 00:07:50,160 So, lets suppose that the origin is the vertex that we've, aren't labelled 135 00:07:50,160 --> 00:07:54,677 arbitrarily seventeen, the destination J is the vertex we've labelled ten, and 136 00:07:54,677 --> 00:07:57,980 let's suppose K at the moment is only five. 137 00:07:57,980 --> 00:08:02,186 Consider the following graph. Imagine this is a little piece of some 138 00:08:02,186 --> 00:08:05,868 much bigger graph. Now I hope it's clear what the shortest 139 00:08:05,868 --> 00:08:09,863 path is between seventeen and ten. It's the bottom two hop path, that path 140 00:08:09,863 --> 00:08:13,584 has total length minus twenty. On the other hand, for the sub problem 141 00:08:13,584 --> 00:08:17,591 we're dealing with, K equals five. So what that means is we have this extra 142 00:08:17,591 --> 00:08:21,565 constraint on which vertices can we can use in the middle of our paths. 143 00:08:21,565 --> 00:08:25,699 Now to be clear, this constraint K at most five that can't apply to origin of 144 00:08:25,699 --> 00:08:26,881 the destination. Right? 145 00:08:26,881 --> 00:08:31,015 I is 17, J is 10, both of those are bigger than five, but we don't care. 146 00:08:31,015 --> 00:08:34,882 Obviously, any path from I to J, has to include both the vertices I and J. 147 00:08:34,882 --> 00:08:38,587 So the constraint only applies to vertices in the middle of the path. 148 00:08:38,587 --> 00:08:42,077 But, this bottom two hot path, unfortunately makes use of the node 149 00:08:42,077 --> 00:08:44,064 seven. So that path does not obey the 150 00:08:44,064 --> 00:08:47,340 constraint, it is not at most five, so that path is disallowed. 151 00:08:47,340 --> 00:08:50,327 Can not use it. Therefore, the shortest path from 152 00:08:50,327 --> 00:08:55,446 seventeen to ten, that makes use of only the first five The five labeled vertices 153 00:08:55,446 --> 00:09:00,275 as intermediate ones would be the top three hot path, which has the bigger 154 00:09:00,275 --> 00:09:03,987 length of three. Oay, so I hope it's clear what a c- 155 00:09:03,987 --> 00:09:07,493 sub-problem corresponds to. It corresponds to a choice of I, the 156 00:09:07,493 --> 00:09:09,998 origin. That's again just some vertex labeled 157 00:09:09,998 --> 00:09:13,225 something from one to N. Similarly there's a choice of the 158 00:09:13,225 --> 00:09:17,288 destination, some other vertex J which is labeled something from one to N. 159 00:09:17,288 --> 00:09:21,595 And then there's a choice of the bound K. Which governs which vertices you're 160 00:09:21,595 --> 00:09:26,096 permitted to use internal to your path. So again the constraint doesn't apply to 161 00:09:26,096 --> 00:09:29,416 the end points, just the vertices in between the end points. 162 00:09:29,416 --> 00:09:33,524 And in the subproblem the choice of K says you can only use vertices one 163 00:09:33,524 --> 00:09:37,856 through K, internal to your paths. So hopefully that makes sense, let's move 164 00:09:37,856 --> 00:09:41,120 on to the full statement of the optimal substructure lema. 165 00:09:42,600 --> 00:09:46,859 All right, so this is actually going to be one of the simpler optimal 166 00:09:46,859 --> 00:09:50,280 substructural lemma that we've seen in a while. 167 00:09:50,280 --> 00:09:53,313 We're back to the glory days of only two cases. 168 00:09:53,313 --> 00:09:58,348 The first is trivial, if the shortest path from I to J using only vertices one 169 00:09:58,348 --> 00:10:03,575 through K as intermediaries doesn't even bother to use the vertex K, well then it 170 00:10:03,575 --> 00:10:08,868 better well be a shortest path from I to J that uses internal nodes only from one 171 00:10:08,868 --> 00:10:12,192 to K-1. Now it's in case two it's going to be 172 00:10:12,192 --> 00:10:16,594 apparent what a great idea ordering the vertices looking at prefixes. 173 00:10:16,594 --> 00:10:21,570 The vertices is when you're considering the all pairs version of the shortest 174 00:10:21,570 --> 00:10:25,765 path problem. Suppose this path P from I to J does 175 00:10:25,765 --> 00:10:31,301 indeed use the vertex K in the middle. Well than we can think of the path P as 176 00:10:31,301 --> 00:10:35,914 comprising two sub paths. The first path P1 which starts at I and 177 00:10:35,914 --> 00:10:40,314 goes to the vertex K. And then a path P2 which starts at K and 178 00:10:40,314 --> 00:10:44,910 goes to the destination J. Now here's what's cool. 179 00:10:44,910 --> 00:10:50,438 So on this path P the internal nodes between I and J all lie in one through K. 180 00:10:50,438 --> 00:10:55,682 Moreover, this path P remembers cycle free so the vertex K appears exactly 181 00:10:55,682 --> 00:11:00,289 once, not more than once. So therefore if we split the path P into 182 00:11:00,289 --> 00:11:05,180 these two parts, P1 and P2, internal to P1, strictly in between I and K. 183 00:11:05,180 --> 00:11:07,756 There's only vertices between one and K-1. 184 00:11:07,756 --> 00:11:12,420 Similarly strictly between K and J and P2, there's only vertices between one 185 00:11:12,420 --> 00:11:15,549 through K-1. Thus both of these paths, P1 and P2 can 186 00:11:15,549 --> 00:11:19,169 be thought of as feasible solutions to smaller subproblems. 187 00:11:19,169 --> 00:11:23,709 Subproblems with an even tighter budget of K minus 1 on the internal nodes that 188 00:11:23,709 --> 00:11:27,697 are allowed and these aren't just feasible solutions for smaller 189 00:11:27,697 --> 00:11:31,520 subproblems, they're optimal solutions as well. 190 00:11:31,520 --> 00:11:35,535 So, how slick is that? That the maximum indexed, internal node 191 00:11:35,535 --> 00:11:40,259 just splits this shortest path into two shortest paths for smaller sub-problems? 192 00:11:40,259 --> 00:11:43,684 And, you know what? At this point, I am so confident in your 193 00:11:43,684 --> 00:11:48,113 facility with dynamic programming, I am not going to even bother to prove this 194 00:11:48,113 --> 00:11:51,066 statement. You are totally capable of proving this 195 00:11:51,066 --> 00:11:53,900 yourself. The argument, it's really quite similar 196 00:11:53,900 --> 00:11:57,207 to Bellman-Ford. I encourage you to do it as an exercise. 197 00:11:57,207 --> 00:11:57,680 [SOUND]