1 00:00:00,000 --> 00:00:04,193 Let's now proceed to the final algorithm for the all pair shortest path problem, 2 00:00:04,193 --> 00:00:07,758 that we're going to discuss. It's called Johnson's algorithm, and the 3 00:00:07,758 --> 00:00:11,690 algorithm shows something pretty amazing. It shows that even in graphs with 4 00:00:11,690 --> 00:00:15,674 negative edge lengths, the all pair shortest path problem can effectively be 5 00:00:15,674 --> 00:00:18,400 reduced to just N indications of Dijkstra's algorithm. 6 00:00:18,400 --> 00:00:22,122 Even though Dijktsra's algorithm, for correctness, needs non negative edge 7 00:00:22,122 --> 00:00:24,429 lengths. So this video, we'll discuss the key 8 00:00:24,429 --> 00:00:27,994 idea, behind Johnson's algorithm. We'll then proceed to the algorithm 9 00:00:27,994 --> 00:00:31,978 itself, and this idea is a way of using vertex weights, to re-weight shortest 10 00:00:31,978 --> 00:00:36,763 path lengths. When we began our discussion of the all 11 00:00:36,763 --> 00:00:41,347 pair shortest path problem, we observed that the problem reduces to N indications 12 00:00:41,347 --> 00:00:45,308 of a suitable subroutine for the single source version of the problem. 13 00:00:45,308 --> 00:00:49,680 Simply by iterating over all N choices for the source vertex. 14 00:00:49,680 --> 00:00:53,351 The running time you get from this reduction is evidently N times the 15 00:00:53,351 --> 00:00:55,711 running time of the single source subroutine. 16 00:00:55,711 --> 00:01:00,117 And the running time of the subroutine is going to depend on whether your graph has 17 00:01:00,117 --> 00:01:03,736 non-negative edge links or not. If you're in the lucky case where the 18 00:01:03,736 --> 00:01:07,984 graph has non-negative edge links you get to use Dijkstra's algorithm which runs 19 00:01:07,984 --> 00:01:11,550 blazingly fast, almost linear time, M the number of edges times log N. 20 00:01:11,550 --> 00:01:15,589 If you're dealing with a graph with general edge links then the running time 21 00:01:15,589 --> 00:01:18,369 was not so good. It would be N times the Bellman-Ford 22 00:01:18,369 --> 00:01:24,209 running time which itself is M N. At the end of the day you get a running 23 00:01:24,209 --> 00:01:27,504 time of NM log N, for the non-negative edges case. 24 00:01:27,504 --> 00:01:31,760 Or N squared M for the general edge lengths case. 25 00:01:31,760 --> 00:01:35,222 And now that we know about the Floyd-Warshall algorithm which runs in 26 00:01:35,222 --> 00:01:39,143 big O of N cube times, there aren't a whole lot of reasons to care about this 27 00:01:39,143 --> 00:01:42,300 solution in general edge links that runs Bellman Ford N times. 28 00:01:42,300 --> 00:01:46,740 Except perhaps the fundamentally distributed nature of that computation. 29 00:01:46,740 --> 00:01:52,074 Johnson's out rhythm, remarkably shows that the same running time that we get 30 00:01:52,074 --> 00:01:57,685 here for the non-negative log-in case, M X N X log-in, we can get equally well for 31 00:01:57,685 --> 00:02:03,930 graphs that have general edge links. Specifically, Johnson's algorithm shows 32 00:02:03,930 --> 00:02:09,212 that the all pair shortest path problem in graphs with general edge lengths 33 00:02:09,212 --> 00:02:14,911 reduces to one invocation of the Bellman Ford algorithm followed by N invocations 34 00:02:14,911 --> 00:02:20,304 of Dijkstra's algorithm. As we know, the Bellman Ford algorithm 35 00:02:20,304 --> 00:02:24,768 runs in big O of N times N time. And indications to Dexter's algorithm is 36 00:02:24,768 --> 00:02:29,294 going to be N times M times log N. Putting it together we see that the end 37 00:02:29,294 --> 00:02:32,046 indications of Dijkstra's dominate just barely. 38 00:02:32,046 --> 00:02:36,082 So that the overall running time we achieve is big O of N M log N. 39 00:02:36,082 --> 00:02:40,914 Exactly the same running time we were getting for the non negative edge linked 40 00:02:40,914 --> 00:02:43,931 case. So this should probably sound a little 41 00:02:43,931 --> 00:02:46,841 too good to be true. How on earth can we reduce the shortest 42 00:02:46,841 --> 00:02:50,624 path problem with general edge lengths to the special case where all the edge 43 00:02:50,624 --> 00:02:53,777 lengths are non-negative? Indeed, this is a trick that we actually 44 00:02:53,777 --> 00:02:57,560 thought some about in part one, for those of you who are alumni of that course. 45 00:02:57,560 --> 00:03:01,149 Suppose you have the single source shortest path problem and you've got a 46 00:03:01,149 --> 00:03:04,884 graph with some negative edge lengths. A very natural instinct is to say, well, 47 00:03:04,884 --> 00:03:08,764 come on, that can't be that big a deal. Let's just shift the edge lengths so they 48 00:03:08,764 --> 00:03:11,286 become non-negative and then compute shortest paths. 49 00:03:11,286 --> 00:03:15,166 So if you have some graph where the most negative edge length is minus five, why 50 00:03:15,166 --> 00:03:18,307 not just add five. To the length of every single edge. 51 00:03:18,307 --> 00:03:22,023 Now you have a new graph, non-negative edge links, and you just run Dijkstra's 52 00:03:22,023 --> 00:03:26,975 algorithm, big deal. So this quiz asks you to think about this 53 00:03:26,975 --> 00:03:31,166 edge shifting trick in general. Suppose you have a graph and there's a 54 00:03:31,166 --> 00:03:34,460 particular source S in destination T you're looking at. 55 00:03:34,460 --> 00:03:39,310 And suppose you add some constant number. Capital M to the length of every single 56 00:03:39,310 --> 00:03:42,843 edge of the graph. Under what conditions will that preserve 57 00:03:42,843 --> 00:03:59,700 the shortest path between S and T? All right, so the correct answer is C. 58 00:03:59,700 --> 00:04:04,338 Of the listed conditions the only one that guarantees that adding a constant M 59 00:04:04,338 --> 00:04:08,801 to every edge length preserves the shortest path between a pair of vertices 60 00:04:08,801 --> 00:04:13,322 S and T is that all of the paths between that pair of vertices have the same 61 00:04:13,322 --> 00:04:16,728 number of edges. Maybe the easiest way to see this is just 62 00:04:16,728 --> 00:04:20,872 with a simple example. So consider this simple pink network, 63 00:04:20,872 --> 00:04:24,579 where there's an S and a T and there's two paths between them. 64 00:04:24,579 --> 00:04:28,526 One with two hops, each of cost one, one with one hop of cost three. 65 00:04:28,526 --> 00:04:32,950 Now, evidently in this graph, the shortest path is the one that goes to the 66 00:04:32,950 --> 00:04:35,761 vertex V. That has length two, the other one has 67 00:04:35,761 --> 00:04:38,691 length three. Now suppose we add a fixed constant, 68 00:04:38,691 --> 00:04:41,980 let's say the constant two to every one of these edges. 69 00:04:43,040 --> 00:04:46,779 In that case the two edges on top, their links increased to three. 70 00:04:46,779 --> 00:04:50,864 The edge on the bottom, it's link increases to five and you'll see that 71 00:04:50,864 --> 00:04:55,236 after we've shifted all of the edge lengths, we've actually changed what the 72 00:04:55,236 --> 00:04:58,401 shortest path is. It used to be the two hop path on top. 73 00:04:58,401 --> 00:05:01,220 Now it's the one hop path, hop path on the bottom. 74 00:05:01,220 --> 00:05:04,959 That has a length of five. The two hop path on top now has length 75 00:05:04,959 --> 00:05:07,761 six. So this shows that the answers a, b and d 76 00:05:07,761 --> 00:05:09,956 are all incorrect. Why is c correct? 77 00:05:09,956 --> 00:05:14,395 Well suppose it was the case that. Between the vertex S and T, every single 78 00:05:14,395 --> 00:05:17,565 path in the graph had exactly the same number of edges. 79 00:05:17,565 --> 00:05:21,542 Say, every path had twelve edges. Well, then when you add a constant to 80 00:05:21,542 --> 00:05:26,095 every edge's length, every path between S and T, their length goes up by exactly 81 00:05:26,095 --> 00:05:29,323 the same amount. It goes up by twelve times M if each of 82 00:05:29,323 --> 00:05:33,588 the paths have twelve edges in it. So if every single path between S and T 83 00:05:33,588 --> 00:05:38,141 goes up by exactly the same amount, well, then the shortest path remains exactly 84 00:05:38,141 --> 00:05:41,080 the same. The ranking of the lengths of the various 85 00:05:41,080 --> 00:05:45,691 paths is exactly the same because we've added exactly the same number to all of 86 00:05:45,691 --> 00:05:49,437 them. So the broader point here is that if we 87 00:05:49,437 --> 00:05:53,777 want to find the transformation of the lengths of a graph that reduces the 88 00:05:53,777 --> 00:05:58,116 general edge length case to the non-negative edge length case we need to 89 00:05:58,116 --> 00:06:02,158 aspire toward transformations that preserve what shortest paths are. 90 00:06:02,158 --> 00:06:07,960 And this transformation is not it. There's no reason for you to expect that 91 00:06:07,960 --> 00:06:10,895 there'd be any useful transformations of this form. 92 00:06:10,895 --> 00:06:14,925 Any shortest path preserving transformations that are non-trivial but 93 00:06:14,925 --> 00:06:18,954 such transformations do exist. That's the re-weighting technique which 94 00:06:18,954 --> 00:06:23,439 we'll start exploring in the next quiz. So in this quiz, were again going to 95 00:06:23,439 --> 00:06:25,918 think about one of our usually directed graphs. 96 00:06:25,918 --> 00:06:29,400 Every edge has a link and the links can be arbitrary real numbers. 97 00:06:30,920 --> 00:06:35,684 Here is the twist: For each vertex V, there's going to be a number, piece of V, 98 00:06:35,684 --> 00:06:39,885 associated with that vertex. Don't worry about where these piece of 99 00:06:39,885 --> 00:06:44,900 V's come from, we'll discuss that later. Just think of it as any old number; could 100 00:06:44,900 --> 00:06:48,600 be seven, could be -five, whatever; just one number per vertex. 101 00:06:50,580 --> 00:06:56,071 The key idea behind the re-weighting technique is to use these end numbers, 102 00:06:56,071 --> 00:07:01,417 one weight per vertex, p sub v, to use these end numbers to shift the edge 103 00:07:01,417 --> 00:07:05,737 lengths of the graph. I'm going to use c prime to denote these 104 00:07:05,737 --> 00:07:11,302 shifted or transformed edge lengths. Here's the exact definition: consider an 105 00:07:11,302 --> 00:07:14,231 arbitrary edge, e, of the, of this graph, g. 106 00:07:14,231 --> 00:07:18,185 Let's say the edge goes from the tail, u, to the head, v. 107 00:07:18,185 --> 00:07:23,604 The new length c prime of e is defined as the original length, ce, plus the 108 00:07:23,604 --> 00:07:26,960 weights. Of this edges tail, so plus P sub U, 109 00:07:26,960 --> 00:07:32,580 minus the weight associated with this edges head, that is minus P sub B. 110 00:07:34,020 --> 00:07:39,205 So for example, consider an edge from U to V that original length is two, and 111 00:07:39,205 --> 00:07:44,732 let's say the weights associated with its tail and head are minus four and minus 112 00:07:44,732 --> 00:07:47,666 three. Remember these vertex weights can be 113 00:07:47,666 --> 00:07:53,056 arbitrary numbers, positive or negative. Well then, the shifted weight C prime is 114 00:07:53,056 --> 00:07:57,423 going to be by definition two plus minus four minus, minus three. 115 00:07:57,423 --> 00:08:03,898 This of course is just equal to one. So that's the set up, here's the quiz 116 00:08:03,898 --> 00:08:06,880 question. I want you to think about some fixed 117 00:08:06,880 --> 00:08:11,741 path, capital P, let's say it starts at a vertex S and it ends at a vertex T. 118 00:08:11,741 --> 00:08:14,982 Now capital P may or may not be the shortest path. 119 00:08:14,982 --> 00:08:18,353 I don't care. Just any old path, starting at S, ending 120 00:08:18,353 --> 00:08:21,270 at T. So suppose, in the original network with 121 00:08:21,270 --> 00:08:26,261 the original edge links C's to B, the total length of this path P was capital 122 00:08:26,261 --> 00:08:28,660 L. What is its new link, L prime, under 123 00:08:28,660 --> 00:08:42,320 these new edge links, C prime. So the correct answer is C. 124 00:08:44,360 --> 00:08:50,007 Under the new edge lengths C prime the path P's length does not in general 125 00:08:50,007 --> 00:08:54,073 remain the same but it shifts by a predictable amount. 126 00:08:54,073 --> 00:09:00,022 Namely, the difference between the node weights associated with S, the origin of 127 00:09:00,022 --> 00:09:04,160 the path and T, the destination of the path. 128 00:09:04,160 --> 00:09:10,768 This follows by a simple calculation. So the new length of the path P is of 129 00:09:10,768 --> 00:09:15,689 course just the sum over the modified edge lengths, the sum over the C primes 130 00:09:15,689 --> 00:09:19,961 of the edges in P. So now we just expand each term in terms 131 00:09:19,961 --> 00:09:24,959 of it's definitions, or remember the new edge length C prime of E is just the 132 00:09:24,959 --> 00:09:29,309 original length C E, plus the weight associated with the edges tail. 133 00:09:29,309 --> 00:09:32,360 U minus the weight associated with it's head V. 134 00:09:33,520 --> 00:09:37,970 Now consider a vertex other than S or T on this path some internal vertex on the 135 00:09:37,970 --> 00:09:42,366 path, so such a vertex is going to be the tail of exactly one edge of P and it's 136 00:09:42,366 --> 00:09:44,893 going to be the head of exactly one edge of P. 137 00:09:44,893 --> 00:09:49,179 So its vertex weight is going to show up with a coefficient of plus one, once and 138 00:09:49,179 --> 00:09:51,982 it'll show up with a coefficient of minus one once. 139 00:09:51,982 --> 00:09:56,223 Obviously those two terms will cancel. So the only vertex weights that matter, 140 00:09:56,223 --> 00:10:00,474 when we evaluate the sum, is the weight of the origin s, and the weight of the 141 00:10:00,474 --> 00:10:03,510 destination t. The s only shows up once, and it shows up 142 00:10:03,510 --> 00:10:07,043 as a tail, so its vertex weight shows up with a +one contribution. 143 00:10:07,043 --> 00:10:09,472 So that's where the p sub s term comes from. 144 00:10:09,472 --> 00:10:13,723 Symmetrically the destination t only shows up once with a -one coefficient, so 145 00:10:13,723 --> 00:10:18,797 that's where the -t comes from. So, when the dust settles, although we're 146 00:10:18,797 --> 00:10:24,033 left with the sum of the original edge links that we're using imitation capital 147 00:10:24,033 --> 00:10:27,960 L for, plus this shift term, plus piece of S minus piece of T. 148 00:10:29,760 --> 00:10:34,806 Alright so now that we've slogged through the algebra let me explain to you why we 149 00:10:34,806 --> 00:10:39,002 went through that exercise. What is the potential application of this 150 00:10:39,002 --> 00:10:43,923 reweighting technique? So what did we just learn about the 151 00:10:43,923 --> 00:10:48,368 rewaiting technique? Well fix a graph and fix a origin vertex 152 00:10:48,368 --> 00:10:53,251 s and a destination vertex t. What we just learned is that when you 153 00:10:53,251 --> 00:10:58,133 rewait, using these vertex waits. You shift every single s, t path by 154 00:10:58,133 --> 00:11:02,651 exactly the same amount. The length of every s, t path, shortest 155 00:11:02,651 --> 00:11:06,441 or otherwise. Changes by exactly the quantity p of s 156 00:11:06,441 --> 00:11:10,765 minus piece of t. So this is now starting to sound pretty 157 00:11:10,765 --> 00:11:12,861 cool, cause let's remember our first quiz. 158 00:11:12,861 --> 00:11:16,748 In our first quiz, we explored what happens when you just add a fixed amount 159 00:11:16,748 --> 00:11:20,549 capital M to every edge of the graph. We notice it doesn't work, in the sense 160 00:11:20,549 --> 00:11:24,122 that it can change the shortest path, why can it change the shortest path? 161 00:11:24,122 --> 00:11:27,989 Well different paths have a different number of edges so if you add some fixed 162 00:11:27,989 --> 00:11:31,759 amount to each edge you change different paths by different amounts, that can 163 00:11:31,759 --> 00:11:34,891 screw up the shortest path. If you're lucky and all of the paths 164 00:11:34,891 --> 00:11:38,808 between and S and a T have exactly the same number of edges, then you shift them 165 00:11:38,808 --> 00:11:42,430 all by exactly the same amount and you preserve the ordering of the paths. 166 00:11:42,430 --> 00:11:45,220 But what do we see here? We see that under no assumptions 167 00:11:45,220 --> 00:11:48,990 whatsoever about what the paths between S and T like, node reweighting shifts. 168 00:11:48,990 --> 00:11:54,952 Every single path from S to T by exactly the same quantity, the difference between 169 00:11:54,952 --> 00:11:59,448 PS and PT. Well since reweighting, changing the 170 00:11:59,448 --> 00:12:04,322 length of every path by exactly the same amount, that means it preserves the 171 00:12:04,322 --> 00:12:07,208 shortest path. Whatever was the shortest path 172 00:12:07,208 --> 00:12:12,080 previously, is the shortest path after we've done reweighting. 173 00:12:12,080 --> 00:12:15,457 So what, why is this interesting? Well lets, cream big. 174 00:12:15,457 --> 00:12:20,197 Suppose we could actually come up, with vertex weights, that, transform the 175 00:12:20,197 --> 00:12:25,458 problem, into one that we don't know how to solve blazingly fast, into one that we 176 00:12:25,458 --> 00:12:30,329 do know how to solve blazingly fast. That is, what if we can find the vertex 177 00:12:30,329 --> 00:12:34,940 weights that change, an arbitrary instance, that in general has, negative 178 00:12:34,940 --> 00:12:39,875 edge lengths, into, a shortest path problem where all of the edge lengths are 179 00:12:39,875 --> 00:12:44,171 now, non negative. So if such a transformation existed, I 180 00:12:44,171 --> 00:12:47,748 hope that it's use would be clear. This would transform an instance of 181 00:12:47,748 --> 00:12:51,684 shortest paths, where it seems like we're stuck with the Bellman-Ford 182 00:12:51,684 --> 00:12:54,342 algorithm'cause we start with negative edge lengths. 183 00:12:54,342 --> 00:12:58,124 And it transforms it into an easier one where can we can apply Dijkstra's 184 00:12:58,124 --> 00:13:02,943 shortest path algorithm. Now this best case scenario should sound 185 00:13:02,943 --> 00:13:06,145 like a free lunch to you. Why do we think we can do basically a 186 00:13:06,145 --> 00:13:10,160 trivial amount of work, just a simple shift of the edge lengths and transform a 187 00:13:10,160 --> 00:13:14,277 seemingly harder problem, shortest paths with general edge lengths, into an easier 188 00:13:14,277 --> 00:13:16,157 one? Shortest paths with non-negative 189 00:13:16,157 --> 00:13:18,546 influence. The catch, of course, is to figure out 190 00:13:18,546 --> 00:13:22,052 which vertex weights to use. How do you know the magical edge weights 191 00:13:22,052 --> 00:13:25,457 that transform the general edge lengths into the non-negative ones? 192 00:13:25,457 --> 00:13:29,269 Actually, more fundamentally, why do we even think that these vertex weights 193 00:13:29,269 --> 00:13:31,607 exist? Maybe no matter how you pick the vertex 194 00:13:31,607 --> 00:13:34,707 weights, it's impossible to transform all of the edge lengths. 195 00:13:34,707 --> 00:13:36,604 So. That they become non negative. 196 00:13:36,604 --> 00:13:41,138 Well it turns out, as long as the graph has no negative cycle, which we know is 197 00:13:41,138 --> 00:13:45,499 essential to compute shortest paths. If there is no negative cycle there do 198 00:13:45,499 --> 00:13:49,220 always exist magical choices of the vertex weights P sub V that. 199 00:13:49,220 --> 00:13:52,254 Transforms all of the edge links to B non-negative. 200 00:13:52,254 --> 00:13:55,944 Now it's not trivial to compute them, you have to do some work. 201 00:13:55,944 --> 00:13:58,443 But, it's not a prohibitive amount of work. 202 00:13:58,443 --> 00:14:01,169 In fact. One invocation of the Bellman Ford 203 00:14:01,169 --> 00:14:05,410 algorithm will be sufficient. Johnson's algorithm puts those ideas 204 00:14:05,410 --> 00:14:07,660 together. That'll be the next video.