Consider the following graph with five vertices. I'll label the edges with their costs in blue.

So let's step through the values of i. Since there are five vertices, i is going to take on the values zero, one, two, three, four, and let's look at the results of each round of subproblem computations.

In the base case, when i equals zero, as always, the distance from s to itself is zero, and for all other vertices the subproblem value is defined as plus infinity. Let me write down the recurrence again, just in case you've forgotten it.

So now we enter the main for loop and we start with i equal to one. We just go through the vertices in arbitrary order and evaluate the recurrence. Node s is going to just inherit the solution from the previous step; it's still happy with the empty path of total length zero. Now, node v is certainly hoping not to have to inherit its solution of plus infinity from the previous round, when i equals zero. And indeed, when i equals 1, the subproblem solution for the vertex v is going to be 2. That's because we can choose the last hop to be the edge (s, v), which has length 2, and add s's subproblem value from the last iteration, when i equals zero.
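For reference, the recurrence being evaluated is the standard Bellman-Ford one: writing A[i, v] for the length of a shortest s-v path using at most i edges, and c_{wv} for the cost of an edge (w, v),

```latex
A[i, v] \;=\; \min\left\{\, A[i-1, v],\;\; \min_{(w, v) \in E} \bigl( A[i-1, w] + c_{wv} \bigr) \right\}
```

with the base case A[0, s] = 0 and A[0, v] = +infinity for every v other than s.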
By the same reasoning, x's new subproblem value is going to be 4, because we can choose the last hop to be the edge (s, x) and add the cost of that edge, 4, to s's subproblem value from the last iteration, when i equals zero.

Now, the nodes w and t would love to throw out their plus-infinity solutions and get something finite. You might be thinking that, because v and x now have finite distances, those would propagate to the nodes w and t. That will indeed happen, but we're going to have to wait until the next iteration, when i equals 2. The reason is, if you look at the code, if you look at the recurrence: when we compute the subproblem value at a given iteration i, we only make use of the subproblem solutions from the previous iteration, i minus 1. We do not use any of the updates that have already happened in the current iteration i. So because, when i equals zero, A[0, v] and A[0, x] were both plus infinity, we're going to be stuck with A[1, w] and A[1, t] both being plus infinity as well.

So now let's move on to the next iteration of the outer for loop, when i equals 2. The subproblem solution at s is not going to change; you're not going to do better than zero, so it's going to stay that way.
Similarly, at the vertex v, you're not going to do better than 2, so that's going to stay the same at this iteration. Something interesting happens at the vertex x, however. In the recurrence, you certainly have the option of inheriting the previous solution, so one option would be to set A[2, x] equal to 4, but there's actually a better choice. The recurrence is going to come out even smaller, specifically if we choose the last hop to be the unit-cost arc (v, x). We add that unit cost to v's subproblem value from the last iteration, when i equals 1. That was 2, and 2 plus 1 equals 3, so that's going to be the new subproblem value at x in this iteration, when i equals 2.

So, as advertised, the updates to the vertices v and x in the iteration where i equals 1 now, when i equals 2, get propagated on to the vertices w and t. So w and t shed their plus-infinity values and instead pick up the values 4 and 8, respectively. Notice that I've labeled the vertex t with an 8, not with a 7; I've computed A[2, t] to be 8. The reason is, again, the same: updates from this iteration, in particular the fact that x has dropped from 4 to 3, do not get reflected at other nodes in this same iteration.
We have to wait until the next iteration of the outer for loop before they happen. So we're using the stale information about x, namely that when i equals 1 its solution value was 4. That's the information we're using to update t's solution value, so it's 4 plus 4, or 8.

In the penultimate iteration, when i equals 3, most things stay the same. At s, v, x, and w, we've already computed the shortest paths, so they will all just inherit the solution from the previous iteration. But the vertex t will now make use of the improved solution value at the vertex x from the iteration where i equals 2. So its 8 gets updated to a 7, reflecting the improvement at x in the previous iteration.

And at this point, we're actually done. We've computed shortest paths to all destinations, but the algorithm doesn't know that we're done yet. So it still executes the final iteration of the outer for loop, where i equals 4, but everybody just inherits their solutions from the previous round. At that point, the algorithm terminates.

For most of the dynamic programming algorithms that we've discussed, the running time analysis has been trivial. The Bellman-Ford algorithm is a little more interesting from a running time analysis perspective. Please think about it in this quiz.

So, the correct answer is b.
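To make the trace above concrete, here is a small sketch that recomputes the whole table. The edge costs are reconstructed from the values stated in the walkthrough, namely (s, v) = 2, (s, x) = 4, (v, x) = 1, and (x, t) = 4; the edge (v, w) with cost 2 is an assumption, chosen so that w picks up the value 4 at i = 2 as described.

```python
from math import inf

# Edge costs reconstructed from the walkthrough; (v, w) = 2 is an
# assumption that reproduces w's value of 4 at i = 2.
edges = {('s', 'v'): 2, ('s', 'x'): 4, ('v', 'x'): 1,
         ('v', 'w'): 2, ('x', 't'): 4}
vertices = ['s', 'v', 'x', 'w', 't']

def bellman_ford_table(source):
    """Return the DP table A, where A[i][v] is the length of a shortest
    source-v path using at most i edges."""
    n = len(vertices)
    A = [{v: (0 if v == source else inf) for v in vertices}]  # i = 0
    for i in range(1, n):  # i = 1, ..., n - 1
        A.append({})
        for v in vertices:
            # Either inherit A[i-1][v], or use some edge (w, v) as the
            # last hop; only values from iteration i - 1 are consulted,
            # never updates made earlier in the current iteration i.
            best = A[i - 1][v]
            for (w, u), cost in edges.items():
                if u == v:
                    best = min(best, A[i - 1][w] + cost)
            A[i][v] = best
    return A

A = bellman_ford_table('s')
for i, row in enumerate(A):
    print(i, row)
```

Because each round reads only the previous round's values, t sits at 8 in round 2 (using x's stale value of 4) and drops to 7 only in round 3, exactly as in the lecture's trace.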
That is, of all these running time bounds, it's the smallest bound that is actually correct. So let me explain why it's the number of edges times the number of vertices, and also comment on some of the other options.

Answer A, quadratic, n squared: this is the number of subproblems. Right? Subproblems are indexed by a choice of i, which is somewhere between zero and n minus 1, and a choice of a destination v. There are n choices for each of those, so exactly n squared subproblems. If we only ever spent constant time evaluating a subproblem, then indeed the running time of Bellman-Ford would be big-O of n squared. And in most dynamic programming algorithms we've discussed in this class, you do indeed spend only constant time solving each subproblem, the one exception being the optimal binary search trees problem, where in general we spent linear time. Here again, like optimal binary search trees, we might spend more than constant time solving a subproblem, the reason being that we have to do brute-force search through a list of candidates that might be super-constant. Specifically, each arc that comes into the destination v provides one candidate for what the correct solution to this subproblem might be.
So remember, the number of candidates (we had a quiz on this) is proportional to the in-degree of the vertex v, and that could be as large as n minus 1, linear in the number of vertices. That's why the running time of the Bellman-Ford algorithm can, in general, be worse than n squared.

Another interesting answer is C, that it's cubic in n. Indeed, big-O of n cubed is a valid upper bound on the running time of the Bellman-Ford algorithm, but it's not the sharpest possible upper bound. So why is it a valid upper bound? As discussed just now, there's a quadratic number of subproblems. How much work do you do per subproblem? Well, it's proportional to the in-degree of a vertex, and the biggest possible in-degree is n minus 1, big-O of n. So, linear work for each of the quadratic number of subproblems, resulting in cubic running time.

There is, however, a tighter, better analysis of the Bellman-Ford algorithm. Now, why is big-O of mn better than n cubed? Well, what is m? In a sparse graph, m is going to be theta of n, and in a dense graph, m is going to be theta of n squared. So for a dense graph, it's true that big-O of mn is no smaller than n cubed, but if it's not a dense graph, then this really is an improved upper bound. So, why does the bound hold?
Well, think about the total amount of work done over all of the subproblems in the following way. All we're going to do is take the amount of work done in a given iteration of the outer for loop and multiply it by n, the number of iterations of the outer for loop. So how much work do we do in a given iteration, for a given choice of i? Well, it's just the sum of the in-degrees. When we consider the vertex v, we do work proportional to its in-degree, and we consider each vertex v once in the given iteration of the outer for loop.

But we know a much simpler expression for the sum of the in-degrees: the sum is simply equal to m, the number of edges. In any directed graph, the number of edges is exactly equal to the sum of the in-degrees. One easy way to see that is to take your favorite directed graph and imagine you plug the edges into the graph one at a time, starting from the empty set of edges. Each time you plug in a new edge, obviously the number of edges in the graph goes up by one, and also the in-degree of exactly one vertex goes up by one: whichever vertex happens to be the head of the edge you just stuck in. Therefore the sum of the in-degrees and the number of edges are always the same, no matter what the directed graph is.
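The edge-plugging argument is easy to check mechanically; here is a tiny sketch on an arbitrary example edge list:

```python
# Any directed graph works; this edge list is just an arbitrary example.
edges = [('s', 'v'), ('s', 'x'), ('v', 'x'), ('v', 'w'), ('x', 't')]

# Tally in-degrees: each edge bumps the in-degree of its head by one.
in_degree = {}
for tail, head in edges:
    in_degree[head] = in_degree.get(head, 0) + 1

# One edge added, one in-degree incremented, so the totals must match.
print(sum(in_degree.values()) == len(edges))  # True
```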
So that's why the total work is better than n cubed: it's actually big-O of m times n.

A number of optimizations of the basic Bellman-Ford algorithm are possible. Let me wrap up this video with a quick one about stopping early. (See also a separate video about a less trivial space optimization of the algorithm.) In the basic version of the algorithm, the outer for loop runs for n minus 1 iterations, but generally you don't need all of them. We already saw that in our simple example, where the final iteration did no useful work; it just inherited the solutions from the previous iteration.

So, in general, suppose at some iteration earlier than the last one, say with the current index value j, it just so happens that nothing changes: at every destination v, you just reuse the optimal solution that you computed in the previous iteration of the outer for loop. Well, then, if you think about it, what's going to happen in the next iteration? You're going to do exactly the same set of computations with exactly the same set of inputs, so you're going to get exactly the same set of results. That is, it will again be true in the next iteration that everybody just inherits the optimal solutions from the previous one, and that will keep happening over and over again until the rest of time.
So in particular, by the time you get to the (n minus 1)th iteration of the outer for loop, you will have exactly the same set of solution values as you do right now. We've already proven that the results at the end of iteration n minus 1 are correct; those are the real shortest-path distances. If you already have them in your grubby little hands now, well, you may as well abort the algorithm and return those as the final correct shortest-path distances.
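A minimal sketch of the early-stopping variant follows. This is not the lecturer's exact code; the example edge list reuses the costs stated in the walkthrough, with the edge (v, w) of cost 2 an assumed reconstruction, and it assumes no negative cycle is reachable from the source.

```python
from math import inf

def bellman_ford(vertices, edges, source):
    """Shortest-path distances from source; edges maps (tail, head) -> cost.
    Aborts as soon as an iteration changes nothing."""
    dist = {v: (0 if v == source else inf) for v in vertices}
    for _ in range(len(vertices) - 1):  # at most n - 1 useful iterations
        new_dist = {}
        for v in vertices:
            best = dist[v]  # option 1: inherit last round's value
            for (w, u), cost in edges.items():
                if u == v:  # option 2: last hop (w, v)
                    best = min(best, dist[w] + cost)
            new_dist[v] = best
        if new_dist == dist:
            break  # nothing changed, so nothing will ever change again
        dist = new_dist
    return dist

edges = {('s', 'v'): 2, ('s', 'x'): 4, ('v', 'x'): 1,
         ('v', 'w'): 2, ('x', 't'): 4}
print(bellman_ford('svxwt', edges, 's'))
```

On this particular input only the very last round is redundant, so little is saved, but on graphs where shortest paths use far fewer than n minus 1 edges the loop can terminate much earlier.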