In this video, we'll start developing the Bellman-Ford algorithm as an instantiation of the dynamic programming paradigm. We'll develop it in the usual way, by understanding how optimal solutions must necessarily be composed of optimal solutions to smaller subproblems. So, let me just quickly review what we learned from the subtleties of the previous video. In particular, let me be precise about what I mean by solving the single-source shortest path problem when the input graph can have negative edge costs. So, the input, as usual, is a directed graph. Every edge e has an edge cost c sub e. We're allowing those costs to be negative, and we're given a source vertex s. So, what do we want to compute? Well, ideally, we'd like to compute the shortest path distance from s to every other destination v, where, as usual, the length of a path is just the sum of its edge costs. Now, in the previous video, we discussed how this goal is problematic for input graphs that have a negative cycle. If you allow paths that contain cycles, then the shortest path distance isn't even well defined; in some sense, it's minus infinity. And if you don't allow cycles, then the problem is computationally intractable, it's NP-hard. So, here's the revised goal: if you fail to compute shortest paths, then at least tell us an excuse. Output a negative cycle in the input graph as the reason for your failure to compute shortest path distances. So, for this video, I'm just going to develop the Bellman-Ford algorithm for input graphs that do not contain negative cycles. And, of course, these are the cases in which we'd better actually output the correct shortest paths. Once we've got the Bellman-Ford algorithm up and running for input graphs that do not have negative cycles, we'll see that it's a fairly easy matter to extend it to the general case. And for input graphs that do have negative cycles, we'll be able to output such a cycle without affecting the running time of the basic algorithm. 
So, with an eye toward a dynamic programming algorithm for the shortest path problem, let's start thinking about optimal substructure. That is, the way in which optimal solutions, meaning shortest paths, must necessarily be composed of optimal solutions, that is, shortest paths, to smaller subproblems. Now, the formal optimal substructure lemma is going to be a little cumbersome to state, so let me just spend a slide telling you, you know, what the deal is, how to think about it. It's often tricky to figure out how to apply the dynamic programming paradigm to graph problems. One of the reasons for that is that graphs aren't really naturally sequential objects. They're not ordered in any obvious way. We're just given some unordered set of vertices and some unordered set of edges. Now, one special case, of course, would be path graphs, our very first example of dynamic programming. There, there was an obvious ordering on the vertices of the graph. But unlike, say, sequence alignments, where again there's an obvious ordering from left to right, in a graph problem it's not so clear how to order things. For this particular graph problem, however, it seems like we've got something going for us, which is the sequentiality of our output, right? Paths are certainly sequential objects, so that gives us hope that we can state and prove an optimal substructure lemma talking about how optimal solutions, shortest paths, must be built up from, in some sense, smaller shortest paths. Unfortunately, it's still far from obvious how to really define smaller and larger subproblems. So, for example, you'd love to have some intelligent ordering in which to process the possible destinations v. But it's not at all clear how to do that without knowing the shortest path distances in the first place. So, this is a subtle issue that I encourage you to think hard about in the privacy of your own home, so you can better appreciate what's non-trivial about the Bellman-Ford solution. 
So, the key idea in the Bellman-Ford algorithm, and it's a good one, is to introduce an additional parameter that gives us an unambiguous definition of subproblem size. What this parameter is going to do, for a given destination v, is control how many edges we allow in the paths from the source s to this destination v that we're looking at. To explain this, let's look at the following green graph, with five vertices, on the right-hand part of the slide. So, in the Bellman-Ford algorithm, we're going to have one subproblem for each possible destination and each possible restriction on the number of edges in a path. So, for example, suppose we're looking at the source s and the destination t, and we think about paths that have only two edges or less. Well, in that case, the shortest path from s to t in this graph, subject to that constraint, has a length of four. The bottom path, which has three edges, is not an option; we're only permitting two edges or less in this current subproblem. Now, if we bump up the edge budget to three, then the corresponding shortest path distance drops from four to three, because all of a sudden we can make use of this three-hop path on the bottom. And again, the point here is that this gives us an unambiguous notion of subproblem size. The more edges you're allowed to use in your paths from the source to a given destination, the bigger that subproblem. Alright. So, that discussion was deliberately vague, to give you some context for the following formal statement of the optimal substructure lemma. We're going to go ahead and state and prove this optimal substructure lemma in full generality. We're going to be working with arbitrary input graphs. They might have a negative cycle, they might not. 
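To make the edge-budget idea concrete, here is a quick brute-force sketch in Python. The exact edge costs in the slide's green graph aren't spelled out in the transcript, so the graph below is a hypothetical one consistent with the description: a two-edge s-t path of length four and a three-edge s-t path of length three.

```python
import math

def shortest_with_budget(edges, s, t, budget):
    """Length of a shortest s->t path using at most `budget` edges
    (brute force, for illustration only; not the Bellman-Ford algorithm)."""
    best = math.inf
    # Depth-limited search: the edge budget bounds the number of hops,
    # so even cycles cannot cause an infinite loop.
    stack = [(s, 0, 0)]  # (current vertex, length so far, edges used)
    while stack:
        v, length, used = stack.pop()
        if v == t:
            best = min(best, length)
        if used == budget:
            continue
        for (tail, head, cost) in edges:
            if tail == v:
                stack.append((head, length + cost, used + 1))
    return best

# Hypothetical graph matching the slide's description: five vertices,
# a 2-edge s->t path of length 4, and a 3-edge s->t path of length 3.
edges = [('s', 'a', 2), ('a', 't', 2),
         ('s', 'b', 1), ('b', 'c', 1), ('c', 't', 1)]
print(shortest_with_budget(edges, 's', 't', 2))  # 4
print(shortest_with_budget(edges, 's', 't', 3))  # 3
```

Bumping the budget from two to three makes the bottom path legal, and the subproblem's answer drops from four to three, exactly as described on the slide.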
So, like all of these optimal substructure lemmas, the statement is going to have the form that the optimal solution to a subproblem has to be one of a small number of candidates that are composed in simple ways from optimal solutions to smaller subproblems. So, how do we index a given subproblem? Well, there's going to be a destination v that we care about. And, as we said on the last slide, there's going to be a budget on how many edges you're allowed to use in a path from s to v, and we're going to use i to denote that budget. So, there's going to be one subproblem for every possible destination v and every possible edge budget i, a positive integer, one or bigger. Suppose that P is an optimal solution. Meaning, amongst all paths that start at s, end at v, and contain at most i edges, P has the minimum sum of edge costs, the minimum length. So, a subtle point: because we're proving this lemma in its full generality, that is, also for input graphs that might have negative cycles, we need to go ahead and allow this path P to use cycles if it wants, including potentially using a negative cycle multiple times. That is allowed. Notice, we're not worried about this path using a cycle an infinite number of times, because we have a finite budget i on how many edges it can contain. So, with that setup, what are the possible candidates for what capital P could possibly be? Well, we're going to have our usual sort of trivial case one, which is that, if this path P doesn't bother to use up its full edge budget, that is, if it has i - 1 edges or less, well, then naturally P had better be a shortest s-v path with at most i - 1 edges. So, the non-trivial case two is when the shortest path with at most i edges from s to v actually uses its full budget, uses all i of its edges. So now, by analogy with all of our previous dynamic programming algorithms, we think about plucking off the final part of an optimal solution. 
Here, we're going to pluck off the final edge from the path P. What do we get? Well, we get a path with one fewer edge. So, that's good. It's going to correspond to some smaller subproblem, because it has at most i - 1 edges. On the other hand, notice that if we take a path from s to v and we pluck off the final edge, we get a path from s to somewhere else. So, we're going to call that somewhere else w. So, plucking off the final edge from capital P gives us a path P prime that starts at s, ends at w, and has at most i - 1 edges. And I hope you can guess that the claim is that it's not just any old path from s to w with at most i - 1 edges. It is, in fact, a shortest such path. Now, in this case two, notice that we of course know that P prime has exactly i - 1 edges, not merely at most i - 1 edges. But it's going to be useful to have the stronger assertion claimed here, that P prime is optimal amongst all paths that have i - 1 edges or possibly fewer. Alright. So, this is one of those lemmas that's actually harder to state than it is to prove. So, let's just quickly, sort of, talk through the proof. It's the same as the previous optimal substructure lemmas that we've seen. Case one is totally trivial. It's the obvious contradiction that we've seen in many of our other case ones. Case two is going to be one of our usual cut-and-paste contradictions. So, suppose there is some path Q better than P prime. That is, Q starts at s, ends at w, contains i - 1 edges or less, and the sum of its edge costs is even smaller than that of P prime. Well, then, if we just tack on the final hop of P, that is, the edge (w, v), we get a path that starts at s, goes to v, and has at most i edges. And the total sum of edge costs in this path, Q plus (w, v), is strictly less than that of the original path P. But that contradicts the purported optimality of P amongst all paths starting at s, ending at v, and having at most i edges. 
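Written as a formula, the two cases of the lemma combine into a single recurrence. The notation here is an assumption of mine, not the video's: L_{i,v} denotes the minimum length of an s-v path with at most i edges (or plus infinity if no such path exists).

```latex
L_{i,v} \;=\; \min\left\{\; L_{i-1,v},\;\; \min_{(w,v) \in E} \bigl( L_{i-1,w} + c_{wv} \bigr) \;\right\},
\qquad L_{0,s} = 0, \quad L_{0,v} = +\infty \text{ for } v \neq s.
```

Case one of the lemma is the first term (inherit the answer for budget i - 1), and each in-edge (w, v) contributes one term of the inner minimum, matching case two.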
So, that's the proof of the optimal substructure lemma; it's simple enough. And as usual, the next step is going to be to compile this lemma into a recurrence, where, informally, what the recurrence is going to do is brute-force search amongst the possible candidates for the optimal solution. So, to make sure you understand what just happened, let's move on to a quiz. And the question is, for a given destination v of some input graph, how many candidates are there for the optimal solution of a subproblem that involves v? The correct answer is the second one. The answer depends on which destination v you're talking about. And what governs the number of candidates is the in-degree of this vertex, that is, the number of edges of the input graph that have v as their head. Why is this true? Well, case one contributes one possible candidate. It's possible that, for a given i and a given v, all you do is inherit the optimal solution for destination v with a budget of one fewer edge, that is, i - 1. Case two may seem like only one other candidate, but actually case two comprises a number of them, one for each choice of that final hop (w, v). Specifically, a given choice of w contributes one candidate optimal solution: a shortest path from s to that choice of w that uses at most i - 1 edges, plus the edge (w, v) tacked on.
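To make the candidate structure concrete, the case analysis can be compiled directly into a table-filling sketch in Python. This previews where the lecture is headed rather than restating it: the table name A and the integer vertex labeling are my own assumptions, and the sketch assumes the input has no negative cycle, so paths with at most n - 1 edges suffice.

```python
import math

def bellman_ford(n, edges, s):
    """Shortest-path distances from s, assuming no negative cycle.

    n: number of vertices, labeled 0..n-1.
    edges: list of (tail, head, cost) triples; costs may be negative.
    Returns A[n-1], where A[i][v] = length of a shortest s->v path
    with at most i edges (the subproblems from this video).
    """
    A = [[math.inf] * n for _ in range(n)]
    A[0][s] = 0  # the empty path; every other vertex is unreachable in 0 hops
    for i in range(1, n):
        for v in range(n):
            A[i][v] = A[i - 1][v]          # case 1: inherit, <= i - 1 edges
        for (w, v, c) in edges:
            if A[i - 1][w] + c < A[i][v]:  # case 2: final hop is (w, v)
                A[i][v] = A[i - 1][w] + c
    return A[n - 1]

# Example with a negative edge: the cheapest route 0 -> 2 -> 1 costs -1.
print(bellman_ford(3, [(0, 1, 4), (0, 2, 2), (2, 1, -3)], 0))  # [0, -1, 2]
```

Note how each entry A[i][v] is the minimum over exactly 1 + in-degree(v) candidates, which is the answer to the quiz.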