In this video, we'll start developing the Bellman-Ford algorithm as an instantiation of the dynamic programming paradigm. We'll develop it in the usual way, by understanding how optimal solutions must necessarily be composed of optimal solutions to smaller subproblems. So, let me just quickly review what we learned from the subtleties of the previous video. In particular, let me be precise about what I mean by solving the single-source shortest path problem when the input graph can have negative edge costs. So, the input, as usual, is a directed graph. Every edge e has an edge cost c sub e. We're allowing those costs to be negative, and we're given a source vertex s. So, what do we want to compute? Well, ideally, we'd like to compute the shortest path distance from s to every other destination v, where, as usual, the length of a path is just the sum of its edge costs. Now, in the previous video, we discussed how this goal is problematic for input graphs that have a negative cycle. If you allow paths that contain cycles, then the shortest path distance isn't even well defined; in some sense, it's minus infinity. And if you don't allow cycles, then the problem is computationally intractable, it's NP-hard. So, here's the revised goal: if you fail to compute shortest paths, then at least tell us an excuse. Output a negative cycle in the input graph as the reason for your failure to compute shortest path distances. So, for this video, I'm just going to develop the Bellman-Ford algorithm for input graphs that do not contain negative cycles. And, of course, these are the cases in which we'd better actually output the correct shortest paths. Once we've got the Bellman-Ford algorithm up and running for input graphs that do not have negative cycles, we'll see that it's a fairly easy matter to extend it to the general case. And for input graphs that do have negative cycles, we'll be able to output such a cycle without affecting the running time of the basic algorithm. 
So, with an eye toward a dynamic programming algorithm for the shortest path problem, let's start thinking about optimal substructure. That is, the way in which optimal solutions, meaning shortest paths, must necessarily be composed of optimal solutions, that is, shortest paths, to smaller subproblems. Now, the formal optimal substructure lemma is going to be a little cumbersome to state, so let me just spend a slide telling you, you know, what the deal is, how to think about it. It's often tricky to figure out how to apply the dynamic programming paradigm to graph problems. One of the reasons for that is that graphs aren't really naturally sequential objects. They're not ordered in any obvious way. We're just given some unordered set of vertices and some unordered set of edges. Now, one special case, of course, would be path graphs, our very first example of dynamic programming. There, there was an obvious ordering on the vertices of the graph. But unlike, say, sequence alignments, where again there's an obvious ordering from left to right, in a graph problem it's not so clear how to order things. For this particular graph problem, however, it seems like we've got something going for us, which is the sequentiality of our output, right? Paths are certainly sequential objects, so that gives us hope that we can state and prove an optimal substructure lemma talking about how optimal solutions, shortest paths, must be built up from, in some sense, smaller shortest paths. Unfortunately, it's still far from obvious how to really define smaller and larger subproblems. So, for example, you'd love to have some intelligent ordering in which to process the possible destinations v. But it's not at all clear how to do that without knowing the shortest path distances in the first place. So, this is a subtle issue that I encourage you to think hard about in the privacy of your own home, so you can better appreciate what's non-trivial about the Bellman-Ford solution. 
So, the key idea in the Bellman-Ford algorithm, and it's a good one, is to introduce an additional parameter that gives us an unambiguous definition of subproblem size. What this parameter is going to do, for a given destination v, is control how many edges we allow in the paths from the source s to this destination v that we're looking at. To explain this, let's look at the following green graph, with five vertices, on the right-hand part of the slide. So, in the Bellman-Ford algorithm, we're going to have one subproblem for each possible destination and each possible restriction on the number of edges in a path. So, for example, suppose we're looking at the source s and the destination t, and we think about paths that have only two edges or less. Well, in that case, the shortest path from s to t in this graph, subject to that constraint, has a length of four. The bottom path, which has three edges, is not an option; we're only permitting two edges or less in this current subproblem. Now, if we bump up the edge budget to three, then the corresponding shortest path distance drops from four to three, because all of a sudden we can make use of this three-hop path on the bottom. And again, the point here is that this gives us an unambiguous notion of subproblem size. The more edges you're allowed to use in your paths from the source to a given destination, the bigger that subproblem. Alright. So, that discussion was deliberately vague, to give you some context for the following formal statement of the optimal substructure lemma. We're going to go ahead and state and prove this optimal substructure lemma in full generality. We're going to be working with arbitrary input graphs. They might have a negative cycle, they might not. 
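To make the edge-budget idea concrete, here is a quick brute-force sketch in Python. The exact edge costs in the slide's green graph aren't spelled out in the transcript, so the graph below is a hypothetical one consistent with the description: a two-edge s-t path of length four and a three-edge s-t path of length three.

```python
import math

def shortest_with_budget(edges, s, t, budget):
    """Length of a shortest s->t path using at most `budget` edges
    (brute force, for illustration only; not the Bellman-Ford algorithm)."""
    best = math.inf
    # Depth-limited search: the edge budget bounds the number of hops,
    # so even cycles cannot cause an infinite loop.
    stack = [(s, 0, 0)]  # (current vertex, length so far, edges used)
    while stack:
        v, length, used = stack.pop()
        if v == t:
            best = min(best, length)
        if used == budget:
            continue
        for (tail, head, cost) in edges:
            if tail == v:
                stack.append((head, length + cost, used + 1))
    return best

# Hypothetical graph matching the slide's description: five vertices,
# a 2-edge s->t path of length 4, and a 3-edge s->t path of length 3.
edges = [('s', 'a', 2), ('a', 't', 2),
         ('s', 'b', 1), ('b', 'c', 1), ('c', 't', 1)]
print(shortest_with_budget(edges, 's', 't', 2))  # 4
print(shortest_with_budget(edges, 's', 't', 3))  # 3
```

Bumping the budget from two to three makes the bottom path legal, and the subproblem's answer drops from four to three, exactly as described on the slide.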
So, like all of these optimal substructure lemmas, the statement is going to have the form that the optimal solution to a subproblem has to be one of a small number of candidates that are composed in simple ways from optimal solutions to smaller subproblems. So, how do we index a given subproblem? Well, there's going to be a destination v that we care about. And, as we said on the last slide, there's going to be a budget on how many edges you're allowed to use in a path from s to v, and we're going to use i to denote that budget. So, there's going to be one subproblem for every possible destination v and every possible edge budget i, a positive integer, one or bigger. Suppose that P is an optimal solution. Meaning, amongst all paths that start at s, end at v, and contain at most i edges, P has the minimum sum of edge costs, the minimum length. So, a subtle point: because we're proving this lemma in its full generality, that is, also for input graphs that might have negative cycles, we need to go ahead and allow this path P to use cycles if it wants, including potentially using a negative cycle multiple times. That is allowed. Notice, we're not worried about this path using a cycle an infinite number of times, because we have a finite budget i on how many edges it can contain. So, with that setup, what are the possible candidates for what capital P could possibly be? Well, we're going to have our usual sort of trivial case one, which is that, if this path P doesn't bother to use up its full edge budget, that is, if it has i - 1 edges or less, well, then naturally P had better be a shortest s-v path with at most i - 1 edges. So, the non-trivial case two is when the shortest path with at most i edges from s to v actually uses its full budget, uses all i of its edges. So now, by analogy with all of our previous dynamic programming algorithms, we think about plucking off the final part of an optimal solution. 
Here, we're going to pluck off the final edge from the path P. What do we get? Well, we get a path with one fewer edge. So, that's good. It's going to correspond to some smaller subproblem, because it has at most i - 1 edges. On the other hand, notice that if we take a path from s to v and we pluck off the final edge, we get a path from s to somewhere else. So, we're going to call that somewhere else w. So, plucking off the final edge from capital P gives us a path P prime that starts at s, ends at w, and has at most i - 1 edges. And I hope you can guess that the claim is that it's not just any old path from s to w with at most i - 1 edges. It is, in fact, a shortest such path. Now, in this case two, notice that we of course know that P prime has exactly i - 1 edges, not merely at most i - 1 edges. But it's going to be useful to have the stronger assertion claimed here, that P prime is optimal amongst all paths that have i - 1 edges or possibly fewer. Alright. So, this is one of those lemmas that's actually harder to state than it is to prove. So, let's just quickly, sort of, talk through the proof. It's the same as the previous optimal substructure lemmas that we've seen. Case one is totally trivial. It's the obvious contradiction that we've seen in many of our other case ones. Case two is going to be one of our usual cut-and-paste contradictions. So, suppose there is some path Q better than P prime. That is, Q starts at s, ends at w, contains i - 1 edges or less, and the sum of its edge costs is even smaller than that of P prime. Well, then, if we just tack on the final hop of P, that is, the edge (w, v), we get a path that starts at s, goes to v, and has at most i edges. And the total sum of edge costs in this path, Q plus (w, v), is strictly less than that of the original path P. But that contradicts the purported optimality of P amongst all paths starting at s, ending at v, and having at most i edges. 
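Written as a formula, the two cases of the lemma combine into a single recurrence. The notation here is an assumption of mine, not the video's: L_{i,v} denotes the minimum length of an s-v path with at most i edges (or plus infinity if no such path exists).

```latex
L_{i,v} \;=\; \min\left\{\; L_{i-1,v},\;\; \min_{(w,v) \in E} \bigl( L_{i-1,w} + c_{wv} \bigr) \;\right\},
\qquad L_{0,s} = 0, \quad L_{0,v} = +\infty \text{ for } v \neq s.
```

Case one of the lemma is the first term (inherit the answer for budget i - 1), and each in-edge (w, v) contributes one term of the inner minimum, matching case two.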
So, that's the proof of the optimal substructure lemma; it's simple enough. And as usual, the next step is going to be to compile this lemma into a recurrence, where, informally, what the recurrence is going to do is brute-force search amongst the possible candidates for the optimal solution. So, to make sure you understand what just happened, let's move on to a quiz. And the question is, for a given destination v of some input graph, how many candidates are there for the optimal solution of a subproblem that involves v? The correct answer is the second one. The answer depends on which destination v you're talking about. And what governs the number of candidates is the in-degree of this vertex, that is, the number of edges of the input graph that have v as their head. Why is this true? Well, case one contributes one possible candidate. It's possible that, for a given i and a given v, all you do is inherit the optimal solution for destination v with a budget of one fewer edge, that is, i - 1. Case two may seem like only one other candidate, but actually case two comprises a number of them, one for each choice of that final hop (w, v). Specifically, a given choice of w contributes one candidate optimal solution: a shortest path from s to that choice of w that uses at most i - 1 edges, plus the edge (w, v) tacked on.
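To make the candidate structure concrete, the case analysis can be compiled directly into a table-filling sketch in Python. This previews where the lecture is headed rather than restating it: the table name A and the integer vertex labeling are my own assumptions, and the sketch assumes the input has no negative cycle, so paths with at most n - 1 edges suffice.

```python
import math

def bellman_ford(n, edges, s):
    """Shortest-path distances from s, assuming no negative cycle.

    n: number of vertices, labeled 0..n-1.
    edges: list of (tail, head, cost) triples; costs may be negative.
    Returns A[n-1], where A[i][v] = length of a shortest s->v path
    with at most i edges (the subproblems from this video).
    """
    A = [[math.inf] * n for _ in range(n)]
    A[0][s] = 0  # the empty path; every other vertex is unreachable in 0 hops
    for i in range(1, n):
        for v in range(n):
            A[i][v] = A[i - 1][v]          # case 1: inherit, <= i - 1 edges
        for (w, v, c) in edges:
            if A[i - 1][w] + c < A[i][v]:  # case 2: final hop is (w, v)
                A[i][v] = A[i - 1][w] + c
    return A[n - 1]

# Example with a negative edge: the cheapest route 0 -> 2 -> 1 costs -1.
print(bellman_ford(3, [(0, 1, 4), (0, 2, 2), (2, 1, -3)], 0))  # [0, -1, 2]
```

Note how each entry A[i][v] is the minimum over exactly 1 + in-degree(v) candidates, which is the answer to the quiz.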