In this video, we'll start developing the Bellman-Ford algorithm as an instantiation of the dynamic programming paradigm. We'll develop it in the usual way, by understanding how optimal solutions must necessarily be composed of optimal solutions to smaller subproblems.

So, let me just quickly review what we learned from the subtleties of the previous video. In particular, let me be precise about what I mean by solving the single-source shortest path problem when the input graph can have negative edge costs. The input, as usual, is a directed graph. Every edge e has an edge cost c_e, we're allowing those costs to be negative, and we're given a source vertex s.

So, what do we want to compute? Well, ideally we'd like to compute the shortest-path distance from s to every other destination v, where, as usual, the length of a path is just the sum of its edge costs.

Now, in the previous video, we discussed how this goal is problematic for input graphs that have a negative cycle. If you allow paths that contain cycles, then the objective isn't even well defined; the shortest-path length is, in some sense, minus infinity. And if you don't allow cycles, then it's computationally intractable, it's an NP-hard problem. So, if you fail to compute shortest paths, then at least give us an excuse: output a negative cycle in the input graph as the reason for your failure to compute shortest-path distances.

So, for this video, I'm just going to develop the Bellman-Ford algorithm for input graphs that do not contain negative cycles. And, of course, these are the cases in which we had better actually output the correct shortest paths. Once we've got the Bellman-Ford algorithm up and running for input graphs that do not have negative cycles, we'll see that it's a fairly easy matter to extend it to the general case. And for input graphs that do have negative cycles, we'll be able to output such a cycle without affecting the running time of the basic algorithm.

So, with an eye toward a dynamic programming algorithm for the shortest path problem, let's start thinking about optimal substructure: the way in which optimal solutions, that is shortest paths, must necessarily be composed of optimal solutions, that is shortest paths, to smaller subproblems.
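Just to fix ideas for what follows, here is a minimal sketch of the kind of input we're assuming: a directed graph given as a list of edges with possibly negative costs, a designated source vertex, and path length defined as the sum of edge costs. The representation, the example edges, and the helper name are my own illustration, not anything specified in the lecture.

```python
# A minimal sketch (my own illustration) of the assumed input: a directed
# graph as (tail, head, cost) triples, with possibly negative costs, plus
# a designated source vertex s.
edges = [
    ("s", "v", 2),
    ("v", "t", -1),   # negative edge costs are allowed
    ("s", "t", 4),
]
source = "s"

cost = {(u, w): c for (u, w, c) in edges}

def path_length(path):
    """Length of a path = the sum of its edge costs; `path` is a vertex list."""
    return sum(cost[(path[i], path[i + 1])] for i in range(len(path) - 1))

print(path_length(["s", "v", "t"]))  # 2 + (-1) = 1
print(path_length(["s", "t"]))       # 4
```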
Now, the formal optimal substructure lemma is going to be a little cumbersome to state, so let me just spend a slide telling you what the deal is and how to think about it. It's often tricky to figure out how to apply the dynamic programming paradigm to graph problems. One of the reasons for that is that graphs aren't really naturally sequential objects; they're not ordered in any obvious way. We're just given some unordered set of vertices and some unordered set of edges. Now, one special case, of course, would be path graphs, our very first example of dynamic programming; there, there was an obvious ordering on the vertices of the graph. But unlike, say, alignments of sequences, where again there's an obvious ordering from left to right, in a graph problem it's not so clear how to order things.

For this particular graph problem, however, it seems like we've got something going for us, which is the sequentiality of our output, right? Paths are certainly sequential objects, so that gives us hope that we can state and prove an optimal substructure lemma talking about how optimal solutions, that is shortest paths, must be built up from, in some sense, smaller shortest paths. Unfortunately, it's still far from obvious how to really define smaller and larger subproblems. So, for example, you'd love to have some intelligent ordering in which to process the possible destinations v, but it's not at all clear how to do that without knowing the shortest-path distances in the first place. So, this is a subtle issue that I encourage you to think hard about in the privacy of your own home, so you can better appreciate what's non-trivial about the Bellman-Ford solution.

So, the key idea in the Bellman-Ford algorithm, and it's a good one, is to introduce an additional parameter that gives us an unambiguous definition of subproblem size. What this parameter is going to do, for a given destination v, is control how many edges we allow in the paths from the source s to this destination v that we're looking at. To explain this, let's look at the following green graph with five vertices on the right-hand part of the slide.

So, in the Bellman-Ford algorithm, we're going to have one subproblem for each possible destination and each possible restriction on the number of edges in a path.
So, for example, suppose we're looking at the source s and the destination t, and we think about paths that have two edges or fewer. Well, in that case, the shortest-path length from s to t in this graph, subject to that constraint, is four; the bottom path, which has three edges, is not an option, since we're only permitting two edges or fewer in this current subproblem. Now, if we bump up the edge budget to three, then the corresponding shortest-path distance drops from four to three, because all of a sudden we can make use of the 3-hop path on the bottom. And again, the point here is that this gives us an unambiguous notion of subproblem size: the more edges you are allowed to use in your paths from the source to a given destination, the bigger that subproblem.

Alright. So, that discussion was deliberately vague, to give you some context for the following formal statement of the optimal substructure lemma. We're going to go ahead and state and prove this optimal substructure lemma in full generality; we're going to be working with arbitrary input graphs, which might or might not have a negative cycle. Like all of these optimal substructure lemmas, the statement is going to have the form that the optimal solution to a subproblem has to be one of a small number of candidates that are composed in simple ways from optimal solutions to smaller subproblems.

So, how do we index a given subproblem? Well, there's going to be a destination v that we care about. And, as we said on the last slide, there's going to be a budget on how many edges you're allowed to use in a path from s to v, and we're going to use i to denote that budget. So, there's going to be one subproblem for every possible destination v and every possible edge budget i, a positive integer, one or bigger.

Suppose that P is an optimal solution, meaning that amongst all paths that start at s, end at v, and contain at most i edges, P has the minimum sum of edge costs, the minimum length. Now, a subtle point: because we're proving this lemma in its full generality, that is, also for input graphs that might have negative cycles, we need to go ahead and allow this path P to use cycles if it wants, including potentially using a negative cycle multiple times. That is allowed. Notice that we're not worried about this path using a cycle an infinite number of times, because we have a finite budget i on how many edges it can contain.
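To make the subproblem value concrete, here is a tiny brute-force computation of it: the minimum length of an s-to-v path with at most i edges, cycles allowed. The five-vertex graph below is my own hypothetical illustration, chosen only so that the numbers match the example just described (a two-edge s-t route of length four and a three-edge route of length three); it is not the lecture's actual green graph.

```python
from math import inf

# Hypothetical 5-vertex graph (my illustration, not the lecture's figure).
edges = [("s", "a", 2), ("a", "t", 2),                  # 2-hop s->t route, length 4
         ("s", "b", 1), ("b", "c", 1), ("c", "t", 1)]   # 3-hop s->t route, length 3

def constrained_dist(u, dest, budget):
    """Min length of a path from u to dest using at most `budget` edges,
    found by exhaustively trying every outgoing edge. Cycles are allowed;
    the finite edge budget keeps the search finite."""
    best = 0 if u == dest else inf          # option: stop here (only valid at dest)
    if budget > 0:                          # option: spend one edge and recurse
        best = min(best,
                   min((c + constrained_dist(w, dest, budget - 1)
                        for (tail, w, c) in edges if tail == u),
                       default=inf))
    return best

print(constrained_dist("s", "t", 2))  # 4: only the 2-hop route fits the budget
print(constrained_dist("s", "t", 3))  # 3: now the 3-hop route is allowed
```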
So, with that setup, what are the possible candidates for what capital P could possibly be? Well, we're going to have our usual sort of trivial case one, which is that if this path P doesn't bother to use up its full edge budget, that is, if it has i - 1 edges or fewer, well, then naturally P had better be a shortest s-v path with at most i - 1 edges.

So, the non-trivial case is when the shortest path with at most i edges from s to v actually uses its full budget, uses all i of its edges. So now, by analogy with all of our previous dynamic programming algorithms, where we think about plucking off the final part of an optimal solution, here we're going to pluck off the final edge of the path P. What do we get? Well, we get a path with one fewer edge. So, that's good; it's going to correspond to a smaller subproblem, because it has at most i - 1 edges. On the other hand, notice that if we take a path from s to v and pluck off the final edge, we get a path from s to somewhere else. So, we're going to call that somewhere else w.

So, plucking off the final edge from capital P gives us a path P' that starts at s, ends at w, and has at most i - 1 edges. And I hope you can guess that the claim is that it's not just any old path from s to w with at most i - 1 edges; it is, in fact, a shortest such path. Now, in this case two, notice that we of course know that P' has exactly i - 1 edges, not merely at most i - 1 edges. But it's going to be useful to have the stronger assertion claimed here, that P' is optimal amongst all paths that have i - 1 edges or possibly fewer.

Alright. So, this is one of those lemmas that's actually harder to state than it is to prove, so let's just quickly talk through the proof. It's the same as the previous optimal substructure lemmas that we've seen. Case one is totally trivial; it's the obvious contradiction that we've seen in many of our other case ones. Case two is going to be one of our usual cut-and-paste contradictions. So, suppose there is some path Q better than P', that is, Q starts at s, ends at w, contains i - 1 edges or fewer, and the sum of its edge costs is even smaller than that of P'.
Well then, if we just tack on the final hop of P, that is, the edge (w, v), we get a path that starts at s, ends at v, and has at most i edges. And the total sum of edge costs in this path, Q plus the edge (w, v), is strictly less than that of the original path P. But that contradicts the purported optimality of P amongst all paths starting at s, ending at v, and having at most i edges.

So, that's the proof of the optimal substructure lemma; it's simple enough. And as usual, the next step is going to be to compile this lemma into a recurrence, where, informally, what the recurrence is going to do is brute-force search amongst the possible candidates for the optimal solution.

So, to make sure you understand what just happened, let's move on to a quiz. The question is: for a given destination v of some input graph, how many candidates are there for the optimal solution of a subproblem that involves v?

The correct answer is the second one. The answer depends on which destination v you're talking about, and what governs the number of candidates is the in-degree of this vertex, that is, the number of edges of the input graph that have v as their head. Why is this true? Well, case one contributes one possible candidate: it's possible that for a given i and a given v, all you do is inherit the optimal solution for the destination v with an edge budget that is one smaller. Case two may seem like only one other candidate, but actually case two comprises a number of them, one for each choice of the final hop (w, v). Specifically, a given choice of w contributes one candidate optimal solution: the shortest path from s to that choice of w that uses at most i - 1 edges, plus the edge (w, v) tacked on. So, in total, there are 1 + in-degree(v) candidates.
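Anticipating that compilation step, here is one way the recurrence could look in code. This is a sketch under my own naming choices, not the lecture's official pseudocode: A[i][v] denotes the minimum length of an s-to-v path with at most i edges, case 1 of the lemma contributes the candidate A[i-1][v], and case 2 contributes one candidate A[i-1][w] + c_wv per incoming edge (w, v), so each subproblem is a brute-force search over 1 + in-degree(v) candidates.

```python
from math import inf

def shortest_path_table(vertices, edges, s, max_budget):
    """Fill in A[i][v] = min length of an s->v path with at most i edges.

    `edges` is a list of (tail, head, cost) triples; costs may be negative.
    This only evaluates the recurrence; detecting negative cycles and choosing
    how large the edge budget needs to be are deferred, as in the lecture.
    """
    # Base case i = 0: with zero edges we can only reach s itself.
    A = [{v: (0 if v == s else inf) for v in vertices}]

    for i in range(1, max_budget + 1):
        prev, cur = A[i - 1], {}
        for v in vertices:
            best = prev[v]                     # case 1: don't use the extra edge
            for (w, head, c) in edges:         # case 2: one candidate per edge (w, v)
                if head == v:
                    best = min(best, prev[w] + c)
            cur[v] = best
        A.append(cur)
    return A

# Usage on the hypothetical graph from the earlier sketch:
edges = [("s", "a", 2), ("a", "t", 2), ("s", "b", 1), ("b", "c", 1), ("c", "t", 1)]
A = shortest_path_table({"s", "a", "b", "c", "t"}, edges, "s", 4)
print(A[2]["t"], A[3]["t"])  # 4 3, matching the budget-2 and budget-3 examples
```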