So we've talked a bit about various running-time optimizations of the basic Bellman-Ford algorithm. Now let's talk about optimizing the space required by the algorithm. To orient ourselves, let's just think about how much space is needed by the basic algorithm we've been studying thus far.

All right, so the correct answer here is the first one, A. The space required by the basic Bellman-Ford algorithm is quadratic in the number of vertices: it is Theta(n^2). Why? Well, the space is dominated by the 2-D array that we fill in. There is only a constant amount of space per subproblem, and there are n^2 subproblems. One index is the budget i on how many edges we're allowed to use, which ranges from 0 to n - 1, and the other index is just the destination v, and there are n of those.

So in this video, we'll talk about how you can do better, how you can get away with linear space, not quadratic space. The initial observation is a simple one. Let's just return to the formula that we used to populate all of the entries of the array. We always take the best of a whole bunch of candidates. What are the candidates? Either we inherit the solution from the last iteration, A[i-1, v], or we choose some edge (w, v) to be the last hop, and then we just paste together the best solution to w from the last iteration: we pay A[i-1, w] plus the length of the last hop, the edge (w, v). That is,

A[i, v] = min( A[i-1, v], min over edges (w, v) of (A[i-1, w] + len(w, v)) )

But here's the point. Staring at this formula, which subproblem values do we care about? Well, not that many of them. In particular, the trait shared by all of the interesting candidates is that they're all from the last iteration: the first index of everything on the right-hand side is always i - 1. As a consequence, as soon as we've finished one batch of subproblems for some value of i, that is, we've computed A[i, v] for all v, we can throw out the results of all of the previous rounds of subproblems, A[i-1, *], A[i-2, *], and so on. All we need is the freshest batch, the most recent iteration, to correctly proceed. If we do that, the only subproblems we're keeping track of are the ones that we're filling in in the current round i, plus the solutions we remember from the previous round i - 1. That means at any given moment in time we're only keeping track of O(n) different subproblems.
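To make the rolling-array idea concrete, here is a minimal Python sketch of the linear-space version. The edge-list representation and the names are illustrative assumptions of mine rather than anything fixed by the lecture, and it assumes the input graph has no negative-cost cycle:

```python
from math import inf

def bellman_ford_distances(n, edges, s):
    """Shortest-path distances from s, keeping only two rows of the DP table.

    n     -- number of vertices, labeled 0..n-1
    edges -- list of (w, v, length) tuples, one per directed edge w -> v
    s     -- source vertex
    Assumes the graph has no negative-cost cycle.
    """
    prev = [inf] * n              # A[i-1, .]: best distances with at most i-1 edges
    prev[s] = 0
    for i in range(1, n):         # edge budgets i = 1 .. n-1
        cur = prev[:]             # case 1: inherit A[i-1, v]
        for w, v, length in edges:
            if prev[w] + length < cur[v]:   # case 2: edge (w, v) as the last hop
                cur[v] = prev[w] + length
        prev = cur                # all older rounds can now be thrown away
    return prev
```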
And this linear space bound is even more impressive than it might appear at first, because if you think about the responsibility of our algorithm in this problem, we have to output a linear number of things, right? We're not just outputting one number. We're outputting n numbers, the shortest-path distance from s to every possible destination. So the space is really constant per statistic that we have to compute.

One thing I'm not going to discuss explicitly, but I think it would be a good exercise for you: go back to all of the other dynamic programming algorithms we've discussed, and of course by now there are many, and see for which ones you can do a comparable optimization. Which subproblems do you need to keep around to compute the new ones? And if you throw out all the ones that you're not going to need anymore, what does that reduce the space to? For many of the problems we've studied, you're going to see a big improvement in the space required.

But this would be a good time to pause for a few seconds and think about possible negative consequences of this space optimization. So suppose we do this, either in the context of shortest paths going forward, or in some other dynamic programming algorithm: we throw out the values of old subproblem solutions that we seemingly don't need anymore to push the recurrence further. Are there any drawbacks to this space optimization?

So the answer depends. If all you care about is the value of an optimal solution, and not the optimal solution itself, then you may as well throw out all of the old subproblems that you don't need anymore to push the recurrence further. It's not going to hurt you at all. If, however, you want the optimal solution itself, and not just its value, well, how did we actually accomplish that in the past? We had these reconstruction algorithms. How do the reconstruction algorithms work? You took the entire filled-in table and you traced backward through it. At each entry of the filled-in table, you looked at which of the candidates won, which comparison came out the winner when you filled in this entry. That told you a piece of what the optimal solution has to look like, and then you go backward through the table and repeat the process. If you've thrown out most of your filled-in table, how are you going to run this reconstruction algorithm?
In the context of the Bellman-Ford algorithm, how are you going to reconstruct shortest paths if you haven't remembered the subproblem solutions from all of the previous iterations? And you can certainly imagine, in a routing application for example, that you don't just want to know that the shortest path has length seventeen. You want to know: what route should we take to get from point A to point B?

So in the rest of this video, I'm going to describe a solution for the Bellman-Ford algorithm. I'm going to show you how we can retain the constant-space-per-vertex guarantee while recovering the ability to reconstruct shortest paths. The idea is, for a given value of i and a given destination v, we're going to keep track of not just the one piece of information, the length of the shortest path from s to v that uses at most i edges, but also a second piece of information, which is the second-to-last vertex on such a path. So it's still going to be just constant space per subproblem. We're going to call this second two-dimensional array B, and we're going to call its entries predecessor pointers. Again, the semantics are that B[i, v] points to the predecessor of the destination v on a shortest path from s to v that has at most i edges. Of course, if i is sufficiently small, there may be no such path from s to v, and in that case we just have a null predecessor pointer.

So for a moment, let's just forget about the space optimization that we're trying to attain, and let's just observe that if we correctly compute these capital-B entries, then simply traversing predecessor pointers would reconstruct shortest paths. So why is this true? Well, two reasons, reasons that you've seen explain other things as well. The first reason is, remember that we're assuming that the input graph has no negative-cost cycle. Therefore shortest paths have at most n - 1 edges. Therefore the entries B[n-1, v] actually store the final hop of a shortest path from s to v, period, with no edge budget at all. So the B[n-1, v] entries are the final batch of predecessor pointers; if we correctly compute them, they are telling us the last hop on a shortest path to v. Now, the other part of the correctness comes from the optimal substructure lemma.
So, remember way back when we started talking about shortest paths and their optimal substructure, we said, well, if only we had a little birdie which told us what the last hop of a shortest path was, then the shortest path would just be that last hop concatenated with a shortest path from s up to the penultimate vertex w. And these predecessor pointers are exactly this little birdie. We're storing at v what the last hop (w, v) is, and so we know that the shortest path is just that last hop plus a shortest path back from w to s. And by traversing the pointers, that's exactly what you do: you just reconstruct the rest of the path, the part from s to w.

So, if we correctly compute predecessor pointers, then they enable the simple reconstruction of shortest paths, just by traversal after the fact. That leaves us with the task of correctly computing these predecessor pointers. It's not hard. I think many of you could just fill in the details yourself, but let me give a quick rundown. So remember, in general we have this 2-D array capital B. It's indexed by the edge budget i and by the destination v, and we're supposed to fill in B[i, v] with the final hop on a shortest path from s to v that uses at most i edges; if no such path exists, then it's just null. The base case is when i equals zero, and there everybody's predecessor pointer is null.

So the way we're going to fill in the entry B[i, v] is going to depend on which of the candidates won the competition to be the shortest path for A[i, v]. In essence, this predecessor pointer B[i, v] is just caching the result of the competition we ran to find the shortest path for A[i, v]. To make this precise, let's just recall the formula that we used to compute A[i, v], given the solutions to the last batch of subproblems. Case one is when you inherit the solution from the last round, A[i-1, v]. In addition, each possible choice of a last hop (w, v) furnishes another candidate for the optimal solution of this round, in which you're going to pay the optimal solution to the subproblem, A[i-1, w], plus the length of the edge (w, v). So we're going to fill in the B[i, v] array basically just to reflect what happened when we computed the solution to A[i, v]. In essence, what we're doing in the 2-D array B is remembering the most recent vertex w that's responsible for furnishing a new shortest path from s to v.
So in case one, the boring case, where at iteration i we just inherit the solution from the last round, of course in the B entry we also just inherit the last hop of the previous round. That is, if we use case one to fill in A[i, v], then we just set B[i, v] to be B[i-1, v]. The interesting case is when A[i, v] is filled in using case two of the formula, that is, when the shortest path from s to v suddenly improves given a budget of i hops, as opposed to just i - 1 hops. In that case, we're just going to cache the result of the brute-force search that we did to evaluate the formula. That is, we just remember the choice of the penultimate vertex w that achieved the minimum in this round.

Now notice that, just like the formula that we used to populate the capital-A array, the original array, to compute these B[i, v] entries all we need to know is information from the last round, the round i - 1. So just like with the A array, we can throw out all of the predecessor pointers that date back before the previous round. Therefore, we again need only constant space per destination v to maintain both the shortest-path distance from s to v with at most i hops, and the predecessor pointer.

So this is great. For input graphs that don't have negative-cost cycles, we can not merely compute the shortest-path distances in constant space per destination; we can also compute the predecessor pointers in that same space, which allows the reconstruction of shortest paths. A sketch of the combined computation follows.
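Here is a minimal Python sketch of that combination, under the same illustrative assumptions as the earlier snippet (vertices labeled 0 to n-1, an edge list of (w, v, length) tuples, no negative-cost cycle); the function and variable names are mine, not the lecture's:

```python
from math import inf

def bellman_ford_with_predecessors(n, edges, s):
    """Distances and predecessor pointers from s, in O(n) space.

    Assumes the graph has no negative-cost cycle.
    """
    dist = [inf] * n
    pred = [None] * n             # B[i-1, v]: last hop on a best path to v
    dist[s] = 0
    for i in range(1, n):         # edge budgets i = 1 .. n-1
        new_dist = dist[:]        # case 1: inherit A[i-1, v] and B[i-1, v]
        new_pred = pred[:]
        for w, v, length in edges:
            if dist[w] + length < new_dist[v]:
                new_dist[v] = dist[w] + length
                new_pred[v] = w   # case 2: cache the winning last hop w
        dist, pred = new_dist, new_pred
    return dist, pred

def reconstruct_path(pred, s, v):
    """Traverse the final round's predecessor pointers backward from v to s."""
    path = [v]
    while v != s:
        if pred[v] is None:       # v is not reachable from s
            return None
        v = pred[v]
        path.append(v)
    return path[::-1]
```

Once the final round's pointers are in hand, reconstruct_path performs exactly the backward traversal described above, recovering an actual shortest path rather than just its length.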
So one other thing you might want is to handle graphs that do have negative-cost cycles. Now, in a separate video we showed that just checking whether an input graph has a negative-cost cycle or not is easily addressed by a simple extension of the Bellman-Ford algorithm. You tack on one extra iteration of the outer for loop: you let i range not just up to n - 1 but all the way up to n. If you see, for some destination v, an improvement in the extra iteration when i = n, then that guarantees that there is a negative-cost cycle; in fact, this happens if and only if the graph has one. But in the same way that you might want to actually have the shortest paths and not merely their values, you might want to actually have a negative-cost cycle in hand and not merely know that one exists.

It turns out you can solve the reconstruction problem for negative-cost cycles as well, using the Bellman-Ford algorithm and predecessor pointers maintained in exactly the way we've been maintaining them here. I'm not going to talk through all of the details of the solution; I'll leave it to you as a somewhat nontrivial exercise to think through how this really works. But the gist is as follows. You run the Bellman-Ford algorithm and you maintain the predecessor pointers as on this slide. If the input graph does have a negative-cost cycle, then at some iteration you will see a cycle amongst the predecessor pointers. Furthermore, that cycle of predecessor pointers must be a negative-cost cycle of the original input graph. This means that detecting a negative-cost cycle, if one exists, reduces to checking for a cycle in the predecessor pointers, which of course you can solve using depth-first search at every iteration.
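Since the details are left as an exercise, the following is only one possible shape for that check, a hypothetical sketch rather than the lecture's own solution. Because every vertex carries at most one predecessor pointer, a marked walk along the pointers accomplishes what the depth-first search would, and you would run a check like this after each iteration of the outer loop:

```python
def find_predecessor_cycle(pred):
    """Return a cycle among the predecessor pointers, or None if there is none.

    pred[v] is v's current predecessor, or None. Each vertex has at most
    one outgoing pointer, so a walk from any vertex either dies out or
    enters a cycle; three-state marking keeps the whole scan linear time.
    """
    state = [0] * len(pred)                 # 0 unvisited, 1 on current walk, 2 done
    for start in range(len(pred)):
        if state[start] != 0:
            continue
        walk, v = [], start
        while v is not None and state[v] == 0:
            state[v] = 1
            walk.append(v)
            v = pred[v]
        if v is not None and state[v] == 1:
            return walk[walk.index(v):]     # walk re-entered itself: a cycle
        for u in walk:                      # dead end, or merged into a safe walk
            state[u] = 2
    return None
```

If the walk ever re-enters itself, the vertices on that loop form a cycle of predecessor pointers, which by the argument sketched above must be a negative-cost cycle of the original input graph.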