1
00:00:00,000 --> 00:00:03,479
In this video, and then in the next. 
We're going to develop, from scratch, a 

2
00:00:03,479 --> 00:00:05,915
algorithm for the all pair shortest path 
problem. 

3
00:00:05,915 --> 00:00:09,345
Rather than by relying on the 
aforementioned reduction to the single 

4
00:00:09,345 --> 00:00:12,676
source version of the problem. 
So, in this video, we're going to develop 

5
00:00:12,676 --> 00:00:16,702
the relevant optimal substructure lemma. 
In the next video, we'll culminate in the 

6
00:00:16,702 --> 00:00:19,138
dynamic programming algorithm. 
It is a famous one. 

7
00:00:19,138 --> 00:00:23,001
The Floyd Warshall algorithm. 
Before developing the optimal 

8
00:00:23,001 --> 00:00:27,200
substructure let me just tell you a 
little bit about where we're going. 

9
00:00:27,200 --> 00:00:31,135
The optimal sub-structure we're about to 
identify will lead to a dynamic 

10
00:00:31,135 --> 00:00:35,017
programming solution via the usual 
recipe, which we've now seen over, and 

11
00:00:35,017 --> 00:00:37,767
over, and over again. 
It's going to solve the all-pairs 

12
00:00:37,767 --> 00:00:40,139
shortest paths problem in O of N cubed 
time. 

13
00:00:40,139 --> 00:00:42,619
So we get a bound independent of the 
sparsity. 

14
00:00:42,619 --> 00:00:45,854
[SOUND] N cubed, even if the graph has 
negative edge lengths. 

15
00:00:45,854 --> 00:00:48,820
This algorithm is know as the 
Floyd-Warshall algorithm. 

16
00:00:50,040 --> 00:00:54,805
Last video we discussed the performance 
that you get by running a single source, 

17
00:00:54,805 --> 00:00:58,915
shortest path sub routine N times. 
And the Floyd-Warshall algorithm is a 

18
00:00:58,915 --> 00:01:03,242
better solution in many situations. 
So first of all, for graphs that have 

19
00:01:03,242 --> 00:01:06,116
general edge links. 
That is edge links that are allowed to be 

20
00:01:06,116 --> 00:01:08,424
negative then Floyd-Warshall is looking 
pretty good. 

21
00:01:08,424 --> 00:01:12,099
Remember with the previous solution was 
to run the Bellman-Ford algorithm end 

22
00:01:12,099 --> 00:01:14,265
times. 
We can't use Dijkstra's algorithm cause 

23
00:01:14,265 --> 00:01:16,809
that's not generally correct with 
negative edge costs. 

24
00:01:16,809 --> 00:01:20,531
And if you run the Bellman-Ford algorithm 
in times you get a running time of N 

25
00:01:20,531 --> 00:01:22,321
squared M. 
So even in the best case of 

26
00:01:22,321 --> 00:01:25,665
Bellman-Ford's sparse graphs, we're doing 
just as well with Ford Warshall. 

27
00:01:25,665 --> 00:01:28,963
We've got a cubic running time. 
And we're doing quite a bit better for 

28
00:01:28,963 --> 00:01:31,554
dense graphs. 
There N times Bellman-Ford will give us N 

29
00:01:31,554 --> 00:01:34,380
to the fourth. 
A cubic running time whereas here we're 

30
00:01:34,380 --> 00:01:37,808
getting cubic. 
Now for graphs with non negative edge 

31
00:01:37,808 --> 00:01:41,213
costs, reducing to the single source 
problem is actually a pretty good 

32
00:01:41,213 --> 00:01:44,764
solution because Dijkstra is so fast. 
So if you're on Dijkstra N times, you're 

33
00:01:44,764 --> 00:01:47,196
running time is going to be N times M 
times log N. 

34
00:01:47,196 --> 00:01:50,892
So for sparse graphs you'd really want to 
use Dijkstra N times, rather than ford 

35
00:01:50,892 --> 00:01:53,081
warshaw. 
Cause you get a running time roughly 

36
00:01:53,081 --> 00:01:56,778
quadratic of N, as opposed to the cubic 
running time we're going to get here. 

37
00:01:56,778 --> 00:02:00,523
For dense graphs, it's not so clear. 
If you run dyxtra N times you're going to 

38
00:02:00,523 --> 00:02:04,317
get a running time which is ball park 
cubic, so that's going to be roughly the 

39
00:02:04,317 --> 00:02:07,236
same as Floyd-Warshall. 
And you'd expect them to be comparable. 

40
00:02:07,236 --> 00:02:09,619
It's not clear which would be better in 
practice. 

41
00:02:09,619 --> 00:02:11,898
If you. 
Had to choose between those two solutions 

42
00:02:11,898 --> 00:02:15,544
for dense graphs, you'd really want to 
code them both up and pick whichever one 

43
00:02:15,544 --> 00:02:20,977
seemed to do better in your domain. 
One place you see the Floyd-Warshall 

44
00:02:20,977 --> 00:02:25,021
algorithm used quite a bit is for 
computing the transitive closure of a 

45
00:02:25,021 --> 00:02:28,404
binary relation. 
In a graph theoretic language you can 

46
00:02:28,404 --> 00:02:31,957
think of a transitive closure problem as 
computing all pairs reachability. 

47
00:02:31,957 --> 00:02:35,462
So that's a special case of all pair 
shortest paths where you just want to 

48
00:02:35,462 --> 00:02:38,438
know whether the shortest path distance 
is finite or infinite. 

49
00:02:38,438 --> 00:02:42,183
For each pair of vertices you want to 
know does there exist a path from the one 

50
00:02:42,183 --> 00:02:45,804
vertex to the other or not. 
And if all you care about is the 

51
00:02:45,804 --> 00:02:49,357
transitive closure problem, if all you 
care about is one bit of information for 

52
00:02:49,357 --> 00:02:52,685
each pair of nodes, connected or not, as 
opposed to more generally what the 

53
00:02:52,685 --> 00:02:56,238
shortest path length is, then you can do 
some optimizations on the Floyd-Warshal 

54
00:02:56,238 --> 00:02:58,172
algorithm to speed up the constant 
factors. 

55
00:02:58,172 --> 00:03:01,860
But I'll leave it up to you to read up on 
the web about that if you're interested. 

56
00:03:02,960 --> 00:03:06,332
Now, I realize that this cubic running 
time is, [SOUND] maybe, not super 

57
00:03:06,332 --> 00:03:08,740
impressive. 
In general, as we've been talking about 

58
00:03:08,740 --> 00:03:12,450
dynamic programming algorithms, we've 
been starting to see, for the first time 

59
00:03:12,450 --> 00:03:14,714
in these courses, a proliferation of 
algorithms. 

60
00:03:14,714 --> 00:03:18,375
Which, you know, while polynomial time, 
while way better than exponential time, 

61
00:03:18,375 --> 00:03:22,133
you wouldn't really call blazingly fast. 
We've seen a lot of quadratic running 

62
00:03:22,133 --> 00:03:24,156
times. 
Now we're seeing some cubic running 

63
00:03:24,156 --> 00:03:26,180
times. 
So, that's kind of a bummer, but, I am 

64
00:03:26,180 --> 00:03:29,986
still teaching you the state-of-the-art. 
As we mentioned this in the last video, 

65
00:03:29,986 --> 00:03:33,454
it remains an open question to this day, 
whether there are significantly. 

66
00:03:33,454 --> 00:03:38,620
[SOUND] Of cubic algorithms that solve 
the all pairs shortest paths problem. 

67
00:03:38,620 --> 00:03:42,488
So now, let's formalize the optimal 
substructure in all pair shortest path 

68
00:03:42,488 --> 00:03:45,207
which is exploited by the Floyd-Warshall 
algorithm. 

69
00:03:45,207 --> 00:03:49,076
A quick digression first though. 
At the beginning of the class I hope that 

70
00:03:49,076 --> 00:03:52,893
you know, if I had shown you this 
Floyd-Warshall algorithm, you would have 

71
00:03:52,893 --> 00:03:55,141
said wow, that's a really elegant 
algorithm. 

72
00:03:55,141 --> 00:03:59,376
How could IU have ever come up with this? 
Whereas now that you're approaching the 

73
00:03:59,376 --> 00:04:03,454
black belt level in dynamic programming 
and you see just how mechanically the 

74
00:04:03,454 --> 00:04:07,114
algorithm, this famous algorithm is 
going to fall out of the really quite 

75
00:04:07,114 --> 00:04:10,617
easy optimal substructure. 
I hope you say, how could I not have come 

76
00:04:10,617 --> 00:04:14,834
up with this algorithm? 
Now I don't mean to take any credit away 

77
00:04:14,834 --> 00:04:19,264
from the solutions of Floyd and Warshall. 
There is one really great idea in how 

78
00:04:19,264 --> 00:04:23,532
they phrase this optimal substructure. 
Remember we talked about this before the 

79
00:04:23,532 --> 00:04:27,746
Bellman-Ford algorithm, it can be tricky 
to apply dynamic programming to graph 

80
00:04:27,746 --> 00:04:31,041
problems because there isn't really an 
ordering in the input. 

81
00:04:31,041 --> 00:04:34,066
You just get this bag of vertices and 
this bag of edges. 

82
00:04:34,066 --> 00:04:37,740
And it's often unclear how to define 
bigger and smaller subproblems. 

83
00:04:39,040 --> 00:04:43,095
So the really nice solution in the 
Bellman-Ford algorithm was to introduce 

84
00:04:43,095 --> 00:04:46,556
this extra parameter I. 
I was a budget on the number of edges or 

85
00:04:46,556 --> 00:04:50,719
roughly equivalently the number of 
vertices that you are allowed to use in a 

86
00:04:50,719 --> 00:04:53,132
path. 
Between the origin and destination for a 

87
00:04:53,132 --> 00:04:56,005
given subproblem. 
This naturally induces an ordering on 

88
00:04:56,005 --> 00:04:59,401
subproblems, the bigger the edge budget, 
the bigger the subproblem. 

89
00:04:59,401 --> 00:05:03,424
We're going to do something similar but 
even a little bit more stringent in the 

90
00:05:03,424 --> 00:05:06,820
Floyd-Warshall solution. 
We're again going to restrict the number 

91
00:05:06,820 --> 00:05:11,051
of vertices that can appear in the middle 
of the path between a given origin and 

92
00:05:11,051 --> 00:05:14,500
destination and a subproblem. 
But more than just the number of the 

93
00:05:14,500 --> 00:05:18,366
vertices we're allowed to use, we're 
going to restrict exactly which vertices, 

94
00:05:18,366 --> 00:05:22,545
the identities of the vertices that the 
algorithms permitted to use to get from 

95
00:05:22,545 --> 00:05:24,740
the given origin to the given 
destination. 

96
00:05:26,160 --> 00:05:30,317
in many ways, the restriction we're going 
to impose is totally analogist to, say, 

97
00:05:30,317 --> 00:05:34,474
the knapsack problem where there wasn't 
an intrinsic ordering on the items but we 

98
00:05:34,474 --> 00:05:37,870
imposed one anyways and then we just look 
at prefixes of the items. 

99
00:05:37,870 --> 00:05:41,064
And bigger sub-problems are ones with 
bigger prefixes of items. 

100
00:05:41,064 --> 00:05:44,359
Here we're going to impose an arbitrary 
ordering on the vertices. 

101
00:05:44,359 --> 00:05:48,060
And in a given sub-problem, you're only 
allowed to use some prefix of the 

102
00:05:48,060 --> 00:05:51,659
vertices as internal nodes in a path. 
And then, of course, the bigger the 

103
00:05:51,659 --> 00:05:54,600
prefix you're permitted to use, the 
bigger the sub-problem. 

104
00:05:55,900 --> 00:05:59,762
So to be a little bit more formal about 
it, let's just impose some arbitrary 

105
00:05:59,762 --> 00:06:03,777
ordering on the vertices capital V, name 
the vertices one, two, three, all the way 

106
00:06:03,777 --> 00:06:05,048
up to N. 
I don't care how. 

107
00:06:05,048 --> 00:06:09,063
And I'm going to use the notation capital 
V superscript K to denote the prefix of 

108
00:06:09,063 --> 00:06:11,300
the first K vertices, vertices one 
through K. 

109
00:06:12,420 --> 00:06:17,180
So let me start now setting up the 
optimal substructure limit. 

110
00:06:17,180 --> 00:06:20,465
In contrast to the Bellman-Ford case, I'm 
only going to prove the optimal 

111
00:06:20,465 --> 00:06:23,657
substructure limit of four input graphs 
that have no negative cycle. 

112
00:06:23,657 --> 00:06:27,318
We'll address the case of negative cycles 
after we're done with the algorithm. 

113
00:06:27,318 --> 00:06:31,965
Suppose for now there isn't one. 
So what then are the subproblems going to 

114
00:06:31,965 --> 00:06:34,002
be? 
Well it's going to be quite analogous to 

115
00:06:34,002 --> 00:06:36,503
Bellman-Ford. 
In Bellman-Ford we were solving a single 

116
00:06:36,503 --> 00:06:39,976
source shortest path problem so we needed 
to compute something for each 

117
00:06:39,976 --> 00:06:42,153
destination. 
So that gave us a linear number of 

118
00:06:42,153 --> 00:06:45,348
sub-problems and then for a given 
destination we had this parameter I 

119
00:06:45,348 --> 00:06:48,497
controlling the edge budget. 
So that was another linear factor, so we 

120
00:06:48,497 --> 00:06:52,155
had a quadratic number of subproblems. 
Here it's going to be the same except we 

121
00:06:52,155 --> 00:06:55,814
also have to range over our own origin. 
So we're going to get a cubic number of 

122
00:06:55,814 --> 00:06:59,333
subproblems, specifically a choice for 
the origin, N choices, a choice for the 

123
00:06:59,333 --> 00:07:03,774
destination, that's N more choices plus. 
Which prefix of vertices we're allowed to 

124
00:07:03,774 --> 00:07:06,158
use. 
So that's still another N choices. 

125
00:07:06,158 --> 00:07:10,409
So cubic total. 
So now we focus on an optimal solution to 

126
00:07:10,409 --> 00:07:14,384
this sub problem. 
By which I mean amongst all paths which 

127
00:07:14,384 --> 00:07:19,754
begin at the vertex I, end at the vertex 
J, and strictly in between I and J, as 

128
00:07:19,754 --> 00:07:23,659
internal nodes contain only vertices from 
one through K. 

129
00:07:23,659 --> 00:07:28,192
Amongst all those paths look at the one 
with the shortest length. 

130
00:07:28,192 --> 00:07:33,702
And because we're looking at the case of 
no negative cycles, we can assume that 

131
00:07:33,702 --> 00:07:37,900
this path has no cycles. 
It's a cycle free path. 

132
00:07:37,900 --> 00:07:41,713
So rather than state the conclusion of 
the optimal substructure lemma right now, 

133
00:07:41,713 --> 00:07:45,878
let me just make sure you understand what 
one of these sub-problems is. 

134
00:07:45,878 --> 00:07:50,160
So, lets suppose that the origin is the 
vertex that we've, aren't labelled 

135
00:07:50,160 --> 00:07:54,677
arbitrarily seventeen, the destination J 
is the vertex we've labelled ten, and 

136
00:07:54,677 --> 00:07:57,980
let's suppose K at the moment is only 
five. 

137
00:07:57,980 --> 00:08:02,186
Consider the following graph. 
Imagine this is a little piece of some 

138
00:08:02,186 --> 00:08:05,868
much bigger graph. 
Now I hope it's clear what the shortest 

139
00:08:05,868 --> 00:08:09,863
path is between seventeen and ten. 
It's the bottom two hop path, that path 

140
00:08:09,863 --> 00:08:13,584
has total length minus twenty. 
On the other hand, for the sub problem 

141
00:08:13,584 --> 00:08:17,591
we're dealing with, K equals five. 
So what that means is we have this extra 

142
00:08:17,591 --> 00:08:21,565
constraint on which vertices can we can 
use in the middle of our paths. 

143
00:08:21,565 --> 00:08:25,699
Now to be clear, this constraint K at 
most five that can't apply to origin of 

144
00:08:25,699 --> 00:08:26,881
the destination. 
Right? 

145
00:08:26,881 --> 00:08:31,015
I is 17, J is 10, both of those are 
bigger than five, but we don't care. 

146
00:08:31,015 --> 00:08:34,882
Obviously, any path from I to J, has to 
include both the vertices I and J. 

147
00:08:34,882 --> 00:08:38,587
So the constraint only applies to 
vertices in the middle of the path. 

148
00:08:38,587 --> 00:08:42,077
But, this bottom two hot path, 
unfortunately makes use of the node 

149
00:08:42,077 --> 00:08:44,064
seven. 
So that path does not obey the 

150
00:08:44,064 --> 00:08:47,340
constraint, it is not at most five, so 
that path is disallowed. 

151
00:08:47,340 --> 00:08:50,327
Can not use it. 
Therefore, the shortest path from 

152
00:08:50,327 --> 00:08:55,446
seventeen to ten, that makes use of only 
the first five The five labeled vertices 

153
00:08:55,446 --> 00:09:00,275
as intermediate ones would be the top 
three hot path, which has the bigger 

154
00:09:00,275 --> 00:09:03,987
length of three. 
Oay, so I hope it's clear what a c- 

155
00:09:03,987 --> 00:09:07,493
sub-problem corresponds to. 
It corresponds to a choice of I, the 

156
00:09:07,493 --> 00:09:09,998
origin. 
That's again just some vertex labeled 

157
00:09:09,998 --> 00:09:13,225
something from one to N. 
Similarly there's a choice of the 

158
00:09:13,225 --> 00:09:17,288
destination, some other vertex J which is 
labeled something from one to N. 

159
00:09:17,288 --> 00:09:21,595
And then there's a choice of the bound K. 
Which governs which vertices you're 

160
00:09:21,595 --> 00:09:26,096
permitted to use internal to your path. 
So again the constraint doesn't apply to 

161
00:09:26,096 --> 00:09:29,416
the end points, just the vertices in 
between the end points. 

162
00:09:29,416 --> 00:09:33,524
And in the subproblem the choice of K 
says you can only use vertices one 

163
00:09:33,524 --> 00:09:37,856
through K, internal to your paths. 
So hopefully that makes sense, let's move 

164
00:09:37,856 --> 00:09:41,120
on to the full statement of the optimal 
substructure lema. 

165
00:09:42,600 --> 00:09:46,859
All right, so this is actually going to 
be one of the simpler optimal 

166
00:09:46,859 --> 00:09:50,280
substructural lemma that we've seen in a 
while. 

167
00:09:50,280 --> 00:09:53,313
We're back to the glory days of only two 
cases. 

168
00:09:53,313 --> 00:09:58,348
The first is trivial, if the shortest 
path from I to J using only vertices one 

169
00:09:58,348 --> 00:10:03,575
through K as intermediaries doesn't even 
bother to use the vertex K, well then it 

170
00:10:03,575 --> 00:10:08,868
better well be a shortest path from I to 
J that uses internal nodes only from one 

171
00:10:08,868 --> 00:10:12,192
to K-1. 
Now it's in case two it's going to be 

172
00:10:12,192 --> 00:10:16,594
apparent what a great idea ordering the 
vertices looking at prefixes. 

173
00:10:16,594 --> 00:10:21,570
The vertices is when you're considering 
the all pairs version of the shortest 

174
00:10:21,570 --> 00:10:25,765
path problem. 
Suppose this path P from I to J does 

175
00:10:25,765 --> 00:10:31,301
indeed use the vertex K in the middle. 
Well than we can think of the path P as 

176
00:10:31,301 --> 00:10:35,914
comprising two sub paths. 
The first path P1 which starts at I and 

177
00:10:35,914 --> 00:10:40,314
goes to the vertex K. 
And then a path P2 which starts at K and 

178
00:10:40,314 --> 00:10:44,910
goes to the destination J. 
Now here's what's cool. 

179
00:10:44,910 --> 00:10:50,438
So on this path P the internal nodes 
between I and J all lie in one through K. 

180
00:10:50,438 --> 00:10:55,682
Moreover, this path P remembers cycle 
free so the vertex K appears exactly 

181
00:10:55,682 --> 00:11:00,289
once, not more than once. 
So therefore if we split the path P into 

182
00:11:00,289 --> 00:11:05,180
these two parts, P1 and P2, internal to 
P1, strictly in between I and K. 

183
00:11:05,180 --> 00:11:07,756
There's only vertices between one and 
K-1. 

184
00:11:07,756 --> 00:11:12,420
Similarly strictly between K and J and 
P2, there's only vertices between one 

185
00:11:12,420 --> 00:11:15,549
through K-1. 
Thus both of these paths, P1 and P2 can 

186
00:11:15,549 --> 00:11:19,169
be thought of as feasible solutions to 
smaller subproblems. 

187
00:11:19,169 --> 00:11:23,709
Subproblems with an even tighter budget 
of K minus 1 on the internal nodes that 

188
00:11:23,709 --> 00:11:27,697
are allowed and these aren't just 
feasible solutions for smaller 

189
00:11:27,697 --> 00:11:31,520
subproblems, they're optimal solutions as 
well. 

190
00:11:31,520 --> 00:11:35,535
So, how slick is that? 
That the maximum indexed, internal node 

191
00:11:35,535 --> 00:11:40,259
just splits this shortest path into two 
shortest paths for smaller sub-problems? 

192
00:11:40,259 --> 00:11:43,684
And, you know what? 
At this point, I am so confident in your 

193
00:11:43,684 --> 00:11:48,113
facility with dynamic programming, I am 
not going to even bother to prove this 

194
00:11:48,113 --> 00:11:51,066
statement. 
You are totally capable of proving this 

195
00:11:51,066 --> 00:11:53,900
yourself. 
The argument, it's really quite similar 

196
00:11:53,900 --> 00:11:57,207
to Bellman-Ford. 
I encourage you to do it as an exercise. 

197
00:11:57,207 --> 00:11:57,680
[SOUND]