1
00:00:00,012 --> 00:00:03,764
Having laid the groundwork in the 
previous video, we're now ready to follow 

2
00:00:03,764 --> 00:00:07,761
our usual dynamic programming recipe to 
develop a faster than brute-force search 

3
00:00:07,761 --> 00:00:11,277
dynamic programming algorithm for the 
traveling Salesman problem. 

4
00:00:11,277 --> 00:00:15,257
Editing through all those quizzes in the 
last video, we developed some healthy 

5
00:00:15,257 --> 00:00:17,522
respect for the traveling salesman 
problem. 

6
00:00:17,522 --> 00:00:21,422
We understood why the more natural 
approaches, which were sufficient for the 

7
00:00:21,422 --> 00:00:24,202
path problem. 
For example, the Bellman Ford Algorithm 

8
00:00:24,202 --> 00:00:27,817
are not good enough to solve TSP. 
Now no surprise, TSP is an NP complete 

9
00:00:27,817 --> 00:00:31,407
problem, so you're sort of ready for 
this, but we know understand it in a 

10
00:00:31,407 --> 00:00:32,779
pretty. 
Concrete way. 

11
00:00:32,779 --> 00:00:36,392
In the single source, shortest path 
problem, in a sub-problem, all you need 

12
00:00:36,392 --> 00:00:39,470
to know is where's the path ending and 
how long could the path be. 

13
00:00:39,470 --> 00:00:42,958
That's not enough for the traveling 
salesman problem because there's this 

14
00:00:42,958 --> 00:00:46,279
extra tricky constraint. 
But at the end of the day, we want to 

15
00:00:46,279 --> 00:00:50,400
visit every vertex exactly once. 
So, in particular, to make sure we don't 

16
00:00:50,400 --> 00:00:53,700
have repeat vertices show up in our 
sub-problem solutions. 

17
00:00:53,700 --> 00:00:57,599
We're going to need to remember not just 
where smaller sub-problems end. 

18
00:00:57,599 --> 00:01:01,609
But also the identities of all of the 
intermediate vertices in the path. 

19
00:01:01,609 --> 00:01:06,679
Here, then, is the final definition of 
the sub-problems that we're going to use. 

20
00:01:06,679 --> 00:01:11,646
We still want to keep track of where this 
path ends so we still have our 

21
00:01:11,646 --> 00:01:17,244
destination j, and rather than just 
specifying a number of vertices that you 

22
00:01:17,244 --> 00:01:22,161
visit on this path, we specify exactly 
which vertices get visited. 

23
00:01:22,161 --> 00:01:26,301
So that's going to be a subset S. 
A subset of 1 through n. 

24
00:01:26,301 --> 00:01:32,366
And, of course, it better contain the 
initial vertex 1, and the final vertex j. 

25
00:01:32,366 --> 00:01:37,485
And so, for a given choice of j, and a 
given set choice of capital s. 

26
00:01:37,485 --> 00:01:41,583
We define l sub s, j, as the length of a 
shortest path. 

27
00:01:41,583 --> 00:01:47,007
It starts at vertex 1, end at vertex j. 
Visits precisely the vertices in capital 

28
00:01:47,007 --> 00:01:50,595
s, exactly once each. 
Now, I can understand if your eyes widen 

29
00:01:50,595 --> 00:01:53,025
a little bit when you see this 
definition. 

30
00:01:53,025 --> 00:01:57,016
After all, there's an awful lot of 
choices for this said capital s, an 

31
00:01:57,016 --> 00:02:00,504
exponential number. 
So you'd be right to wonder whether this 

32
00:02:00,504 --> 00:02:03,732
approach could possibly give any saving 
whatsoever. 

33
00:02:03,732 --> 00:02:08,377
Overt naive brute-force search. 
One thing worth pointing out, and in some 

34
00:02:08,377 --> 00:02:13,232
sense the source of the computational 
savings, is that in a given sub problem 

35
00:02:13,232 --> 00:02:18,202
while it's true we're keeping track of 
which vertices are visited en route from 

36
00:02:18,202 --> 00:02:22,491
1 to j, we're not keeping track Of the 
order in which those vertices get 

37
00:02:22,491 --> 00:02:26,671
visited, and indeed, if we had 1 
sub-problem for every order in which you 

38
00:02:26,671 --> 00:02:31,078
might visit vertices from a given source 
to a given destination, then we'd be 

39
00:02:31,078 --> 00:02:33,798
stuck with a factorial number of 
sub-problems. 

40
00:02:33,798 --> 00:02:38,330
As it is, since we forget the order, and 
we only focus on sub-sets, that gets us 

41
00:02:38,330 --> 00:02:41,502
down in more than 2^n range, and that's 
ultimately. 

42
00:02:41,502 --> 00:02:44,348
Where the savings over brute-force search 
come from. 

43
00:02:44,348 --> 00:02:48,322
So as usual once we've gotten the right 
set of sub-problems the rest of the 

44
00:02:48,322 --> 00:02:51,295
dynamic programming recipe pretty much 
writes itself. 

45
00:02:51,295 --> 00:02:55,092
But to fill in the details let me just 
tell you exactly what the optimal 

46
00:02:55,092 --> 00:02:59,084
substructure limit is, what's the 
corresponding recurrence and the final 

47
00:02:59,084 --> 00:03:02,402
pseudocode that gives us our better than 
brute-force search. 

48
00:03:02,402 --> 00:03:04,977
Algorithm for the traveling salesman 
problem. 

49
00:03:04,977 --> 00:03:09,517
So, for the optimal substructure lemma, 
we just focus on 1 particular subproblem. 

50
00:03:09,517 --> 00:03:12,432
So that corresponds to a choice of the 
destination J. 

51
00:03:12,432 --> 00:03:15,832
And a choice of the set s. 
It specifies 1, the starting point. 

52
00:03:15,832 --> 00:03:18,832
J, the ending point. 
Plus the rest of the intermediate 

53
00:03:18,832 --> 00:03:22,602
vertices that we've got to visit along 
the way, exactly once each. 

54
00:03:22,602 --> 00:03:28,061
So now we proceed exactly as we did 
before in [UNKNOWN], namely by focusing 

55
00:03:28,061 --> 00:03:32,964
on the last hop of this path P. 
So if the last hop is say from vertex K 

56
00:03:32,964 --> 00:03:38,113
to vertex J, then we can look at the 
previous part of the path, call it P 

57
00:03:38,113 --> 00:03:43,372
prime, what can we say about it? Well 
clearly it begins at the vertex 1. 

58
00:03:43,372 --> 00:03:46,494
[INAUDIBLE]. 
Clearly, it ends at the vertex k. 

59
00:03:46,494 --> 00:03:50,738
Because of the path p, it visited every 
vertex in s exactly once. 

60
00:03:50,738 --> 00:03:55,105
The path, p prime visits every vertex in 
the set s - j exactly once. 

61
00:03:55,105 --> 00:03:58,665
And, of course, the optimal substructure 
lemma says. 

62
00:03:58,665 --> 00:04:01,197
Not only is prime [INAUDIBLE]. 
Path. 

63
00:04:01,197 --> 00:04:05,332
From 1, to k, visiting exactly the 
vertices of s minus j, once each. 

64
00:04:05,332 --> 00:04:09,802
It is the shortest, such, path. 
The proof is an utterly standard, cut and 

65
00:04:09,802 --> 00:04:14,482
paste argument, like we've been doing for 
all our other optimal substructure 

66
00:04:14,482 --> 00:04:17,303
lemmas. 
I'll leave it as an exercise for you to 

67
00:04:17,303 --> 00:04:21,362
fill in the details. 
Tells, as usual the optimal substructure 

68
00:04:21,362 --> 00:04:26,241
lemma naturally induces a recurrence. 
recurrence just says well let's look at 

69
00:04:26,241 --> 00:04:31,051
all of the candidates when [UNKNOWN] 
solution identified by the lemma and the 

70
00:04:31,051 --> 00:04:36,652
boot force search among other candidates. 
That is, for a given sub-problem indexed 

71
00:04:36,652 --> 00:04:42,222
by a destination j and a set of vertices 
s, what the recurrence does is it takes 

72
00:04:42,222 --> 00:04:47,792
the best case scenario, that is the best 
choice of the second to last vertex k. 

73
00:04:47,792 --> 00:04:53,387
Of course, k should lie somewhere in the 
set S, also it should not be the vertex 

74
00:04:53,387 --> 00:04:56,904
j. 
And for a given choice of k, we know the 

75
00:04:56,904 --> 00:05:02,977
corresponding candidate solution has to 
be a shortest path from 1 to k. 

76
00:05:02,977 --> 00:05:08,734
Visiting exactly the vertices of s-j. 
Combined with the cost of the 

77
00:05:08,734 --> 00:05:13,903
corresponding final hop from k to j. 
And effectively, we're using the same 

78
00:05:13,903 --> 00:05:18,189
measure of sub-problem size that we were 
in the Bellman - Ford solution, just the 

79
00:05:18,189 --> 00:05:20,543
number of allowable intermediate 
vertices. 

80
00:05:20,543 --> 00:05:24,696
Of course, here, the sub-problem also 
specifies exactly which those vertices 

81
00:05:24,696 --> 00:05:28,852
are, but as far as a measure of size, we 
just use the cardinality of that set. 

82
00:05:28,852 --> 00:05:32,708
As always, the important property of the 
size measure is that to solve a 

83
00:05:32,708 --> 00:05:36,918
sub-problem of a given size, you only 
need to know the answers of sub-problems 

84
00:05:36,918 --> 00:05:41,212
with strictly smaller size, and staring 
at the recurrence, you see that that is 

85
00:05:41,212 --> 00:05:44,437
indeed the case here. 
You only care about sub-problems that 

86
00:05:44,437 --> 00:05:47,616
visit 1 fewer vertex, because in 
particular, j is excluded. 

87
00:05:48,684 --> 00:05:51,697
The dynamic programming algorithm writes 
itself. 

88
00:05:51,697 --> 00:05:56,264
We have a 2 dimensional array. 
2 dimensions, 'cuz he have 2 indices, for 

89
00:05:56,264 --> 00:05:59,534
each sub problem. 
One specifying the final vertex, j. 

90
00:05:59,534 --> 00:06:03,919
The other specifying the set capital s. 
It turns out we can get away with a 

91
00:06:03,919 --> 00:06:08,163
pretty small set of base cases. 
We're only going to need to pre-compute 

92
00:06:08,163 --> 00:06:11,892
the entries of the 2D array, in which the 
final destination, j=1. 

93
00:06:14,092 --> 00:06:19,822
It's the same as the started vertex.Now 
if S happens to contain just the vertex 

94
00:06:19,822 --> 00:06:25,347
1, then the empty path goes from 1 to 1 
with no intermediate stops that has 

95
00:06:25,347 --> 00:06:29,117
length 0. 
Otherwise, if S contains extra vertices 

96
00:06:29,117 --> 00:06:33,872
there's no way to go from 1 back to 1 
with intermediate stops. 

97
00:06:33,872 --> 00:06:38,342
Visiting the vertex 1 only once. 
So we define the other set problems with 

98
00:06:38,342 --> 00:06:43,252
j=1 want to have plus infinity value. 
Now as usual we saw what the set problem 

99
00:06:43,252 --> 00:06:47,867
systematically using the recurrence. 
We want to make sure that rather smaller 

100
00:06:47,867 --> 00:06:52,672
set problems are solved before the bigger 
ones, so 1 out of 4 loops we iterate over 

101
00:06:52,672 --> 00:06:57,102
the measure of set problem size which 
member of the carnality of the set s. 

102
00:06:57,102 --> 00:07:02,980
Amongst sub-problems of this same size, 
we don't care what order we solve them. 

103
00:07:02,980 --> 00:07:08,292
So we just iterate through the sets S of 
[UNKNOWN] M in arbitrary order. 

104
00:07:08,292 --> 00:07:13,533
So these outer two for loops accomplish 
the following, they iterate through the 

105
00:07:13,533 --> 00:07:17,435
relevant choices of S. 
That is sets S to have cardinality at 

106
00:07:17,435 --> 00:07:22,021
least 2 and contain the vertex of 1. 
So that's the first index of our 2D 

107
00:07:22,021 --> 00:07:24,949
array. 
What about the 2nd index J, so this is 

108
00:07:24,949 --> 00:07:30,095
where the shortest path in a given sub. 
The problem is supposed to stop.Well sub 

109
00:07:30,095 --> 00:07:34,674
problems only make sense, are only 
defined if that final vertex J, is one of 

110
00:07:34,674 --> 00:07:39,546
the vertices S, that you are supposed to 
visit.So when we get to a given choice of 

111
00:07:39,546 --> 00:07:44,435
capital S, at that point we are going to 
iterate through all the relevant choices 

112
00:07:44,435 --> 00:07:48,947
of j.and if you recall our argument in 
the base case and the fact the this SS 

113
00:07:48,947 --> 00:07:52,242
has size at east 2, there is no point in 
trying j=1. 

114
00:07:52,242 --> 00:07:57,147
So now that we have a subproblem. 
We have our choice of S, we have our 

115
00:07:57,147 --> 00:08:00,905
choice of J. 
We just compute the recurrence for this 

116
00:08:00,905 --> 00:08:05,305
particular subproblem. 
So what is that? Well, we take a best 

117
00:08:05,305 --> 00:08:11,026
case choice of the penultimate vertex. 
So that's some vertex that we've got to a 

118
00:08:11,026 --> 00:08:13,442
visit. 
A vertex k and capital S. 

119
00:08:13,442 --> 00:08:17,592
Other than the one where we stopped, so 
it can't be equal to j. 

120
00:08:17,592 --> 00:08:22,492
And then, given a choice of k, what is 
the corresponding solution of that 

121
00:08:22,492 --> 00:08:27,067
candidate path from 1 to j. 
Well, it's whatever the shortest path is 

122
00:08:27,067 --> 00:08:30,667
from 1 to k. 
Which of course is visiting everything in 

123
00:08:30,667 --> 00:08:34,154
s, except for j. 
Plus whatever the cost of the final hop 

124
00:08:34,154 --> 00:08:38,828
is from K to J, so the recurrence just 
figures out what is the optimal choice of 

125
00:08:38,828 --> 00:08:43,625
K by brute-force search and then by our 
optimal substructure lemma we know that's 

126
00:08:43,625 --> 00:08:48,975
the correct answer to this sub problem. 
Now in almost all of our dynamic 

127
00:08:48,975 --> 00:08:53,766
programming algorithms, after we solved 
for the sub problems, all we did was 

128
00:08:53,766 --> 00:08:58,494
return the value of the biggest one. 
Here we actually have to do a tiny bit of 

129
00:08:58,494 --> 00:09:01,623
extra work. 
It is not the case that the solution we 

130
00:09:01,623 --> 00:09:04,672
care about. 
The minimum cost traveling salesman tour 

131
00:09:04,672 --> 00:09:08,817
is literally one of the sub problems. 
However, it is true that we can easily 

132
00:09:08,817 --> 00:09:13,127
compute the minimum cost of a traveling 
salesman tour given solutions to all of 

133
00:09:13,127 --> 00:09:16,332
the sub problems. 
So what is the biggest sub problem? Well 

134
00:09:16,332 --> 00:09:20,077
there's actually n of them. 
Remember our measure of sub problem size 

135
00:09:20,077 --> 00:09:24,077
is the cardinality of the set S, the 
number of vertices that you've got to 

136
00:09:24,077 --> 00:09:27,962
visit in that In that sub problem. 
So in the biggest sub problems S is equal 

137
00:09:27,962 --> 00:09:31,524
to everything, you have to compute a 
shortest path that visits each of the N 

138
00:09:31,524 --> 00:09:34,475
vertices exactly once. 
So 1 are the semantics of 1 of those sub 

139
00:09:34,475 --> 00:09:38,266
problems, for a given choice of the 2nd 
index J, there's N different choices for 

140
00:09:38,266 --> 00:09:40,989
that final vertex J. 
That sub problem was responsible for 

141
00:09:40,989 --> 00:09:43,172
computing the length of the shortest 
path. 

142
00:09:43,172 --> 00:09:47,989
Its starts at the vertex 1 it concludes 
at the vertex J, and it visits every 

143
00:09:47,989 --> 00:09:52,418
single vertex exactly once. 
Now hows is this different than a travel 

144
00:09:52,418 --> 00:09:57,632
salesman tour? Well all thats missing is 
that finally hop, that hop from J to 1. 

145
00:09:57,632 --> 00:10:02,124
So what this means is that, after we've 
solved all of the subproblems in our 

146
00:10:02,124 --> 00:10:05,171
triple 4 loop. 
We can complete the algorithm with 1 

147
00:10:05,171 --> 00:10:09,101
final brute-force search. 
This final time that brute-force search 

148
00:10:09,101 --> 00:10:12,974
is over the choice of J. 
The last vertex that you visit before you 

149
00:10:12,974 --> 00:10:17,462
return home to vertex #1. 
So you can skip the choice of J, for 1, 

150
00:10:17,462 --> 00:10:22,297
that 1 doesn't make sense. 
But for the other N-1 possibilities for 

151
00:10:22,297 --> 00:10:27,813
that final vertex J, you take the 
previously computed value of the shortest 

152
00:10:27,813 --> 00:10:32,547
path, it starts at 1 and goes to J. 
Exactly once, you tack on to that the 

153
00:10:32,547 --> 00:10:37,118
cost of the final hop, going home to 1 
from vertex j, and of those n-1 different 

154
00:10:37,118 --> 00:10:41,689
solutions, you return the best of 'em. 
That is in fact the cost of the, minimum 

155
00:10:41,689 --> 00:10:45,829
cost travelling salesman tour. 
The correctness argument is the same as 

156
00:10:45,829 --> 00:10:48,422
it always is. 
The correctness of the optimal 

157
00:10:48,422 --> 00:10:52,230
substructure lemma Implies the 
correctness of the recurrence and then 

158
00:10:52,230 --> 00:10:55,840
the correctness of the dynamic 
programming algorithm follows from the 

159
00:10:55,840 --> 00:10:58,255
correctness of the recurrence plus 
induction. 

160
00:10:58,255 --> 00:11:00,639
The running time is equally straight 
forward. 

161
00:11:00,639 --> 00:11:04,486
So because of the sub problems we're 
keeping track of a lot of information. 

162
00:11:04,486 --> 00:11:07,735
Specifically the identities of all the 
intermediate vertices. 

163
00:11:07,735 --> 00:11:11,277
There are a lot of sub problems. 
So in particular there's roughly 2^n 

164
00:11:11,277 --> 00:11:15,987
choices for the set capital S, there's 
roughly n choices, for the final 

165
00:11:15,987 --> 00:11:19,071
destination j. 
So combining those, we get n*2^n sub 

166
00:11:19,071 --> 00:11:21,992
problems. 
How much work do you do per sub-problem? 

167
00:11:21,992 --> 00:11:26,487
Well you have to execute this brute-force 
search over the chase, choices of the 

168
00:11:26,487 --> 00:11:29,332
vertex k. 
The second to last vertex on the path, k 

169
00:11:29,332 --> 00:11:33,789
can be almost anything in the set capital 
S, capital S could have linear size. 

170
00:11:33,789 --> 00:11:37,276
So you're going to do o(n) work per 
sub-problem. 

171
00:11:37,276 --> 00:11:41,522
Thus, the overall amount of work, as 
promised is O of n^2*2n^2. 

172
00:11:41,522 --> 00:11:46,740
This is still exponential time, there's 
still sever limits on how big a value of 

173
00:11:46,740 --> 00:11:50,227
n you're going to be able to execute this 
algorithm for. 

174
00:11:50,227 --> 00:11:55,420
That said, the problem is mp complete and 
you have to respect its computational 

175
00:11:55,420 --> 00:11:59,199
intractability. 
And again, at least this shows there are 

176
00:11:59,199 --> 00:12:04,682
often interesting algorithmic opportu, 
opportunities to beat brute-force search, 

177
00:12:04,682 --> 00:12:08,326
even for mp complete problems like the 
traffic salesman.