In this video and the next, we're going to revisit the famous traveling salesman problem. When we first talked about the TSP, it was bad news: the context was NP-completeness. We discussed Edmonds's conjecture, that is, the widely believed conjecture that there's no polynomial-time algorithm for the TSP. But let's talk about some good news: the fact that you can do better than naive brute-force search. In fact, it's going to be another neat application of the dynamic programming algorithm design paradigm. So let me remind you briefly about the traveling salesman problem. The input is very simple: just a complete undirected graph, where each of the n-choose-2 edges has a non-negative cost. The responsibility of the algorithm is to figure out the minimum-cost way of visiting each vertex exactly once. That is, you're supposed to output a tour, a permutation on the vertices, that minimizes the sum of the costs of the corresponding n edges. For example, in this four-vertex pink network, the minimum-cost tour has overall cost 13. You could of course solve the problem using brute-force search; the running time of brute-force search would be n factorial. This would allow you to solve problems with, say, 12, 13, maybe 14 vertices. In this video and the next, we'll develop a dynamic programming algorithm that solves the TSP. Now, of course, TSP is NP-complete, so we have to expect that we're not going to get a polynomial-time algorithm. But this dynamic programming algorithm will run quite a bit faster than brute-force search. The running time will be big-O of n^2 times 2^n. 2^n is of course exponential, but it's quite a bit better than n factorial. n factorial is more ballpark n^n. OK, if you look at Stirling's approximation you'll see it's really n over a constant, raised to the n, but still, that's much, much bigger than a constant like 2 raised to the n. To make it more concrete, you could run this dynamic programming algorithm for values of n probably pushing 30 or so.
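To make the brute-force baseline and the n! versus n^2 * 2^n comparison concrete, here's a quick sketch in Python. The four-vertex cost matrix below is made up for illustration (it is not the pink network from the video):

```python
from itertools import permutations
from math import factorial

def brute_force_tsp(c):
    """Try every tour starting and ending at vertex 0: O(n!) time."""
    n = len(c)
    return min(
        sum(c[tour[i]][tour[(i + 1) % n]] for i in range(n))
        # fixing vertex 0 first avoids re-counting rotations of the same tour
        for tour in ((0,) + p for p in permutations(range(1, n)))
    )

# illustrative 4-vertex instance (not the one from the video)
c = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 6],
     [3, 5, 6, 0]]
print(brute_force_tsp(c))    # 12 for this instance

# n! is astronomically larger than n^2 * 2^n already at n = 30
print(factorial(30) // (30**2 * 2**30))
```

Even at n = 30 the ratio between the two running-time bounds has more than twenty digits, which is why the dynamic programming algorithm reaches problem sizes brute force never will.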
So I realize these are still pretty absurdly small problem sizes compared to the size of the arrays that we can sort, or compared to the size of the graphs on which we can compute strongly connected components or shortest paths, but that's how it goes with NP-complete problems. You have to work pretty hard even to solve modest-sized problems. So at least this dynamic programming algorithm proves the point that even for NP-complete problems, there are opportunities to improve over brute-force search, and the programmer toolbox I've already equipped you with is sufficient to make some of those improvements. For the traveling salesman problem in particular, it's been a benchmark problem for the development of tools in optimization, and people have worked very hard to solve very large instances of the traveling salesman problem, even with tens of thousands of cities, sometimes even over 100,000 cities. But, I mean, this often represents years of work. As I said earlier, there are some really nice popular accounts of the traveling salesman problem out there in book form. Check them out if you want to know more about the state of the art for solving large TSP instances. So say we're going to pursue a dynamic programming approach to the TSP. That means we should be looking for optimal substructure: a way in which an optimal solution must necessarily be composed of an optimal solution to a smaller subproblem, extended in some easy way to the original problem. So what could that look like for TSP? Of all the problems that we've tackled using dynamic programming, I think the one which seems the most like TSP is single-source shortest paths. So think of a tour not exactly as a cycle, but as a path from some vertex, let's say vertex number 1, looping all the way back to itself, with, of course, the constraint that it should visit each other vertex exactly once en route.
We want to minimize the overall cost of this path from 1 back to itself, so that sounds a lot like wanting to minimize the length of a path from some source to some destination, and I'm sure you'll recall that our dynamic programming algorithm for the single-source shortest path problem was the Bellman-Ford algorithm. So what did the subproblems look like in the Bellman-Ford algorithm? Well, the cool idea there was to measure subproblem size using an edge budget: a canonical subproblem in Bellman-Ford was to compute the length of a shortest path from the given source vertex to some destination vertex v that uses i edges or fewer. So, by analogy, we could think here about subproblems where we want the shortest path from the starting vertex, vertex number 1, to some other vertex j, that uses at most i edges. So precisely, let's define subproblems as follows. We're given a choice of i, which represents the number of edges that you're allowed to use, and a destination j. Let's assume that the vertices are just named 1 through n, so a generic destination j is an integer between 1 and n. Let's denote by capital L sub ij the length of a shortest path from the starting vertex 1 to this destination j using at most i edges. So suppose we try to set up a dynamic programming algorithm using these subproblems. What do you think? Can we use these to get a polynomial-time algorithm for the TSP? All right, so the correct answer is C. This proposed collection of subproblems does not yield a polynomial-time algorithm; that is, D is not the correct answer. That would be surprising, right? TSP is an NP-complete problem, so if this gave a polynomial-time algorithm, we could all report directly to the Clay Institute for a million-dollar cash prize. Now, there are a lot of students in this class, but at least we'd each get 20 bucks or so out of the deal. So that's not the case; this is not going to be a polynomial-time algorithm.
What's the problem? Well, the problem is also not A; it's not that there are too many subproblems. We only have O(n) choices for i and O(n) choices for j, so there's only a quadratic number of subproblems. Just like in Bellman-Ford, that's totally reasonable. We also can correctly compute the value of larger subproblems from smaller ones; it's really exactly the same as what we were doing in Bellman-Ford. The problem is that our semantics are incorrect. By solving these subproblems, we don't actually get an optimal solution to the traveling salesman instance that we started with. So what were we hoping would be the case? Well, in our dynamic programming algorithms thus far, after we solved all of the subproblems, the answer to the original question was just lying there waiting for us in the biggest subproblem. Here, the biggest subproblem would correspond to i equaling n, where you're allowed to use up to n edges in your path. The issue with these subproblems is that they only specify an upper bound i on the number of edges you're allowed to use. They do not enforce that you have to use your full budget of i edges. So that means when we look at these biggest subproblems, yeah, the shortest paths could use as many as n edges if they wanted, but in general, they won't. They'll use far fewer than n edges. They'll skip tons of the vertices, and that's not a traveling salesman tour. A traveling salesman tour has to visit every vertex once, and that is not enforced by these subproblems. But that problem doesn't seem so hard to fix. Let's just insist, in each subproblem, that instead of using at most i edges, the shortest path has to use exactly i edges. While this set of subproblems is not quite as misguided as the previous one, the answer remains the same. The answer is still C. Of course, we still don't get a polynomial-time algorithm; we're not claiming to prove P = NP. The number of subproblems hasn't changed.
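To see this first failure mode in code, here's a minimal sketch of the "at most i edges" subproblems, with vertices renamed 0 through n-1 (so vertex 0 plays the role of vertex 1) and an illustrative cost matrix of my own. The biggest subproblem back at the origin just returns the empty path:

```python
def shortest_paths_with_budget(c):
    """L[i][j] = length of a shortest path from vertex 0 to vertex j
    using at most i edges, computed Bellman-Ford style."""
    n = len(c)
    INF = float("inf")
    L = [[INF] * n for _ in range(n + 1)]
    L[0][0] = 0                          # the empty path from 0 to itself
    for i in range(1, n + 1):
        for j in range(n):
            L[i][j] = min(L[i - 1][j],   # option 1: don't use the extra edge
                          min(L[i - 1][k] + c[k][j]
                              for k in range(n) if k != j))
    return L

# illustrative 4-vertex instance (not the one from the video)
c = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 6],
     [3, 5, 6, 0]]
L = shortest_paths_with_budget(c)
print(L[4][0])   # 0: the "path" is empty, nothing forces the budget to be used
```

The value of the biggest subproblem at the origin is 0, not the tour cost: exactly the broken semantics just described.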
It's still quadratic, and if you think about it, you can still efficiently solve bigger subproblems, those with a bigger edge budget, given the solutions to all of the smaller subproblems. So what's the issue? The issue is that our semantics are still incorrect. Just because you solve all of these subproblems doesn't mean you can extract from them the cost of a minimum-cost traveling salesman tour. The problem, briefly, is that we don't enforce the constraint that you can't visit a vertex more than once. What were we hoping would be true? Well, look at the biggest subproblem, when i is equal to n and we're looking at paths that have exactly n edges, and when we take j, the destination, to be 1, the same as the origin. We were hoping that the shortest path with n edges from 1 back to itself would be a tour, and therefore the minimum-cost traveling salesman tour. But that need not be the case. Just because this path has n edges and it starts at 1 and ends at 1 doesn't mean it's a tour. It might, for example, visit vertices 7 and 23 twice, and as a result never visit vertices 14 or 29 at all. For this reason, the number computed by the largest subproblem, when i = n and j = 1, can be much, much smaller than the true answer for the minimum-cost traveling salesman tour. But a good algorithm designer is nothing if not tenacious. So let's just recognize the flaw in this proposal, namely that we're not enforcing the fact that you can't have repeat visits to a vertex, and let's change the subproblems again to explicitly disallow repeated visits. So precisely, let's index the subproblems in exactly the same way as before: there's going to be one for each budget i and one for each destination j. For a given i and a given j, the value of the subproblem is now defined as the length of a shortest path beginning at 1 and ending at j,
using exactly i edges, with the additional constraint that no repeated vertices are allowed. The one exception is that if j = 1, then of course you're allowed to have 1 at the beginning and 1 at the end. But for the internal vertices of the shortest path, we do not allow repeats. So what do you think? Does this refined collection of subproblems allow us to get a polynomial-time algorithm for the traveling salesman problem? So today's quiz is more subtle than the past couple, and the correct answer has switched from C to B. It's still the case that there's a quadratic number of subproblems; it's still the case that we don't expect a polynomial-time algorithm; but now, with this different definition of subproblems, they do capture the TSP. Specifically, look at the biggest subproblem: take i equal to n and take j equal to 1. The responsibility of that subproblem is to compute the shortest path from 1 to 1 that has exactly n edges and no vertices repeated internally. That is exactly the original problem that we started with. That is exactly TSP. The issue is that you cannot efficiently solve larger subproblems, problems with a larger edge budget, given the solutions to the smaller subproblems. The smaller subproblems are just not very useful for solving the bigger subproblems, and the reason is a little bit subtle. Now, the hope would be that, like in all our previous dynamic programming algorithms, you can just formulate a recurrence which tells you how to solve your subproblems given the solutions to smaller ones. And there's even a natural guess, for these subproblems, of what you'd hope the recurrence might be. Recurrences generally follow from a thought experiment about what optimal solutions have to look like. So you'd focus on a particular subproblem, with a given destination j and a given budget i on the number of edges, and you'd say, well, let's think about an optimal solution. So what is that? That's a path that starts at 1.
It ends at j. It has exactly i edges. It has no repeated vertices. And amongst all such paths, it has the minimum length. And it's natural to proceed by analogy with Bellman-Ford and say, well, wouldn't it be cool if a little birdie told me what the penultimate vertex k was on the shortest path from 1 to j? If I knew that the second-to-last vertex on the shortest path was k, well then, surely, the length of this path would be the length of an optimal path from 1 to k, using exactly i-1 edges and no repeated vertices, with the final hop kj concatenated at the end. Now, of course, you don't know what the second-to-last vertex k is. But no big deal: as usual in dynamic programming, we're just going to try all of the possibilities. So we can encode this brute-force search as a minimum over the sensible choices for k. Clearly, k should not be equal to j; it should be some other vertex that precedes j. And ignoring some base cases, let's also preclude k from being 1, the starting vertex. Pictorially, what we have in mind is taking some shortest path from 1 to k, which we'd look up, having previously computed it in some smaller subproblem, and then concatenating onto it a final hop that goes from k to j. Well, hey, that sounds pretty good, right? It kind of sounds like maybe this is the key ingredient of a polynomial-time algorithm for TSP. So what's the problem? Well, the problem is that we've defined our subproblems to disallow repeated visits to a vertex. So when we compute a subproblem, we'd better make sure we're respecting that constraint, that no repeated visits are allowed. And if we have in mind concatenating a shortest path from 1 to k, with no repeats, with this final hop k to j, well, this might result in a repeated visit to j. In particular, for all we know, the shortest path from 1 to k that uses exactly i-1 edges and has no repeated vertices might well go through j.
In which case, concatenating this final hop kj results in a second visit to j. So this bug means that the proposed recurrence is not correct. The recurrence is computing something which, in general, is less than the actual length of a shortest path from 1 to j with exactly i edges and the no-repeat-visits constraint respected; the true answer can be a much bigger number than what's computed by this recurrence. And I encourage you to think about whether there's any way to fix this bug. Could you propose a different, more complicated recurrence which would still be easy to evaluate but would actually compute these subproblems correctly? I don't think you're going to find one. So the moral of the story is that this constraint in the traveling salesman problem, that each vertex gets visited exactly once, is quite a tricky constraint, and we're going to have to work quite hard to make sure it's satisfied. The solution we're going to use is, in some sense, a logical next step from this recurrence here. We need to know more information about subproblems than just where they end; we actually need to know the identities of the vertices on the path used to travel from 1 to k, so that we can tell whether or not j is on that path. So we're going to look at a much bigger set of subproblems, where we remember not just destinations, but also all of the intermediate stops. That idea will translate to a dynamic programming algorithm, not a polynomial-time one needless to say, and the details are coming right up.
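As a preview of where this is heading, here's a compact sketch of that subset-based dynamic program, known as the Held-Karp algorithm, which the next video develops in detail. The cost matrix is again a made-up illustrative instance, with vertex 0 standing in for vertex 1:

```python
from itertools import combinations

def held_karp(c):
    """Minimum tour cost via subset dynamic programming, O(n^2 * 2^n).

    A[(S, j)] = length of a shortest path from vertex 0 to j that
    visits exactly the vertices in S (0 and j included), no repeats.
    """
    n = len(c)
    A = {(frozenset([0]), 0): 0}          # base case: the empty path
    for size in range(2, n + 1):
        for rest in combinations(range(1, n), size - 1):
            S = frozenset(rest) | {0}
            for j in rest:
                # try every choice of penultimate vertex k; the 'in A'
                # guard skips k = 0 except in the base case
                A[(S, j)] = min(A[(S - {j}, k)] + c[k][j]
                                for k in S - {j} if (S - {j}, k) in A)
    full = frozenset(range(n))
    # close the tour with a final hop from j back to vertex 0
    return min(A[(full, j)] + c[j][0] for j in range(1, n))

# illustrative 4-vertex instance (not the one from the video)
c = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 6],
     [3, 5, 6, 0]]
print(held_karp(c))   # 12 for this instance
```

Because each subproblem records the full set S of visited vertices, the recurrence can never smuggle in a repeat visit, which is precisely what the edge-budget subproblems failed to prevent.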