In this video and the next, we're going to revisit the famous traveling salesman problem. When we first talked about the TSP, it was bad news: the context was NP-completeness. We discussed Edmonds's conjecture, that is, the widely believed conjecture that there's no polynomial-time algorithm for the TSP. But let's talk about some good news: the fact that you can do better than naive brute-force search. In fact, it's going to be another neat application of the dynamic programming algorithm design paradigm. So let me remind you briefly about the traveling salesman problem. The input is very simple: just a complete undirected graph, where each of the n-choose-2 edges has a non-negative cost. The responsibility of the algorithm is to figure out the minimum-cost way of visiting each vertex exactly once. That is, you're supposed to output a tour, a permutation on the vertices, that minimizes the sum of the costs of the corresponding n edges. For example, in this four-vertex pink network, the minimum-cost tour has overall cost 13. You could of course solve the problem using brute-force search; the running time of brute-force search would be n factorial. This would allow you to solve problems with, say, 12, 13, maybe 14 vertices. In this video and the next, we'll develop a dynamic programming algorithm that solves the TSP. Now, of course, TSP is NP-complete, so we have to expect that we're not going to get a polynomial-time algorithm. But this dynamic programming algorithm will run quite a bit faster than brute-force search. The running time will be big-O of n^2 times 2^n. 2^n is of course exponential, but it's quite a bit better than n factorial. n factorial is more ballpark n^n. OK, if you look at Stirling's approximation you'll see it's really n over a constant, raised to the n, but still, that's much, much bigger than a constant like 2 raised to the n. To make it more concrete, you could run this dynamic programming algorithm for values of n probably pushing 30 or so.
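To make the brute-force baseline and the n! versus n^2 * 2^n comparison concrete, here's a quick sketch in Python. The four-vertex cost matrix below is made up for illustration (it is not the pink network from the video):

```python
from itertools import permutations
from math import factorial

def brute_force_tsp(c):
    """Try every tour starting and ending at vertex 0: O(n!) time."""
    n = len(c)
    return min(
        sum(c[tour[i]][tour[(i + 1) % n]] for i in range(n))
        # fixing vertex 0 first avoids re-counting rotations of the same tour
        for tour in ((0,) + p for p in permutations(range(1, n)))
    )

# illustrative 4-vertex instance (not the one from the video)
c = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 6],
     [3, 5, 6, 0]]
print(brute_force_tsp(c))    # 12 for this instance

# n! is astronomically larger than n^2 * 2^n already at n = 30
print(factorial(30) // (30**2 * 2**30))
```

Even at n = 30 the ratio between the two running-time bounds has more than twenty digits, which is why the dynamic programming algorithm reaches problem sizes brute force never will.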
So I realize these are still pretty absurdly small problem sizes compared to the size of the arrays that we can sort, or compared to the size of the graphs on which we can compute strongly connected components or shortest paths, but that's how it goes with NP-complete problems. You have to work pretty hard even to solve modest-sized problems. So at least this dynamic programming algorithm proves the point that even for NP-complete problems, there are opportunities to improve over brute-force search, and the programmer toolbox I've already equipped you with is sufficient to make some of those improvements. For the traveling salesman problem in particular, it's been a benchmark problem for the development of tools in optimization, and people have worked very hard to solve very large instances of the traveling salesman problem, even with tens of thousands of cities, sometimes even over 100,000 cities. But, I mean, this often represents years of work. As I said earlier, there are some really nice popular accounts of the traveling salesman problem out there in book form. Check them out if you want to know more about the state of the art for solving large TSP instances. So say we're going to pursue a dynamic programming approach to the TSP. That means we should be looking for optimal substructure: a way in which an optimal solution must necessarily be composed of an optimal solution to a smaller subproblem, extended in some easy way to the original problem. So what could that look like for TSP? Of all the problems that we've tackled using dynamic programming, I think the one which seems the most like TSP is single-source shortest paths. So think of a tour not exactly as a cycle, but as a path from some vertex, let's say vertex number 1, looping all the way back to itself, with, of course, the constraint that it should visit each other vertex exactly once en route.
We want to minimize the overall cost of this path from 1 back to itself, so that sounds a lot like wanting to minimize the length of a path from some source to some destination, and I'm sure you'll recall that our dynamic programming algorithm for the single-source shortest path problem was the Bellman-Ford algorithm. So what did the subproblems look like in the Bellman-Ford algorithm? Well, the cool idea there was to measure subproblem size using an edge budget: a canonical subproblem in Bellman-Ford was to compute the length of a shortest path from the given source vertex to some destination vertex v that uses i edges or fewer. So, by analogy, we could think here about subproblems where we want the shortest path from the starting vertex, vertex number 1, to some other vertex j, that uses at most i edges. So precisely, let's define subproblems as follows. We're given a choice of i, which represents the number of edges that you're allowed to use, and a destination j. Let's assume that the vertices are just named 1 through n, so a generic destination j is an integer between 1 and n. Let's denote by capital L sub ij the length of a shortest path from the starting vertex 1 to this destination j using at most i edges. So suppose we try to set up a dynamic programming algorithm using these subproblems. What do you think? Can we use these to get a polynomial-time algorithm for the TSP? All right, so the correct answer is C. This proposed collection of subproblems does not yield a polynomial-time algorithm; that is, D is not the correct answer. That would be surprising, right? TSP is an NP-complete problem, so if this gave a polynomial-time algorithm, we could all report directly to the Clay Institute for a million-dollar cash prize. Now, there are a lot of students in this class, but at least we'd each get 20 bucks or so out of the deal. So that's not the case; this is not going to be a polynomial-time algorithm.
What's the problem? Well, the problem is also not A; it's not that there are too many subproblems. We only have O(n) choices for i and O(n) choices for j, so there's only a quadratic number of subproblems. Just like in Bellman-Ford, that's totally reasonable. We also can correctly compute the value of larger subproblems from smaller ones; it's really exactly the same as what we were doing in Bellman-Ford. The problem is that our semantics are incorrect. By solving these subproblems, we don't actually get an optimal solution to the traveling salesman instance that we started with. So what were we hoping would be the case? Well, in our dynamic programming algorithms thus far, after we solved all of the subproblems, the answer to the original question was just lying there waiting for us in the biggest subproblem. Here, the biggest subproblem would correspond to i equaling n, where you're allowed to use up to n edges in your path. The issue with these subproblems is that they only specify an upper bound i on the number of edges you're allowed to use. They do not enforce that you have to use your full budget of i edges. So that means when we look at these biggest subproblems, yeah, the shortest paths could use as many as n edges if they wanted, but in general, they won't. They'll use far fewer than n edges. They'll skip tons of the vertices, and that's not a traveling salesman tour. A traveling salesman tour has to visit every vertex once, and that is not enforced by these subproblems. But that problem doesn't seem so hard to fix. Let's just insist, in each subproblem, that instead of using at most i edges, the shortest path has to use exactly i edges. While this set of subproblems is not quite as misguided as the previous one, the answer remains the same. The answer is still C. Of course, we still don't get a polynomial-time algorithm; we're not claiming to prove P = NP. The number of subproblems hasn't changed.
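To see this first failure mode in code, here's a minimal sketch of the "at most i edges" subproblems, with vertices renamed 0 through n-1 (so vertex 0 plays the role of vertex 1) and an illustrative cost matrix of my own. The biggest subproblem back at the origin just returns the empty path:

```python
def shortest_paths_with_budget(c):
    """L[i][j] = length of a shortest path from vertex 0 to vertex j
    using at most i edges, computed Bellman-Ford style."""
    n = len(c)
    INF = float("inf")
    L = [[INF] * n for _ in range(n + 1)]
    L[0][0] = 0                          # the empty path from 0 to itself
    for i in range(1, n + 1):
        for j in range(n):
            L[i][j] = min(L[i - 1][j],   # option 1: don't use the extra edge
                          min(L[i - 1][k] + c[k][j]
                              for k in range(n) if k != j))
    return L

# illustrative 4-vertex instance (not the one from the video)
c = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 6],
     [3, 5, 6, 0]]
L = shortest_paths_with_budget(c)
print(L[4][0])   # 0: the "path" is empty, nothing forces the budget to be used
```

The value of the biggest subproblem at the origin is 0, not the tour cost: exactly the broken semantics just described.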
It's still quadratic, and if you think about it, you can still efficiently solve bigger subproblems, those with a bigger edge budget, given the solutions to all of the smaller subproblems. So what's the issue? The issue is that our semantics are still incorrect. Just because you solve all of these subproblems doesn't mean you can extract from them the cost of a minimum-cost traveling salesman tour. The problem, briefly, is that we don't enforce the constraint that you can't visit a vertex more than once. What were we hoping would be true? Well, look at the biggest subproblem, when i is equal to n and we're looking at paths that have exactly n edges, and when we take j, the destination, to be 1, the same as the origin. We were hoping that the shortest path with n edges from 1 back to itself would be a tour, and therefore the minimum-cost traveling salesman tour. But that need not be the case. Just because this path has n edges and it starts at 1 and ends at 1 doesn't mean it's a tour. It might, for example, visit vertices 7 and 23 twice, and as a result never visit vertices 14 or 29 at all. For this reason, the number computed by the largest subproblem, when i = n and j = 1, can be much, much smaller than the true answer for the minimum-cost traveling salesman tour. But a good algorithm designer is nothing if not tenacious. So let's just recognize the flaw in this proposal, namely that we're not enforcing the fact that you can't have repeat visits to a vertex, and let's change the subproblems again to explicitly disallow repeated visits. So precisely, let's index the subproblems in exactly the same way as before: there's going to be one for each budget i and one for each destination j. For a given i and a given j, the value of the subproblem is now defined as the length of a shortest path beginning at 1 and ending at j,
using exactly i edges, with the additional constraint that no repeated vertices are allowed. The one exception is that if j = 1, then of course you're allowed to have 1 at the beginning and 1 at the end. But for the internal vertices of the shortest path, we do not allow repeats. So what do you think? Does this refined collection of subproblems allow us to get a polynomial-time algorithm for the traveling salesman problem? So today's quiz is more subtle than the past couple, and the correct answer has switched from C to B. It's still the case that there's a quadratic number of subproblems; it's still the case that we don't expect a polynomial-time algorithm; but now, with this different definition of subproblems, they do capture the TSP. Specifically, look at the biggest subproblem: take i equal to n and take j equal to 1. The responsibility of that subproblem is to compute the shortest path from 1 to 1 that has exactly n edges and no vertices repeated internally. That is exactly the original problem that we started with. That is exactly TSP. The issue is that you cannot efficiently solve larger subproblems, problems with a larger edge budget, given the solutions to the smaller subproblems. The smaller subproblems are just not very useful for solving the bigger subproblems, and the reason is a little bit subtle. Now, the hope would be that, like in all our previous dynamic programming algorithms, you can just formulate a recurrence which tells you how to solve your subproblems given the solutions to smaller ones. And there's even a natural guess, for these subproblems, of what you'd hope the recurrence might be. Recurrences generally follow from a thought experiment about what optimal solutions have to look like. So you'd focus on a particular subproblem, with a given destination j and a given budget i on the number of edges, and you'd say, well, let's think about an optimal solution. So what is that? That's a path that starts at 1.
It ends at j. It has exactly i edges. It has no repeated vertices. And amongst all such paths, it has the minimum length. And it's natural to proceed by analogy with Bellman-Ford and say, well, wouldn't it be cool if a little birdie told me what the penultimate vertex k was on the shortest path from 1 to j? If I knew that the second-to-last vertex on the shortest path was k, well then, surely, the length of this path would be the length of an optimal path from 1 to k, using exactly i-1 edges and no repeated vertices, with the final hop kj concatenated at the end. Now, of course, you don't know what the second-to-last vertex k is. But no big deal: as usual in dynamic programming, we're just going to try all of the possibilities. So we can encode this brute-force search as a minimum over the sensible choices for k. Clearly, k should not be equal to j; it should be some other vertex that precedes j. And ignoring some base cases, let's also preclude k from being 1, the starting vertex. Pictorially, what we have in mind is taking some shortest path from 1 to k, which we'd look up, having previously computed it in some smaller subproblem, and then concatenating onto it a final hop that goes from k to j. Well, hey, that sounds pretty good, right? It kind of sounds like maybe this is the key ingredient of a polynomial-time algorithm for TSP. So what's the problem? Well, the problem is that we've defined our subproblems to disallow repeated visits to a vertex. So when we compute a subproblem, we'd better make sure we're respecting that constraint, that no repeated visits are allowed. And if we have in mind concatenating a shortest path from 1 to k, with no repeats, with this final hop k to j, well, this might result in a repeated visit to j. In particular, for all we know, the shortest path from 1 to k that uses exactly i-1 edges and has no repeated vertices might well go through j.
In which case, concatenating this final hop kj results in a second visit to j. So this bug means that the proposed recurrence is not correct. The recurrence is computing something which, in general, is less than the actual length of a shortest path from 1 to j with exactly i edges and the no-repeat-visits constraint respected; the true answer can be a much bigger number than what's computed by this recurrence. And I encourage you to think about whether there's any way to fix this bug. Could you propose a different, more complicated recurrence which would still be easy to evaluate but would actually compute these subproblems correctly? I don't think you're going to find one. So the moral of the story is that this constraint in the traveling salesman problem, that each vertex gets visited exactly once, is quite a tricky constraint, and we're going to have to work quite hard to make sure it's satisfied. The solution we're going to use is, in some sense, a logical next step from this recurrence here. We need to know more information about subproblems than just where they end; we actually need to know the identities of the vertices on the path used to travel from 1 to k, so that we can tell whether or not j is on that path. So we're going to look at a much bigger set of subproblems, where we remember not just destinations, but also all of the intermediate stops. That idea will translate to a dynamic programming algorithm, not a polynomial-time one needless to say, and the details are coming right up.
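As a preview of where this is heading, here's a compact sketch of that subset-based dynamic program, known as the Held-Karp algorithm, which the next video develops in detail. The cost matrix is again a made-up illustrative instance, with vertex 0 standing in for vertex 1:

```python
from itertools import combinations

def held_karp(c):
    """Minimum tour cost via subset dynamic programming, O(n^2 * 2^n).

    A[(S, j)] = length of a shortest path from vertex 0 to j that
    visits exactly the vertices in S (0 and j included), no repeats.
    """
    n = len(c)
    A = {(frozenset([0]), 0): 0}          # base case: the empty path
    for size in range(2, n + 1):
        for rest in combinations(range(1, n), size - 1):
            S = frozenset(rest) | {0}
            for j in rest:
                # try every choice of penultimate vertex k; the 'in A'
                # guard skips k = 0 except in the base case
                A[(S, j)] = min(A[(S - {j}, k)] + c[k][j]
                                for k in S - {j} if (S - {j}, k) in A)
    full = frozenset(range(n))
    # close the tour with a final hop from j back to vertex 0
    return min(A[(full, j)] + c[j][0] for j in range(1, n))

# illustrative 4-vertex instance (not the one from the video)
c = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 6],
     [3, 5, 6, 0]]
print(held_karp(c))   # 12 for this instance
```

Because each subproblem records the full set S of visited vertices, the recurrence can never smuggle in a repeat visit, which is precisely what the edge-budget subproblems failed to prevent.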