In this video, we'll begin our study of our final topic. First, what are NP-Complete, or computationally intractable, problems? And second, what are some good strategies for approaching them algorithmically? Let's begin by formally defining computational tractability via polynomial-time solvability; that is, we're going to define the complexity class P. As you know from the past several weeks of hard work, the focus of these design and analysis of algorithms courses is to learn practical algorithms, and relevant supporting theory, for fundamental computational problems. By now, we have a very long list of such problems: sorting, searching, shortest paths, minimum spanning trees, sequence alignment, and so on. But the sad fact is, I've given you a quite biased sample of computational problems thus far. For many important computational problems, including ones you are likely to run into in your own projects, there is no known efficient, let alone blazingly fast, algorithm for solving the problem. And inasmuch as the focus of these courses is what you can do with algorithms, rather than what you can't do, it would nevertheless be fraudulent, I think, to teach you about algorithms without discussing the specter of intractability that haunts the professional algorithm designer and the serious programmer. In the same way that your programmer's toolbox should include lots of different design paradigms, like dynamic programming, greedy algorithms, and divide and conquer, and lots of different data structures, hash tables, balanced search trees, and so on, that toolbox should also include knowledge of computationally intractable, that is, NP-Complete, problems, and maybe even how to establish that problems are NP-Complete. Again, this is because these problems are likely to show up in your own projects. It is not uncommon for a graduate student at Stanford to come into my office asking for advice on how to approach, from an algorithmic perspective, a problem that has come up in their own project. Very often, after they've sketched the problem on my whiteboard for two or five minutes, it's easy to see that it is in fact an NP-Complete problem. In that case, they should not seek out an efficient, exactly correct algorithm; instead, they should resort to one of the strategies for dealing with NP-Complete problems that we'll be discussing later. So, in the next sequence of videos, my goal is to formalize the notion of computational intractability through the definition of NP-Completeness, and also to give you a couple of examples. The relatively brief discussion that I'm going to give you of NP-Completeness is no substitute for a proper study of the topic; I encourage you to seek out a textbook or other free courses on the web to learn more. Indeed, NP-Completeness is, I think, a topic worth learning at least twice, from multiple different perspectives, and those of you who have studied it before will, I hope, find my treatment somewhat complementary to previous treatments that you might have seen. NP-Completeness is one of the most famous, important, and influential exports from all of computer science to the broader scientific world. And as so many different disciplines become increasingly computational, and here I'm thinking about everything from the mathematical sciences, to the natural sciences, even to the social sciences, these fields really have no choice but to confront and learn about NP-Completeness, since it genuinely governs what can be accomplished computationally and what cannot.
My perspective here will be unapologetically that of an algorithm designer. In particular, after our discussion of NP-Completeness, the rest of this course is going to focus on algorithmic approaches to these problems: if you're confronted with an NP-Complete problem, what should you do about it? So, rather than starting out by formalizing computational intractability, it's a little simpler to begin by defining computational tractability. That brings us to the uber-important definition of polynomial-time solvability. The definition's a bit of a mouthful, but it's exactly what you'd think it would be. A problem is polynomial-time solvable, naturally, if there's a polynomial-time algorithm that solves it. That is, there's an algorithm and a constant k such that, if you feed an input of length n to this algorithm, it will correctly solve the problem, whether that means correctly answering yes or no, or correctly outputting an optimal solution, in time O(n^k). For pretty much all of the problems we have discussed, it's been kind of obvious what the input length was; it was, for example, the number of vertices plus the number of edges, or the number of numbers that had to be sorted, etc. For an abstract problem, I would informally encourage you to think of the input length n as the number of keystrokes on a computer required to describe the given instance. And yes, strictly speaking, to be polynomial-time solvable, all you need is a running time of O(n^k) for some constant k; even k = 10,000 is enough to meet the demands of this definition. Of course, for concrete problems, you're going to see small values of k; in this course, k has usually been 1, 2, or 3. A quick comment for those of you wondering about randomized algorithms, of which we saw some cool examples back in Part I. To keep the discussion simple when we discuss computational intractability, let's think only about deterministic algorithms. But at the same time, don't worry: all of the qualitative conclusions we're going to reach are believed to hold equally well for randomized algorithms. In particular, we don't believe there are problems for which deterministic algorithms require exponential time and yet randomization magically gets you polynomial time; there's even some mathematical evidence suggesting that this is the case. So, that brings us to the definition of the complexity class P. P is simply defined as the set of all polynomial-time solvable problems, that is, the set of all problems that admit a polynomial-time algorithm. For those of you who have heard of the P versus NP question, this is the same P. Throughout these courses, we've exhibited many examples of problems in P, right? That's really been sort of the whole point of the course: sequence alignment, minimum cut, sorting, minimum spanning tree, shortest paths, and so on. There are two exceptions. One we talked about explicitly at the time, when we discussed shortest paths in graphs with negative edge costs: we stated that if you have negative-cost cycles, and you want shortest paths that are simple, that do not contain any cycles, then that turns out to be an NP-Complete problem. For those of you who know about reductions, there's an easy reduction from the Hamiltonian path problem.
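To make the "number of keystrokes" convention concrete, here's a minimal Python sketch of my own (not from the lecture): it writes a graph instance down as plain text and counts the characters needed. The function name and the tiny example graph are hypothetical, just for illustration.

```python
# A minimal sketch of the "input length" convention: describe the instance
# as plain text and count the characters ("keystrokes") needed.
# The tiny graph below is a hypothetical example.

def encoding_length(num_vertices, edges):
    """Characters in a plain-text description: a header plus one line per edge."""
    header = f"{num_vertices} {len(edges)}\n"
    body = "".join(f"{u} {v} {cost}\n" for u, v, cost in edges)
    return len(header) + len(body)

edges = [(0, 1, 4), (1, 2, 7), (0, 2, 1)]  # (u, v, cost) triples
print(encoding_length(3, edges))  # ~ m lines of O(log n + log(max cost)) chars each
```

The key point: each number contributes only about as many keystrokes as the logarithm of its magnitude, which is exactly what matters in the knapsack discussion below.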
So, we did not give a polynomial-time algorithm for that version of the shortest path problem; we just punted on it. The second example is a really subtle one. Believe it or not, we did not actually give a polynomial-time algorithm for the knapsack problem. This requires an explanation, because at the time, I'll bet our dynamic programming algorithm felt like a polynomial-time algorithm. So, let's review what its running time was. In the knapsack problem, the input is n items with values and sizes, and you're also given the knapsack capacity, a positive integer W. Our table, our two-dimensional array, had Θ(n·W) subproblems, and we spent constant time filling in each entry, so the running time of our dynamic programming algorithm was Θ(n·W): the number of items times the knapsack capacity. What, on the other hand, is the length of an instance of knapsack? Well, as we just said, the input to knapsack is 2n+1 numbers: the n sizes, the n values, and the knapsack capacity. And, naturally, to write down any one of these numbers, you only need a number of characters proportional to the logarithm of its magnitude, right? If you have an integer k and you want to write it in binary, that takes log base 2 of k bits; if you write it in base-10 digits, it's log base 10 of k, which is the same up to a constant factor. So, the point is, the input length is proportional to n plus the logarithms of the magnitudes of the numbers, in particular log W. Here's another way to see why the dynamic programming algorithm is exponential in the input length. Let me use an analogy. Suppose you were solving some problem over graph cuts. Remember, the number of cuts in a graph is exponential in the number of vertices; it's roughly 2^n for n vertices. So that means if you're doing brute-force search and I add just one more vertex to the graph, the running time of your algorithm doubles. That's exponential growth. But actually, the same thing is going on in the knapsack problem with the dynamic programming algorithm. Suppose we write everything in base 10, and I add just one extra zero to the knapsack capacity, multiplying it by 10. Well then, the number of subproblems you have to solve goes up by a factor of 10. So again, I added just one digit, one keystroke, to the input, and yet your running time got multiplied by a constant factor. That is, again, exponential growth with respect to the encoding length of the input. Now, it's no accident that we failed to get a polynomial-time algorithm for the knapsack problem, because knapsack is, in fact, an NP-Complete problem. Again, we'll explain what NP-Complete means shortly. So, that's the mathematical definition of the complexity class P. But more important than the math is the interpretation: how should you think about the class P, about polynomial-time solvability? To the practicing algorithm designer, the practicing programmer, membership in P, polynomial-time solvability, can be thought of as a rough litmus test for computational tractability. Now, the identification of polynomial-time algorithms, those that run in O(n^k) time for some constant k, with practical, computationally efficient algorithms is imperfect. There are, of course, algorithms which in principle run in polynomial time but are too slow to be useful in practice. Conversely, there are algorithms which are not polynomial-time but are useful in practice.
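Here's a short Python sketch of that knapsack dynamic program, just to make the Θ(n·W) running time and the exponential-growth point tangible; the item values and sizes are hypothetical examples, and this is an illustration rather than the exact code from the earlier lecture.

```python
# A sketch of the knapsack dynamic program: A[i][w] = best achievable value
# using only the first i items, with residual capacity w.

def knapsack(values, sizes, W):
    n = len(values)
    A = [[0] * (W + 1) for _ in range(n + 1)]  # (n+1) x (W+1) table
    for i in range(1, n + 1):
        for w in range(W + 1):
            A[i][w] = A[i - 1][w]              # case 1: skip item i
            if sizes[i - 1] <= w:              # case 2: include item i, if it fits
                A[i][w] = max(A[i][w],
                              A[i - 1][w - sizes[i - 1]] + values[i - 1])
    return A[n][W]

# Appending one digit to W (a single extra keystroke) multiplies the table
# size, and hence the running time, by roughly 10:
values, sizes = [3, 2, 4, 4], [4, 3, 2, 3]   # hypothetical instance
print(knapsack(values, sizes, 6))    # table has 5 x 7 entries
print(knapsack(values, sizes, 60))   # one more keystroke, ~10x the work
```

The code is perfectly usable when W is modest, which is exactly the sense in which a non-polynomial-time algorithm can still be useful in practice.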
You already coded one up when you implemented the dynamic programming solution to knapsack, and we'll see several more examples in future lectures. More generally, anyone who has the guts to write down a precise mathematical definition, like this complexity class P, needs to be ready for the inevitability that the definition will accommodate a few edge cases you wish it didn't, and exclude a few edge cases you wish it didn't. This is an inevitable friction between the binary nature of mathematical properties and the fuzziness of reality. These edge cases are absolutely no excuse to ignore or dismiss a mathematical definition like this one. Quite the contrary: the friction makes it that much more astonishing that you can write down a clean mathematical definition, something that either is or is not satisfied by every single computational problem, and have it serve as such a useful guide for classifying problems into those tractable in practice and those intractable in practice. We now have 40 years of experience indicating that this definition has been unreasonably effective in just that kind of classification. Generally speaking, computational problems in P can be well solved using off-the-shelf solutions, as we've seen in the many examples in this class. Problems believed not to be in P generally require significant computational resources, significant human resources, and a lot of domain expertise to solve in practice. So, we've mentioned in passing a couple of problems that are believed not to be polynomial-time solvable. Let me pause here and tell you about a more famous one: the traveling salesman problem. Remarkably, the traveling salesman problem just doesn't sound that different from the minimum spanning tree problem, a problem for which we now know a litany of greedy algorithms that run in near-linear time. The input is an undirected graph, and let's go ahead and assume that it's a complete graph, so there's an edge between every pair of vertices. Every edge has a cost, and the problem is non-trivial even if every edge cost is either one or two. Let me draw an example in green with four vertices. The responsibility of an algorithm for the TSP is to output a tour, that is, a cycle that visits every single vertex exactly once. And amongst all tours, which you can think of as permutations of the vertices, the order in which you visit them, you want the one that minimizes the sum of the edge costs. So, for example, in the green graph that I've drawn, the minimum-cost tour has total cost 13. The TSP is simple enough to state, and it's clear you can solve it by brute-force search in roughly n! time, just by trying all permutations of the vertices. But we've seen many, many examples in this class where you can do better than brute-force search. And people have been studying the TSP seriously, from a computational perspective, since at least the 1950s, including many huge names in optimization, people like George Dantzig. So, despite over 60 years of interest, there is to date no known polynomial-time algorithm that solves the traveling salesman problem. In fact, way back in 1965, Jack Edmonds, in a remarkable paper called "Paths, Trees, and Flowers," conjectured that no polynomial-time algorithm exists for the traveling salesman problem. This conjecture remains unresolved to this day, almost 50 years later. As we'll see, it is equivalent to the conjecture that P is not equal to NP.
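To make the n! brute-force baseline concrete, here's a short Python sketch; the 4-vertex cost matrix below, with costs in {1, 2}, is a hypothetical example, not the green graph from the video.

```python
# Brute-force TSP as described above: try all permutations of the vertices
# and return the cost of the cheapest tour. Runs in roughly n! time.

from itertools import permutations

def brute_force_tsp(cost):
    """cost[i][j] = cost of edge (i, j); the graph is complete."""
    n = len(cost)
    best = float("inf")
    for perm in permutations(range(1, n)):  # fix vertex 0 as the start, so
        tour = (0,) + perm                  # rotations of a tour aren't repeated
        total = sum(cost[tour[i]][tour[(i + 1) % n]] for i in range(n))
        best = min(best, total)
    return best

cost = [[0, 1, 2, 1],   # hypothetical complete graph on 4 vertices,
        [1, 0, 1, 2],   # every edge cost either 1 or 2
        [2, 1, 0, 1],
        [1, 2, 1, 0]]
print(brute_force_tsp(cost))  # -> 4, the minimum tour cost here
```

Even with vertex 0 pinned down to avoid recounting rotations, the number of permutations explodes as n grows, which is exactly why brute-force search is hopeless beyond small instances.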
So, how would you formalize a proof of this conjecture? How would you amass evidence that this conjecture holds in the absence of a proof? Those are the topics of the upcoming videos.