How do we amass evidence of computational intractability for something like the traveling salesman problem? The key idea is through the formalism of completeness, which in turn depends on the idea of a reduction between two problems. We left off last video with Edmonds's prescient conjecture from 1965. In his paper Paths, Trees, and Flowers, while pondering the traveling salesman problem, he conjectures that there is no polynomial-time algorithm for it. That conjecture is equivalent, as we'll see, to a conjecture called P not equal to NP, and it is not resolved. It remains, to this day, one of the most fundamental open problems in all of mathematics.

It's important that research in algorithms does not grind to a halt just because we have this open, and presumably extremely difficult, P versus NP problem that we don't yet understand. We still want guidance about which problems are easy and which problems are hard. So a pretty great idea for giving us such guidance, while at the same time evading actually answering the P versus NP problem, is to show that problems are difficult in a relative sense: amass evidence of intractability by showing that a problem is at least as hard as many other computational problems.
To make this idea of relative difficulty precise, we need to say what we mean when we say that one problem is as hard as another. The language for that is a reduction from one problem to another. We say that a computational problem, say pi 1, reduces to another one, pi 2, if all you need to solve pi 1 efficiently is an efficient algorithm for pi 2. That is, if I gave to you on a silver platter a polynomial-time algorithm for the problem pi 2, then, using that as a subroutine, you could build a polynomial-time algorithm for pi 1. We'll see some concrete examples in just a sec, but let me just mention that I'm being slightly informal with this definition. You'll see something a little bit more formal if you look at a textbook or take a class specifically on this topic, but this is how you should think about reductions: all you need to solve pi 1 is someone handing you an efficient subroutine for pi 2. That is the algorithmic way of thinking about reductions. Indeed, by virtue of being immersed in algorithms over the past many weeks, we have examples of reductions; the next quiz is going to ask you to recall some of them.

So the correct answer is: all of the above. A, B, and C are all legitimate reductions from one computational problem to another.
So for example, answer A is something we discussed back in Part 1 of the course, when we first motivated linear-time median algorithms. At that point we were arguing that for a median algorithm to be interesting, it had better be faster than n log n time; that's why we were shooting for linear time. The reason being that one perfectly correct algorithm for computing a median is: you take the input array, you sort it, for example using merge sort, in n log n time, and you return the middle element. That's going to be the median. The second answer, B, is something we mentioned in passing in the naive implementation of Kruskal's algorithm, where we wanted to check for cycles when we were thinking about adding a new edge. And we observed that one way to check if a graph has a cycle is to just run depth-first search on that graph. If there's a cycle, you'll detect it by exploring an edge which points backward to a vertex that you're still in the middle of exploring. The third example is maybe more interesting, in the sense that we invoke the provided subroutine more than once.
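Before getting to that third example, the first two reductions can each be sketched in a few lines of Python. This is an illustration, not code from the course; `sorted()` (Timsort, O(n log n)) stands in for the merge sort subroutine, and `has_cycle` is a standard DFS cycle check for simple undirected graphs:

```python
def median(arr):
    """Reduction A: compute a median by one call to a sorting subroutine,
    then return the middle element (lower median for even length)."""
    s = sorted(arr)                  # the O(n log n) sorting subroutine
    return s[(len(s) - 1) // 2]

def has_cycle(adj):
    """Reduction B: an undirected graph (adjacency-list dict) has a cycle
    iff depth-first search finds an edge back to an already-visited
    vertex other than the one it arrived from."""
    visited = set()

    def dfs(u, parent):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                if dfs(v, u):
                    return True
            elif v != parent:
                return True          # back edge: cycle detected
        return False

    return any(dfs(u, None) for u in adj if u not in visited)

print(median([9, 1, 5, 3, 7]))                       # -> 5
print(has_cycle({1: [2, 3], 2: [1, 3], 3: [1, 2]}))  # -> True (triangle)
```

In both cases the original problem is solved by a constant number of calls to the provided subroutine, which is exactly the shape of a reduction.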
When we first introduced the all-pairs shortest-path problem, we observed that one solution, which in certain cases is actually a very good solution, is to just invoke a single-source shortest-path subroutine, like Dijkstra's algorithm or the Bellman-Ford algorithm, n times, once for each choice of the source. But again, it's still a reduction: given an efficient algorithm, a polynomial-time algorithm, for the single-source shortest-path problem, you just run it n times, and that gives you a polynomial-time algorithm for the all-pairs shortest-path problem. In fact, seasoned algorithm designers and seasoned programmers are always looking for opportunities to employ reductions. So when someone describes a computational problem to you, the very first thing you should try to think through is: is this a problem I already know how to solve, merely in disguise? So maybe if you have to make a sequence of decisions, maybe you can phrase it as a shortest-path problem, a problem that you already know excellent solutions to. And if that fails, if the computational problem you're confronted with isn't literally the same as one you already know how to solve, maybe by invoking known primitives, known algorithms, a small number of times, you can still efficiently solve this new computational problem that you're confronted with.
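The all-pairs reduction described above can be sketched directly: hand the reduction a single-source subroutine (here, a minimal heap-based Dijkstra, assuming nonnegative edge weights, written for this illustration) and invoke it once per source.

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths with nonnegative weights.
    adj maps each vertex to a list of (neighbor, weight) pairs."""
    dist = {v: float('inf') for v in adj}
    dist[source] = 0
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                 # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

def all_pairs(adj):
    """The reduction: n invocations of the single-source subroutine,
    once for each choice of the source vertex."""
    return {s: dijkstra(adj, s) for s in adj}

g = {'a': [('b', 1), ('c', 4)], 'b': [('c', 2)], 'c': []}
print(all_pairs(g)['a']['c'])  # -> 3, via the path a -> b -> c
```

Since each invocation runs in polynomial time and there are only n of them, the composed algorithm is polynomial as well, which is all the reduction needs.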
So all of these are what I think of as honorable uses of reduction, somehow the light side of the force: they spread the frontier of tractability, covering more and more of the things we can do with algorithms. Now, for NP-completeness, for establishing computational intractability, we have to turn this world on its head. This relies on a perverse use of reductions, which I'll explain next. So suppose that the problem pi 1 reduces to pi 2, in the sense that all you need to solve pi 1 in polynomial time is a polynomial-time algorithm for pi 2; that's the only missing piece. Now, as algorithm designers, you're probably thinking about the happy case, where we have a polynomial-time algorithm for pi 2 sitting on the shelf, something like Dijkstra's algorithm or the Bellman-Ford algorithm, and we take it off the shelf and use it, combined with this reduction, to solve pi 1. But let's think about the contrapositive of what it means when one problem reduces to another. And now let's think about the case where we don't actually believe that it's possible to solve pi 1 in polynomial time. Well, if we don't think it's possible to solve pi 1 efficiently, and solving pi 1 reduces to merely computing pi 2 efficiently, well then we certainly can't believe that it's possible to compute pi 2 efficiently either.
That is, if pi 1 reduces to pi 2, and it just so happens that pi 1 cannot be solved in polynomial time, then pi 2 cannot be solved in polynomial time either. So this is the perverse use of a reduction that I mentioned earlier; this is the dark side of the force. Rather than using a reduction to enlarge the frontier of tractability from pi 2 to include pi 1 also, we're spreading the frontier of intractability from pi 1 to pi 2. So this, then, is what we mean formally when we say that one problem is at least as hard as another. If pi 1 reduces to pi 2, what does that say? It says pi 2 allows you to solve pi 1, and maybe lets you do other stuff as well; therefore pi 2 is at least as hard as pi 1. So we now have the vocabulary to say that one problem is at least as hard as another one, and let's remember what our plan was. Our plan was to amass evidence of the intractability of the traveling salesman problem by saying it's at least as hard as lots of other stuff. So, to pass from "as hard as a single problem" to "as hard as a big set of problems," we're going to talk about the notion of completeness. Formally, consider some set C of problems. It's going to be tricky to get the right set of problems; we'll discuss that in detail in the next video. But for now, just let C be any old set of problems.
We call a computational problem pi complete for C, or C-complete, if it's at least as hard as everything in C, that is, if everything in C reduces to pi. Oh, and also, this problem pi should be in the class itself. The second constraint here is the more important one; it's the one that fits more directly into our narrative. Remember, we wanted to have vocabulary for saying that one problem is as hard as a whole bunch of other stuff. C is the bunch of other stuff, so pi is at least as hard as everything in C; that is, everything in C reduces to pi. One nice reason to have the first constraint as well, to assume that the problem pi is a member of C, is that it gives the following nice interpretation: we can think of this problem pi as being a hardest problem in the set C. It's in C, and it's at least as hard as everything else in the class. So we're starting to see a confluence between our strategic plan, amassing evidence for the intractability of TSP by arguing it's as hard as a bunch of other stuff, and the mathematical formalism. Right? This notion of completeness gives us a language to say that a problem is at least as hard as a bunch of other stuff. So how should we define all this other stuff? That is, what is a suitable choice for the class C?
Well, we'd like to present the strongest evidence, the most compelling case possible, that the traveling salesman problem is intractable, so that says we should strive for the biggest set C we can find. The more stuff this problem is as hard as, the greater the evidence of its intractability. So we want to prove it complete for a really big set C. Well, why not be ambitious and reach for the stars? Why don't we just show that the traveling salesman problem is the hardest problem in existence, that it's C-complete even if we take the set C to be all computational problems? Well, this is a good starting point, but it turns out this is too ambitious; we're not going to be able to prove this fact. There are, indeed, computational problems out there that are certainly strictly harder, strictly more difficult to solve, than the traveling salesman problem. One I hope you've heard of is the halting problem. The halting problem is very simple: you're given as input some code, say a program written in C, and you're given an input, and the responsibility of the algorithm is to decide whether or not the provided program will eventually halt when you run it on the provided input.
Perhaps the obvious approach to the halting problem is to just simulate the provided program, in an interpreter in effect, on the supplied input. But here's the problem. Suppose you've been simulating the provided program on the provided input, and you've been running for, say, ten hours. Maybe you suspect that it's in an infinite loop and is never going to halt, but how do you know? Maybe if you ran it for one more hour, it would halt. What if you've been running it for ten years? Maybe now you're pretty sure it's in an infinite loop and is never going to halt, but how do you know? Maybe it's going to halt tomorrow. This problem of not knowing when you're done is an issue not just with the naive simulation approach, but with any possible approach to this computational problem. Indeed, Turing proved, in his landmark 1936 paper, that there is no algorithm, however slow, guaranteed to always correctly solve the halting problem. So this is one of those theorems which is simultaneously, at least in my opinion, very deep, while the proof is just like four lines. You prove it by contradiction: you assume that an algorithm for the halting problem does exist, and then, using the code for that program, you construct a program that, on the one hand, can't halt, and on the other hand, can't not halt, which is obviously a contradiction.
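The four-line argument can be written out symbolically. This is a standard sketch, not taken verbatim from the lecture; $H$ is the hypothetical halting decider, and $D$ is the diagonal program built from it.

```latex
% Suppose, for contradiction, a program H decides halting:
%   H(P, x) = "halts"  if program P halts on input x,
%   H(P, x) = "loops"  otherwise.
% Build the diagonal program D, which runs H on a program and itself:
%   D(P):  if H(P, P) = "halts", loop forever; else halt.
% Now feed D its own code:
D(D) \text{ halts}
  \iff H(D, D) = \text{``loops''}
  \iff D(D) \text{ does not halt.}
% Contradiction, so no such H can exist.
```

The self-application $D(D)$ is the diagonal step: every possible decider $H$ is defeated by the one program built from its own code.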
So the purported algorithm for solving the halting problem cannot exist. And this is the kind of argument I find just totally baffling. I like to recall von Neumann's quote that there are some things in mathematics that you don't ever understand, you just get used to. And this kind of diagonalization argument, inspired by Cantor's original argument that there is an uncountable number of real numbers, I think falls squarely in that category. Anyway, the point is that the traveling salesman problem no longer looks quite that bad. We now realize that our original thought, to prove that the TSP is at least as hard as every other problem, is a little misguided, a little too ambitious. It's certainly not at least as hard as the halting problem. The halting problem is what is called undecidable: there's no algorithm for it whatsoever, no matter how much time I give you. The TSP, by contrast: if I give you a big enough time bound, roughly n factorial for n vertices, you can solve the problem, namely using our favorite punching bag, brute-force search. You can just try every one of the finite possibilities and pick the best one.
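That finite brute-force search can be sketched directly. This is an illustration, not code from the course; it fixes city 0 as the starting point, so it tries (n-1)! of the roughly n! possible tours:

```python
from itertools import permutations

def tsp_brute_force(dist):
    """Exact TSP by brute-force search over all tours.
    dist is an n x n symmetric matrix of intercity costs; the tour
    starts and ends at city 0, and we keep the cheapest tour found."""
    n = len(dist)
    best = float('inf')
    for perm in permutations(range(1, n)):       # (n-1)! candidate tours
        tour = (0,) + perm + (0,)
        cost = sum(dist[tour[i]][tour[i + 1]] for i in range(n))
        best = min(best, cost)
    return best

d = [[0, 1, 4, 2],
     [1, 0, 3, 5],
     [4, 3, 0, 1],
     [2, 5, 1, 0]]
print(tsp_brute_force(d))  # -> 7, e.g. the tour 0 -> 1 -> 2 -> 3 -> 0
```

The search always terminates, which is exactly the contrast with the halting problem: TSP is decidable, just (apparently) not decidable quickly.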
But the high-level idea at the top of this slide, that is, proving the traveling salesman problem complete for a big class of problems, is still a good one. We just have to refine our understanding of the suitable choice of the class C. In particular, there are certain problems that we cannot prove TSP is at least as hard as. But maybe, amongst all problems which can be solved using brute-force search, we can prove that TSP is the hardest one. Now, to execute this idea, we'll of course have to define what it means for a problem to be brute-force solvable. That motivates the complexity class NP, the subject of the next video.