1 00:00:00,012 --> 00:00:03,725 We now arrive at the heart of the discussion, the definition of the 2 00:00:03,725 --> 00:00:07,541 complexity class NP as computational problems where you can efficiently 3 00:00:07,541 --> 00:00:11,596 recognize the solution, roughly the same thing is problem solvable is in good 4 00:00:11,596 --> 00:00:15,138 force search. The idea of NP completeness, that the problem can be as 5 00:00:15,138 --> 00:00:19,710 hard as any other problem, where you can efficiently recognize a solution, the 6 00:00:19,710 --> 00:00:23,715 ubiquity of NP-complete problems, including possibly a problem you're 7 00:00:23,715 --> 00:00:28,172 working on right now in one of your own projects, and what should be part of your 8 00:00:28,172 --> 00:00:32,252 toolbox, a simple recipe for proving that problems are NP-complete. 9 00:00:32,252 --> 00:00:37,359 We left off the last video noting that the travelling salesmen problem by virtue 10 00:00:37,359 --> 00:00:42,228 of being solvable in the exponential time using brute force search is certainly not 11 00:00:42,228 --> 00:00:45,815 as hard as notorious problems like the halting problem. 12 00:00:45,815 --> 00:00:50,471 So, perhaps we can instead of mass evidence of interactability for the TSP 13 00:00:50,471 --> 00:00:54,960 by showing it's as hard as any other problem solvable using brute-force 14 00:00:54,960 --> 00:01:00,242 search, that is every other brute-force search solvable problem reduces to 15 00:01:00,242 --> 00:01:06,045 solving the traveling salesman problem. To turn this idea into mathematics we 16 00:01:06,045 --> 00:01:12,097 have to identify the minimal ingredients necessary to solve a problem using brute 17 00:01:12,097 --> 00:01:15,717 force search. And the key idea turns out to be the 18 00:01:15,717 --> 00:01:19,272 efficient recognition of purported solution. 19 00:01:19,272 --> 00:01:23,047 [SOUND] Specifically, we say that a computational problem, 20 00:01:23,047 --> 00:01:28,271 like, say the traveling salesman problem, or frankly almost any other problem we've 21 00:01:28,271 --> 00:01:33,286 talked about in these classes is a member of the complexity class NP if it meets 22 00:01:33,286 --> 00:01:38,515 the following two criteria. The first property is sort of a simple 23 00:01:38,515 --> 00:01:44,971 prerequisite. It is search that solutions have length polynomial in the input size. 24 00:01:44,971 --> 00:01:51,097 The second property, and this is really the key one, is that if someone offers up 25 00:01:51,097 --> 00:01:56,804 to you a purported solution to a NP problem, you can verify its correctness 26 00:01:56,804 --> 00:02:01,138 in polynomial time. To make sense of this abstract 27 00:02:01,138 --> 00:02:04,465 definition, let's think about some concrete examples. 28 00:02:04,465 --> 00:02:07,726 Let's start just with the traveling salesman problem. 29 00:02:07,726 --> 00:02:11,763 So, consider the instance of the traveling salesman problem that's 30 00:02:11,763 --> 00:02:15,916 specified by some n vertices and distances amongst all of the pairs of 31 00:02:15,916 --> 00:02:18,806 vertices. And suppose we want to know whether or 32 00:02:18,806 --> 00:02:21,482 not there's a tour that has a total length. 33 00:02:21,482 --> 00:02:25,556 At most one thousand. So, let's for the moment set aside the 34 00:02:25,556 --> 00:02:31,250 problem of checking whether or not one of the n factorial tours does indeed have 35 00:02:31,250 --> 00:02:35,780 cost of at most 1000. Let's instead think about the much more 36 00:02:35,780 --> 00:02:40,892 modest question, of whether or not a single proposed tour has cost at most 37 00:02:40,892 --> 00:02:44,309 1000. Well, that computational problem is to 38 00:02:44,309 --> 00:02:48,752 verify whether or not a single suggested tour meets the target or not. 39 00:02:48,752 --> 00:02:52,693 That's an easy computational problem, right? The tour is specified just by 40 00:02:52,693 --> 00:02:55,508 order in which you're supposed to visit the vertices. 41 00:02:55,508 --> 00:02:59,645 The length of that ordering is certainly polynomial in the length of the input, 42 00:02:59,645 --> 00:03:03,515 and all you gotta do is add up the n edges that gets traversed in this tour, 43 00:03:03,515 --> 00:03:05,982 and check if this sum is at most 1000 or not. 44 00:03:05,982 --> 00:03:10,377 This argument shows that checking whether or not there exists a tour, with total 45 00:03:10,377 --> 00:03:14,532 cost and most sum threshold, does indeed belong to the complexity class NP. 46 00:03:14,532 --> 00:03:18,832 Right, you can write down a purported solution, in polynomial length. All you 47 00:03:18,832 --> 00:03:23,217 need to do is specify the ordering, and you can verify whether or not a candidate 48 00:03:23,217 --> 00:03:28,492 solution, whether or not a proposed tour meets the threshold in linear time, just 49 00:03:28,492 --> 00:03:31,032 by adding up the corresponding edge costs. 50 00:03:31,032 --> 00:03:35,527 If it bothers you that I moved away from the problem of computing an optimal, 51 00:03:35,527 --> 00:03:40,167 minimum cost tour, and studied instead the seemingly easier problem, I'm just 52 00:03:40,167 --> 00:03:44,792 checking, does there exist a tour that has total cost at most a given threshold, 53 00:03:44,792 --> 00:03:49,257 notice that the former problem reduces to Multiple indications of the latter 54 00:03:49,257 --> 00:03:52,216 problem, just by binary search over the threshold. 55 00:03:52,216 --> 00:03:56,233 The same reasoning can be applied to pretty much all the computational 56 00:03:56,233 --> 00:04:00,678 problems we've seen in these courses, placing all of them into the complexity 57 00:04:00,678 --> 00:04:03,473 class NP. So for the types of problems we've been 58 00:04:03,473 --> 00:04:08,271 thinking about, solutions can generally be specified in linear space, correctness 59 00:04:08,271 --> 00:04:12,613 can be verified in linear time. Another genre of problems that we haven't 60 00:04:12,613 --> 00:04:17,241 really discussed in these classes, but are important in the application domains, 61 00:04:17,241 --> 00:04:21,422 and are also canonical NP problems are constraint satisfaction problems. 62 00:04:21,422 --> 00:04:25,864 In general in a constraint satisfaction problem you have a bunch of variables. 63 00:04:25,864 --> 00:04:29,028 The simplest case would be binary or Boolean variables. 64 00:04:29,028 --> 00:04:33,309 And then you have a list of constraints, and each constraint specifies for a 65 00:04:33,309 --> 00:04:37,633 subset of the variables. a permitted list of configurations of 66 00:04:37,633 --> 00:04:41,151 those variables. So one simple case which some of you've 67 00:04:41,151 --> 00:04:46,055 seen and if you study NP completeness property you'll definitely see it will be 68 00:04:46,055 --> 00:04:49,573 the three set problem. The three set problem does indeed 69 00:04:49,573 --> 00:04:54,364 consider Boolean variables, so each variable xi can either be zero or one and 70 00:04:54,364 --> 00:04:58,484 you can think of the clauses as forbidding a single assignment to a 71 00:04:58,484 --> 00:05:02,096 triple vertices. So a clause might say you are not allowed 72 00:05:02,096 --> 00:05:06,028 to simultaneously assign. X3 to 1, X5 to 0, and X8 to 1. 73 00:05:06,028 --> 00:05:11,830 So clauses for big particular triples of assignments, and the question then is, 74 00:05:11,830 --> 00:05:16,766 does there exist an assignment of all of the variables to 0 an 1, that 75 00:05:16,766 --> 00:05:20,562 simultaneously satisfies all of the constraints. 76 00:05:20,562 --> 00:05:25,058 These kinds of constraint satisfaction problems also belong to the complexity 77 00:05:25,058 --> 00:05:27,475 class in p. So here if you have a purported 78 00:05:27,475 --> 00:05:32,008 assignment of all of the variables to values that meets all of the constraints, 79 00:05:32,008 --> 00:05:36,582 well you can write it down very simply just the value for each of the variables. 80 00:05:36,582 --> 00:05:41,042 And it's easy to check. I just iterate through the constraints 81 00:05:41,042 --> 00:05:46,507 one at a time and see if yes indeed, the values that you're suggesting for the 82 00:05:46,507 --> 00:05:52,272 variables in that constraint are on the list of permissible value combinations. 83 00:05:52,272 --> 00:05:57,317 At the beginning of this video, I suggested that the complexity class NP 84 00:05:57,317 --> 00:06:07,282 would represent problems that are brute force solvable, in the same way that the 85 00:06:07,282 --> 00:06:17,699 traveling salesman problem is. If you look at the actual definition I 86 00:06:17,699 --> 00:06:20,662 just gave you of the class NP, it's in terms of efficient recognition of 87 00:06:20,662 --> 00:06:20,662 solutions, but let's now observe that this definition NP, in terms of efficient 88 00:06:20,662 --> 00:06:20,662 verification of purported solutions does indeed imply that these problems can be 89 00:06:20,662 --> 00:06:22,326 solved in exponential time using brute-force search. 90 00:06:22,326 --> 00:06:27,196 So really, all you do here is check each of the candidate solutions one at a time. 91 00:06:27,196 --> 00:06:31,608 And in doing so, you see very clearly the role of the two properties in the 92 00:06:31,608 --> 00:06:36,675 definition of an NP problem. The role of the first property saying solution length 93 00:06:36,675 --> 00:06:41,749 has to be a polynomial in the input size, in turn, restricts the number of possible 94 00:06:41,749 --> 00:06:45,532 solutions, the number of bit strings of that given length, 95 00:06:45,532 --> 00:06:48,064 to be at most, exponential in the input size. 96 00:06:48,064 --> 00:06:52,684 The second property of course assures you that for each of those only exponentially 97 00:06:52,684 --> 00:06:57,049 many possibilities, you can verify whether or not it is a correct solution, 98 00:06:57,049 --> 00:07:00,951 whether or not it's a short TSB tour, whether or not it's a satisfying 99 00:07:00,951 --> 00:07:04,800 assignment to some variables in a constraint satisfaction problem in 100 00:07:04,800 --> 00:07:08,349 polynomial time. Now because the requirements for the 101 00:07:08,349 --> 00:07:12,189 class NP are so weak, the class NP is very big, where all 102 00:07:12,189 --> 00:07:17,802 membership in NP requires essentially as the ability to officially recognize the 103 00:07:17,802 --> 00:07:20,593 solution. You know one when you see one, 104 00:07:20,593 --> 00:07:26,027 and as you can imagine, that property is met by many of the computational problems 105 00:07:26,027 --> 00:07:29,747 that we ever think about. Factoring almost any graph problems you'd 106 00:07:29,747 --> 00:07:34,187 ever think about, sequencing problems, most constraint satisfaction problems, 107 00:07:34,187 --> 00:07:44,194 and so on. As we've said, it's true that not every, 108 00:07:44,194 --> 00:07:52,539 natural computational problem, is NP. The halting problem is an extreme example. 109 00:07:52,539 --> 00:07:52,539 That's in fact undecidable, there's no algorithm at all. There's also some 110 00:07:52,539 --> 00:07:52,539 natural problems and some application domains which are neither in NP, nor they 111 00:07:52,539 --> 00:07:52,539 undecidable. So, model checking is one domain that 112 00:07:52,539 --> 00:07:56,649 furnishes lots of examples of problems strictly harder than NP. 113 00:07:56,649 --> 00:08:01,629 The vastness of the complexity class NP means that NP-completeness is strong 114 00:08:01,629 --> 00:08:06,033 evidence of intractability. Remember what it means for a problem to 115 00:08:06,033 --> 00:08:09,429 be complete. For some set of problems, it means it's 116 00:08:09,429 --> 00:08:16,190 as hard as any other problem in that set. Every other problem in that set reduces 117 00:08:16,190 --> 00:08:28,645 to the complete problem. Therefore, suppose you had a polynomial 118 00:08:28,645 --> 00:08:36,851 time algorithm for just one, NP-complete problem, 119 00:08:36,851 --> 00:08:36,851 you would get, automatically, by the definition of a reduction, a polynomial 120 00:08:36,851 --> 00:08:36,851 time algorithm for every single computational problem in NP. 121 00:08:36,851 --> 00:08:36,851 Every single problem for which you can efficiently recognize a solution. 122 00:08:39,672 --> 00:08:44,772 That is, solving just 1 NP-complete problem in polynomial time would imply 123 00:08:44,772 --> 00:08:48,772 that NP is the same as P. Every problem in NP can be solved in 124 00:08:48,772 --> 00:08:52,572 polynomial time. It's pretty hard to really grok all of 125 00:08:52,572 --> 00:08:56,847 the ramifications of this kind of result, if P were to equal NP. 126 00:08:56,847 --> 00:09:01,072 So, just as one example, modern e-commerce, as we know it, would 127 00:09:01,072 --> 00:09:06,857 fall apart like a house of cards. That's because it depends on the security 128 00:09:06,857 --> 00:09:11,269 of crypto-sysytems, like RSA, which in turn assumes the computational 129 00:09:11,269 --> 00:09:13,965 intractibility of problems like factoring. 130 00:09:13,965 --> 00:09:18,896 Yet, an efficient algorithm for even the most obscure NP-complete problem, would 131 00:09:18,896 --> 00:09:24,491 automatically imply, imply, at least in principle, a polynomial time algorithm 132 00:09:24,491 --> 00:09:27,500 for factoring. So this provides a satisfying resolution 133 00:09:27,500 --> 00:09:29,954 to a narrative we began in the previous video. 134 00:09:29,954 --> 00:09:33,162 Recall, we were discussing the traveling salesman problem. 135 00:09:33,162 --> 00:09:37,473 And, like thousands of other people, we were wondering whether or not there was a 136 00:09:37,473 --> 00:09:41,958 polynomial time algorithm for it or not. For example, Edmunds conjectured, in '65, 137 00:09:41,958 --> 00:09:45,063 that there is no polynomial time algorithm for it. 138 00:09:45,063 --> 00:09:48,396 Now I've, obviously what you really want is, the answer. 139 00:09:48,396 --> 00:09:52,778 You either want a proof of Edmund's conjecture, that there's no efficient 140 00:09:52,778 --> 00:09:57,032 algorithm for TSB, or, even more astonishingly, you'd want a polynomial 141 00:09:57,032 --> 00:10:01,809 time algorithm that solves the problem. But we also have to face the reality that 142 00:10:01,809 --> 00:10:06,035 we may not know the answer for sure to this question, for quite a while. 143 00:10:06,035 --> 00:10:10,047 Years, certainly. Decades, probably. 144 00:10:10,047 --> 00:10:14,951 Centuries, who knows, maybe. So the challenge then for people thinking 145 00:10:14,951 --> 00:10:19,667 about algorithms and computation, is to nevertheless devise a theory which 146 00:10:19,667 --> 00:10:24,258 usefully differentiates tractable problems from relatively intractable 147 00:10:24,258 --> 00:10:26,622 ones, even in the face of this massive 148 00:10:26,622 --> 00:10:31,758 deficiency of our understanding of what is possible and what is impossible. 149 00:10:31,758 --> 00:10:36,466 And the brilliant way out is relative difficult as now personified by 150 00:10:36,466 --> 00:10:40,212 NP-completeness. If you want to amass extremely strong 151 00:10:40,212 --> 00:10:45,047 evidence of the computational intractability of a problem, prove that 152 00:10:45,047 --> 00:10:49,092 it's MP complete. Prove that a polynomial time algorithm 153 00:10:49,092 --> 00:10:54,602 for the problem, if it existed, would solve thousands upon thousands of 154 00:10:54,602 --> 00:10:55,315 unsolved computational problems.