1 00:00:00,012 --> 00:00:04,120 So I hope that pretty much all of you had heard about the P vs NP question. 2 00:00:04,120 --> 00:00:08,302 Before you enrolled in this class. But if you haven't, you can pretty much 3 00:00:08,302 --> 00:00:11,938 guess what that question is. I've defined for you, both of these 4 00:00:11,938 --> 00:00:15,090 classes of problems. P is the class of problems that are 5 00:00:15,090 --> 00:00:19,697 polynomial times solvable Solvable. Whereas the problems in NP have the 6 00:00:19,697 --> 00:00:24,412 property that, at least given the solution, you can quickly verify that it 7 00:00:24,412 --> 00:00:28,707 is, indeed a correct solution. It's why the conjecture that P is not 8 00:00:28,707 --> 00:00:33,327 equal to NP, that is, merely the ability to efficiently verify purported 9 00:00:33,327 --> 00:00:37,790 solutions, is not sufficient, to guarantee Polynomial time solvability. 10 00:00:37,790 --> 00:00:42,242 Indeed, Edmonds, back in 1965, before we even had the vocabulary NP, remember that 11 00:00:42,242 --> 00:00:45,724 came along only in '71. Edmonds already in '65, was, essentially 12 00:00:45,724 --> 00:00:49,886 conjecturing that P is not equal to NP. In the, form that he was conjecturing, 13 00:00:49,886 --> 00:00:53,815 there's no polynomial time algorithm, that solved the traveling salesman 14 00:00:53,815 --> 00:00:57,431 problem. But let me emphasize we genuinely do not 15 00:00:57,431 --> 00:01:02,703 know the answer to this question, there is no proof of this conjecture. 16 00:01:02,703 --> 00:01:08,390 P vs NP question is arguably the open question in computer science, it's also 17 00:01:08,390 --> 00:01:14,160 certainly one of the most important and deep, deepest open questions in all of 18 00:01:14,160 --> 00:01:18,203 mathematics. For example, in 2000 The Clay mathematics 19 00:01:18,203 --> 00:01:21,542 Institute published a list of 7 millennium. 20 00:01:21,542 --> 00:01:24,496 prize problems. The P vs NP question, is 1 of those 7 21 00:01:24,496 --> 00:01:27,201 problems. All of these problems, are extremely 22 00:01:27,201 --> 00:01:31,606 difficult, and extremely important. The only one that's been solved to date, 23 00:01:31,606 --> 00:01:35,291 is the [UNKNOWN] conjecture. The Remount hypothesis, is another 24 00:01:35,291 --> 00:01:38,753 example, on that list. And they're not called the millennium 25 00:01:38,753 --> 00:01:43,431 prize problems for nothing, if you solve 1 of these mathematical problems, you get 26 00:01:43,431 --> 00:01:47,932 a cash prize of 1 million dollars. Now, while a million dollars is obviously 27 00:01:47,932 --> 00:01:51,965 nothing to sneeze at, I think it sort of understates the importance of a 28 00:01:51,965 --> 00:01:56,432 mathematical question like P versus NP, and the advance in our knowledge that a 29 00:01:56,432 --> 00:02:00,961 solution to the question would provide, I think, would be much more significant 30 00:02:00,961 --> 00:02:04,765 than a price check. So how come so many people thing that P 31 00:02:04,765 --> 00:02:09,583 is not equal to NP, rather than the opposite, P=NP? Well, I think the 32 00:02:09,583 --> 00:02:14,354 dominant reason is sort of a psychological reason, namely that if it 33 00:02:14,354 --> 00:02:20,316 were the case that P=NP, then all you'd have to do, remember, is exhibit a 34 00:02:20,316 --> 00:02:23,541 polynomial time algorithm for just 1, np complete problem. 35 00:02:23,541 --> 00:02:25,890 And there are tons of np complete problems. 36 00:02:25,890 --> 00:02:30,159 And a lot of extremely smart people, have had np complete problems that they've 37 00:02:30,159 --> 00:02:34,605 really cared about, and either on purpose or accidentally, they've been trying to 38 00:02:34,605 --> 00:02:38,731 develop efficent algorithms for them. Noone has ever succeeded in over 1/2 39 00:02:38,731 --> 00:02:43,456 century of serious computational work. The second reason is sort of 40 00:02:43,456 --> 00:02:46,512 philosophical, P=NP just doesn't seem to jive with the 41 00:02:46,512 --> 00:02:49,937 way the world works. Think about it, for example, when you do 42 00:02:49,937 --> 00:02:52,383 a homework problem in a class like this one. 43 00:02:52,383 --> 00:02:56,483 And consider 2 different tasks. The first task is I give you question, 44 00:02:56,483 --> 00:03:00,940 and I asked you to come up with a correct solution, say a proof of some 45 00:03:00,940 --> 00:03:04,862 mathematical statement. The second task would be just grade 46 00:03:04,862 --> 00:03:09,604 somebody else's suggest proof. Well, generally it's a lot harder to come 47 00:03:09,604 --> 00:03:12,361 up with the correct argument from scratch, 48 00:03:12,361 --> 00:03:17,026 compared to just, verifying a correct solution, provided by, somebody else. 49 00:03:17,026 --> 00:03:21,568 And yet, P equals NP would be saying, that these two tasks, have exactly the 50 00:03:21,568 --> 00:03:24,813 same complexity. It's just as easy, to solve homework 51 00:03:24,813 --> 00:03:29,192 problems, as it is, to just read, and verify, the correct solution. 52 00:03:29,192 --> 00:03:33,327 So I don't know about you, but it's always seemed to me to be be a lot harder 53 00:03:33,327 --> 00:03:37,825 to come up with a mathematical argument from scratch as opposed to simply grading 54 00:03:37,825 --> 00:03:41,383 somebody else's solution. So now it seems to require a degree of 55 00:03:41,383 --> 00:03:45,393 creativity, to pluck out from this exponentially big space of proofs, a 56 00:03:45,393 --> 00:03:49,564 correct one for the statement at hand. Yet, P = NP would suggest that, that 57 00:03:49,564 --> 00:03:52,589 creativity could be efficiently automated. 58 00:03:52,589 --> 00:03:57,108 But of course, you know, P versus NP being a mathematical question. 59 00:03:57,108 --> 00:04:01,577 We'd really like some mathematical evidence, of which way it goes. 60 00:04:01,577 --> 00:04:05,850 For example, if P is not = N P. And here we really know shockingly 61 00:04:05,850 --> 00:04:08,788 little. There just isn't that much concrete 62 00:04:08,788 --> 00:04:12,862 evidence at this point, that, for example, P is not = to NP. 63 00:04:12,862 --> 00:04:17,472 Now, maybe it seems bizarre to you, that we're struggling to prove that P is not 64 00:04:17,472 --> 00:04:20,524 equal to NP. Maybe it just seems sort of obvious that 65 00:04:20,524 --> 00:04:25,045 there is no way that you can always construct proofs, in time, polynomial, in 66 00:04:25,045 --> 00:04:29,182 what you need to verify proofs. But, the reason this is so hard to prove 67 00:04:29,182 --> 00:04:33,886 mathematically, is because of the insane richness of the space of polynomial time 68 00:04:33,886 --> 00:04:36,482 algorithms. And indeed, it's this richness that we've 69 00:04:36,482 --> 00:04:40,173 been exploiting all along in these design and analyses of algorithms classes. 70 00:04:40,173 --> 00:04:42,439 Think for example about matrix multiplication. 71 00:04:42,439 --> 00:04:46,282 Had I not shown you Strauss's algorithm, I probably could have convinced you more 72 00:04:46,282 --> 00:04:50,023 or less, that there was no way to solve matrix multiplication faster than cubic 73 00:04:50,023 --> 00:04:52,109 time. You just look at the definition of the 74 00:04:52,109 --> 00:04:54,551 problem and it seems like you have to do cubic work. 75 00:04:54,551 --> 00:04:57,812 Yet, Strauss's algorithm and other follow ups show you can do. 76 00:04:57,812 --> 00:05:02,310 Fundamentally better, than, the naive cubic running time algorithm, for matrix 77 00:05:02,310 --> 00:05:05,036 multiplication. So there really are, some quite 78 00:05:05,036 --> 00:05:09,986 counter-intuitive, and hard to discover, unusually efficient algorithms, within 79 00:05:09,986 --> 00:05:12,479 the landscape of polynomial time solutions. 80 00:05:12,479 --> 00:05:17,052 And who's to say that there aren't some, more exotic species, in this landscape of 81 00:05:17,052 --> 00:05:21,552 polynomial time solvability, that have yet to be discovered, which can make. 82 00:05:21,552 --> 00:05:27,352 In-roads even on empty complete problems. At this point, we really don't know. 83 00:05:27,352 --> 00:05:32,702 At the very least our currently primitive understanding of the fauna within the 84 00:05:32,702 --> 00:05:37,902 complexity class P, is an intimidating obstruction to a proof that P is not 85 00:05:37,902 --> 00:05:40,755 equal to NP. I should also mention that, as an 86 00:05:40,755 --> 00:05:45,471 interesting counterpoint to Edman's Conjecture in '65, was a conjecture by 87 00:05:45,471 --> 00:05:48,274 Gödel. This is the same logician Kurt Gödel, 88 00:05:48,274 --> 00:05:51,529 Gödel's completeness and incompleteness theorems. 89 00:05:51,529 --> 00:05:56,483 He wrote a letter to John von Neumann in 1956, where he actually conjectured the 90 00:05:56,483 --> 00:05:59,279 opposite, the P is = to NP,so who knows. 91 00:05:59,279 --> 00:06:04,538 So I've tried to highlight for you, the most important things that an algorithm 92 00:06:04,538 --> 00:06:09,402 designer, and serious programmer should know, about NP problems, and NP 93 00:06:09,402 --> 00:06:13,229 completeness. One thing I haven't explained, which you 94 00:06:13,229 --> 00:06:18,318 might be wondering about, is, what on earth does NP stand for anyways?. 95 00:06:18,318 --> 00:06:25,138 A common guess would be not polynomial but this is not what it stands for. 96 00:06:25,138 --> 00:06:31,983 The answer's going to be a little bit more obscured in deed it's a bit of an 97 00:06:31,983 --> 00:06:38,142 acronym-ism nondeterministic polynomial So this is referring to a different but 98 00:06:38,142 --> 00:06:42,427 mathematically equivalent way to define the complexity class NP in terms of an 99 00:06:42,427 --> 00:06:46,132 abstract machine model known as nondeterministic turing machines. 100 00:06:46,132 --> 00:06:50,322 But, generally, for some of those thinking about algorithms, generally for 101 00:06:50,322 --> 00:06:54,555 the programmer I would advise against, thinking about problems in NP, in terms 102 00:06:54,555 --> 00:06:57,854 of this original definition, with this abstract machine model. 103 00:06:57,854 --> 00:07:02,143 And I'd instead, strongly encourage you, to think about the definition I gave you, 104 00:07:02,143 --> 00:07:06,332 in terms of the efficient recognition, the efficient verification, of purported 105 00:07:06,332 --> 00:07:09,220 solutions. Again, they're mathematically equivalent, 106 00:07:09,220 --> 00:07:14,182 and I think, efficient verification makes more sense It's in the algorithm design. 107 00:07:14,182 --> 00:07:18,269 Maybe you're thinking that N P is a perhaps not that good, and somewhat 108 00:07:18,269 --> 00:07:21,886 inscrutable definition for what's a super important concept. 109 00:07:21,886 --> 00:07:26,012 But it's not for lack of discussion and effort on the communities part. 110 00:07:26,012 --> 00:07:30,622 So very soon after the work of Cook and Carp, it was clear to everyone working in 111 00:07:30,622 --> 00:07:35,260 the west on algorithms and computation that this was a super important concept, 112 00:07:35,260 --> 00:07:39,012 and people needed to straighten up their vocabulary asap. 113 00:07:39,012 --> 00:07:42,302 Don Knuth ran a poll amongst members of the community. 114 00:07:42,302 --> 00:07:47,300 He reported on the results in his SIGACT news article from 1974, "A Terminological 115 00:07:47,300 --> 00:07:52,122 Proposal," and NP Completeness was the winner, and that was then adopted in the 116 00:07:52,122 --> 00:07:56,668 landmark book "Design and Analysis of Algorithms" by Hopcroft and Ullman, 117 00:07:56,668 --> 00:08:01,490 and that's the way it's been ever since. There is one suggestion that was passed 118 00:08:01,490 --> 00:08:07,287 over, which I find quite amusing, let me tell you about Suggestion was pet PET. 119 00:08:07,287 --> 00:08:15,902 So, what is PET acronym for? Well, it's flexible so initially the interpretation 120 00:08:15,902 --> 00:08:20,517 would be possible exponential time. In problem. 121 00:08:20,517 --> 00:08:25,292 Now suppose its some day people prove that P is not equal to NP, then the 122 00:08:25,292 --> 00:08:28,792 meaning would change to provably exponential time. 123 00:08:28,792 --> 00:08:34,117 So now its not the time to nit pick with the suggestion that you could prove P not 124 00:08:34,117 --> 00:08:39,392 equal to NP without actually proving an exponential lower bound merely a super 125 00:08:39,392 --> 00:08:43,342 polynomial bound. Lets leave objects like that of side and 126 00:08:43,342 --> 00:08:48,907 ask what would happen if P actually turned out to be equal to NP, well then, 127 00:08:48,907 --> 00:08:53,052 you could call PET previously exponential time problems. 128 00:08:53,052 --> 00:08:58,387 But of course, at this point PET is nothing more than an amusing historical 129 00:08:58,387 --> 00:09:02,052 footnote. NP-complete is the phrase that you got to 130 00:09:02,052 --> 00:09:02,385 know.