1 00:00:00,012 --> 00:00:04,252 In this video and the, and the next, we'll develop an exact algorithm for 2 00:00:04,252 --> 00:00:07,076 solving what's called the vertex cover problem. 3 00:00:07,076 --> 00:00:11,675 This is a well known NP-complete problem. As such we certainly don't expect our 4 00:00:11,675 --> 00:00:15,406 exact algorithm to run in polynomial time on general instances. 5 00:00:15,406 --> 00:00:19,795 But, the algorithm does showcase. 2 of the algorithmic strategies that we 6 00:00:19,795 --> 00:00:22,767 discussed for dealing with NP complete problems. 7 00:00:22,767 --> 00:00:27,011 First of all, we can interpret this algorithm as on general instances. 8 00:00:27,011 --> 00:00:29,835 Well, it's true, running in exponential time. 9 00:00:29,835 --> 00:00:33,862 Nevertheless, improving markedly over the more obvious naive. 10 00:00:33,862 --> 00:00:37,248 Brute-force search. This illustrates the point that for many 11 00:00:37,248 --> 00:00:41,441 NP-complete problems there is a lot of room for algorithmic ingenuity, even 12 00:00:41,441 --> 00:00:45,182 though you're not going to come up with a polynomial time solution. 13 00:00:45,182 --> 00:00:49,411 A second interpretation of the algorithm we're going to develop is that it's 14 00:00:49,411 --> 00:00:53,859 polynomial time for special classes of instances, in particular instances that 15 00:00:53,859 --> 00:01:00,230 have an unusually good optimal solution. So, let me introduce you to the vertex 16 00:01:00,230 --> 00:01:04,923 cover problem. The input is simply an undirected graph. 17 00:01:04,923 --> 00:01:08,438 The goal is to identify the smallest size. 18 00:01:08,438 --> 00:01:13,770 That is, the minimum cardinality of a vertex cover of the graph. 19 00:01:13,770 --> 00:01:19,994 What is a vertex cover? We call a subset of the vertices a vertex cover, if for 20 00:01:19,994 --> 00:01:25,477 every edge, at least one, possibly both, of the edges' endpoints lie in the set S. 21 00:01:25,477 --> 00:01:28,786 There is, of course, always a feasible solution. 22 00:01:28,786 --> 00:01:31,798 You could always choose every single vertex. 23 00:01:31,798 --> 00:01:36,712 That's certainly going to be a vertex cover, but the hard question is, how do 24 00:01:36,712 --> 00:01:42,612 you make sure you get one endpoint from each edge, in the most parsimonious way. 25 00:01:42,612 --> 00:01:47,252 So what might this problem represent? Well perhaps you're assembling a team of 26 00:01:47,252 --> 00:01:50,282 some sort. So maybe a team of programmers, maybe of 27 00:01:50,282 --> 00:01:55,127 lawyers, maybe of football players, who knows? But think of the vertices as being 28 00:01:55,127 --> 00:01:58,147 potential people you could recruit onto your team. 29 00:01:58,147 --> 00:02:02,722 And think of an edge as representing a potential task that your team might have 30 00:02:02,722 --> 00:02:07,129 to complete and the two vertices in the edge represent the two people who are 31 00:02:07,129 --> 00:02:10,974 capable of accomplishing that task. So then, what a vertex cover is? 32 00:02:10,974 --> 00:02:15,253 It says, hire enough people so that every single task you can accomplish. 33 00:02:15,253 --> 00:02:19,852 You have one of the two people in your team that's capable of accomplishing that 34 00:02:19,852 --> 00:02:21,931 task. So you could imagine various 35 00:02:21,931 --> 00:02:25,941 generalizations of this problem. You could have a wait for each vertex, 36 00:02:25,941 --> 00:02:30,098 for example, indicating the salary that you would have to pay that person. 37 00:02:30,098 --> 00:02:34,230 You could also imagine that instead of edges you have what are called hyper 38 00:02:34,230 --> 00:02:38,240 edges if more then two people are capable of accomplishing a given task. 39 00:02:38,240 --> 00:02:42,505 But the unweighted graph vertex cover problem will be interesting enough for 40 00:02:42,505 --> 00:02:46,114 our purposes here. Many people are of the opinion that this 41 00:02:46,114 --> 00:02:50,304 is a poorly monitored problem. Many people think it should be called the 42 00:02:50,304 --> 00:02:55,134 Edge Cover Problem, not the Vertex Cover Problem because you are choosing vertices 43 00:02:55,134 --> 00:02:59,527 to in some sense cover the edges by picking at least one of their end points. 44 00:02:59,527 --> 00:03:02,529 But this is the conventional name for this problem. 45 00:03:02,529 --> 00:03:06,922 In the Vertex Cover Problem you are choosing a set of vertices and edges 46 00:03:06,922 --> 00:03:11,160 provide the constraints. Let's now jump to a quiz to make sure the 47 00:03:11,160 --> 00:03:16,902 problem definition is clear. So the correct answer is A. 48 00:03:16,902 --> 00:03:21,028 Let's consider each of the two graphs in turn. 49 00:03:21,028 --> 00:03:25,154 Let's start with the star graph on n vertices. 50 00:03:25,154 --> 00:03:29,738 So this has one vertex in the center and n-1 spokes. 51 00:03:29,738 --> 00:03:36,258 The claim is that it has a vertex cover of size one, obviously the minimum 52 00:03:36,258 --> 00:03:40,640 possible. That vertex cover comprises just that 53 00:03:40,640 --> 00:03:44,192 center vertex. Why is it a vertex cover? Well, every 54 00:03:44,192 --> 00:03:47,566 single edge is a spoke. And one of its endpoints is the center 55 00:03:47,566 --> 00:03:49,769 vertex. So you just pick the one vertex. 56 00:03:49,769 --> 00:03:52,362 And you hit all of the n- 1 edges of the graph. 57 00:03:52,362 --> 00:03:56,894 So how about a clique on n vertices? So, a graph in which every single one of the 58 00:03:56,894 --> 00:04:00,432 n choose 2 edges is present. Why is n-1 vertices necessary and 59 00:04:00,432 --> 00:04:05,078 sufficient for a vertex cover? Well, to see why it's necessary, consider any set 60 00:04:05,078 --> 00:04:08,050 that excludes at least two vertices of the graph. 61 00:04:08,050 --> 00:04:12,043 Because it's a clique, there's an edge between those two excluded vertices, and 62 00:04:12,043 --> 00:04:15,334 that's an edge, such that neither of its endpoints is in this set. 63 00:04:15,334 --> 00:04:19,232 So that's why the set is not a vertex cover if it has, at most, n-2 vertices in 64 00:04:19,232 --> 00:04:21,343 it. On the other hand, if it only excludes 65 00:04:21,343 --> 00:04:25,403 one vertex, then there cannot be an edge with both of its endpoints missing from 66 00:04:25,403 --> 00:04:28,858 this set, because there's only one vertex in total missing for the set. 67 00:04:28,858 --> 00:04:32,892 So that's why any subset of n-1 vertices will be a vertex cover of a clique, n-1 68 00:04:32,892 --> 00:04:35,799 is sufficient. Figuring out the minimum size of a vertex 69 00:04:35,799 --> 00:04:39,179 cover in these two special graphs probably didn't seem that hard. 70 00:04:39,179 --> 00:04:43,275 And it's not! But when you're talking about general graphs, that you don't know 71 00:04:43,275 --> 00:04:47,906 anything about Computing the Minimum Quantinality Vertex Cover is in fact an 72 00:04:47,906 --> 00:04:51,506 NP-complete problem. There is no polynomial time algorithm 73 00:04:51,506 --> 00:04:55,566 that solves it, unless P equals NP. So that's definitely a bummer. 74 00:04:55,566 --> 00:04:59,819 So let's review our strategies for dealing with NP-complete problems. 75 00:04:59,819 --> 00:05:04,220 So we discussed To an earlier video. So the first strategy is to identify 76 00:05:04,220 --> 00:05:08,310 computationally tractable special cases of your NP-complete problem. 77 00:05:08,310 --> 00:05:12,550 Now in the best case scenario, the version of the problem that you have to 78 00:05:12,550 --> 00:05:16,217 solve in your own application lies squarely inside one of these 79 00:05:16,217 --> 00:05:20,622 computationally tractable special cases, then, you're good to go. 80 00:05:20,622 --> 00:05:25,432 More commonly, the instances that arise in your application won't necessarily be 81 00:05:25,432 --> 00:05:28,712 in one of the computationally tractable special cases. 82 00:05:28,712 --> 00:05:33,027 But, it can still be useful to have subroutines ready for various special 83 00:05:33,027 --> 00:05:35,958 cases. One thing we discussed in the past is the 84 00:05:35,958 --> 00:05:39,614 potential of a hybrid approach. Perhaps you do a little of brute-force 85 00:05:39,614 --> 00:05:43,662 search, and for most of the branches of your brute-force search, the residual 86 00:05:43,662 --> 00:05:47,072 problem falls into a com, computational tractable special case. 87 00:05:47,072 --> 00:05:50,881 And it turns out there is some quite interesting computational tractable 88 00:05:50,881 --> 00:05:53,133 special cases of the vertex cover problem. 89 00:05:53,133 --> 00:05:56,903 And the story here is very much in parallel with discussions we've had in 90 00:05:56,903 --> 00:05:59,372 the past about the independent set problem. 91 00:05:59,372 --> 00:06:03,951 So, in particular, you can solve vertex cover in polynomial time on path graphs. 92 00:06:03,951 --> 00:06:08,319 Just like we did with independent sets. Actually, more generally, in trees. 93 00:06:08,319 --> 00:06:12,327 Or even more generally, in what's called graphs of bounded tree width. 94 00:06:12,327 --> 00:06:16,583 But at least, for trees, that's a application of dynamic programming which 95 00:06:16,583 --> 00:06:19,783 you're now. In a good position to solve yourself, so 96 00:06:19,783 --> 00:06:22,786 I encourage you to think that through as an exercise. 97 00:06:22,786 --> 00:06:27,117 Why can you solve the vertex cover problem in polynomial time, in trees? So 98 00:06:27,117 --> 00:06:31,286 another quite interesting class of graphs, where you can solve the vertex 99 00:06:31,286 --> 00:06:35,058 cover problem in polynomial time is the class of bipartite graphs. 100 00:06:35,058 --> 00:06:39,712 One way to think about a bipartite graph, it's, it's a graph that has no odd cycle, 101 00:06:39,712 --> 00:06:42,642 so then, obviously, trees are a special case. 102 00:06:42,642 --> 00:06:47,072 An equivalent definition is that you can partition the vertex set into two groups 103 00:06:47,072 --> 00:06:51,292 A and B, so that every single edge has exactly one endpoint in each of A and B. 104 00:06:51,292 --> 00:06:55,412 That is bipartite graphs can also be thought of as the graphs for which there 105 00:06:55,412 --> 00:06:58,616 is a cut that slices every single edge of the graph. 106 00:06:58,616 --> 00:07:02,909 It's far from obvious how to solve the vertex cover problem efficiently in 107 00:07:02,909 --> 00:07:07,252 bipartite graphs, but it can be done as an application of what's called the 108 00:07:07,252 --> 00:07:10,791 maximum flow problem. That's a graph problem which lies just 109 00:07:10,791 --> 00:07:14,184 beyond the techniques that we're covering in this class. 110 00:07:14,184 --> 00:07:19,289 So what I'm going to cover in the next video is using a different algorithm, yet 111 00:07:19,289 --> 00:07:23,546 another computationally tractable special case of vertex cover. 112 00:07:23,546 --> 00:07:28,720 Specifically when you're interested in the case that the vertex cover is small. 113 00:07:28,720 --> 00:07:33,649 And what I mean here by small is logarithmic in the number of vertices or 114 00:07:33,649 --> 00:07:36,405 less. The second strategy for dealing with 115 00:07:36,405 --> 00:07:41,274 NP-complete problems is to relax the requirement of correctness and to focus 116 00:07:41,274 --> 00:07:44,648 on heuristics. Algorithms that are fast but need not 117 00:07:44,648 --> 00:07:49,051 produce an optimal solution. So there are some pretty good heuristics 118 00:07:49,051 --> 00:07:52,436 available for the vertex cover problem. I'm not going to discuss them here 119 00:07:52,436 --> 00:07:56,025 because I think that time will be better spent discussing heuristics for the 120 00:07:56,025 --> 00:07:59,945 knapsack problem, but you can for example use the greedy algorithm design paradigm 121 00:07:59,945 --> 00:08:02,134 to generate some heuristics for vertex cover. 122 00:08:02,134 --> 00:08:05,798 Now, some gritty algorithms do better than others, but if you make the Greedy 123 00:08:05,798 --> 00:08:10,011 choices judiciously, you can get pretty good guarantees for the vertex cover 124 00:08:10,011 --> 00:08:12,769 problem. So the third strategy is to insist on 125 00:08:12,769 --> 00:08:17,343 correctness, in which case, you're not expecting to get polynomial time for 126 00:08:17,343 --> 00:08:20,827 general instances, unless you're intent on proving p=np. 127 00:08:20,827 --> 00:08:25,426 And so, while you were expecting an exponential running time, you still want 128 00:08:25,426 --> 00:08:30,660 a running time, which is qualitatively better than naive brute-force search. 129 00:08:30,660 --> 00:08:33,431 And this is exactly the point of the next video. 130 00:08:33,431 --> 00:08:38,030 We're going to give an algorithm which is indeed based on enumeration but it's a 131 00:08:38,030 --> 00:08:41,085 smarter enumeration than naive brute-force search. 132 00:08:41,085 --> 00:08:45,453 So we get a faster exponential running time and that will allow us to solve a 133 00:08:45,453 --> 00:08:46,821 wider class of problems.