1 00:00:00,012 --> 00:00:03,978 The goal of this video is to develop an exact algorithm for solving the vertex 2 00:00:03,978 --> 00:00:06,768 cover problem. The algorithm will run in exponential 3 00:00:06,768 --> 00:00:10,228 time in the worst case in general instances, but it's nevertheless 4 00:00:10,228 --> 00:00:12,993 qualitatively better than naive brute-force search. 5 00:00:12,993 --> 00:00:16,827 One consequence of this smarter search algorithm is that the vertex cover 6 00:00:16,827 --> 00:00:21,117 problem is computationally tractable in the special case where you're looking for 7 00:00:21,117 --> 00:00:23,141 a small optimal solution that uses at most a logarithmic number of vertices. 8 00:00:25,936 --> 00:00:29,887 Let me remind you quickly of the vertex cover problem. 9 00:00:29,887 --> 00:00:33,383 The input is simply an undirected graph G. 10 00:00:33,383 --> 00:00:39,129 A vertex cover of a graph is a subset of vertices that includes at least one 11 00:00:39,129 --> 00:00:44,177 endpoint of each edge of the graph. The goal on the vertex cover problem is 12 00:00:44,177 --> 00:00:46,539 to compute the smallest size of a vertex cover. 13 00:00:48,223 --> 00:00:53,716 Let's consider a variant on this problem where we're additionally given a target 14 00:00:53,716 --> 00:00:59,055 value k, k here is going to be a positive integer, and we want to know whether or 15 00:00:59,055 --> 00:01:02,562 not there is a vertex cover with size k or smaller. 16 00:01:02,562 --> 00:01:05,768 Now, as stated, I haven't really made the problem any easier. 17 00:01:05,768 --> 00:01:09,777 If you could solve this version of the problem where you're given a target k, 18 00:01:09,777 --> 00:01:13,761 you can also solve the original one, where you want to know the smallest value 19 00:01:13,761 --> 00:01:16,998 of any vertex cover. Namely, just run this sub-routine for all 20 00:01:16,998 --> 00:01:21,163 possible values of k from 1 up to n, and the smallest value of k for which you can 21 00:01:21,163 --> 00:01:25,825 find the vertex cover is then the answer to the original optimization version of 22 00:01:25,825 --> 00:01:28,981 the problem. Rather, to make the problem easier, I 23 00:01:28,981 --> 00:01:32,273 want to think about the special case where k is small. 24 00:01:32,273 --> 00:01:36,889 So, for now, I'm going to be deliberately vague about what I mean by small. We'll 25 00:01:36,889 --> 00:01:41,079 fill in those details in due time. Now, why might you be interested in the 26 00:01:41,079 --> 00:01:45,177 vertex cover problem when k is small? Well, we talked, for example, one thing 27 00:01:45,177 --> 00:01:49,257 you might be modeling is hiring a team. Each vertex corresponds to a potential 28 00:01:49,257 --> 00:01:52,357 team member, someone capable of performing various tasks. 29 00:01:52,357 --> 00:01:56,524 A vertex cover means you hire a team that is capable of handling any task that 30 00:01:56,524 --> 00:02:00,442 might be thrown your way where the edges of the graph correspond to tasks and the 31 00:02:00,442 --> 00:02:04,081 endpoints correspond to the two people capable of performing the task. And, you 32 00:02:04,081 --> 00:02:07,873 can imagine, maybe you're only interested in forming a team if it has reasonable 33 00:02:07,873 --> 00:02:10,696 size, so you have a budget for how many people you could hire. 34 00:02:10,696 --> 00:02:14,183 And then, you're only interested in solving the vertex cover problem when 35 00:02:14,183 --> 00:02:17,670 there's a good solution, a small cardinality vertex cover, otherwise, you 36 00:02:17,670 --> 00:02:20,736 just punt and you're not willing to take. on the project. 37 00:02:20,736 --> 00:02:25,710 You can also imagine maybe you have some domain expertise that suggests that your 38 00:02:25,710 --> 00:02:30,165 graphs do have small vertex covers, recall star is just a vertex cover of 39 00:02:30,165 --> 00:02:32,616 size one. So if you know that your graph is 40 00:02:32,616 --> 00:02:37,708 star-like in some sense, you might expect that there's an optimal solution that has 41 00:02:37,708 --> 00:02:41,335 few vertices. Is it useful algorithmically to focus on 42 00:02:41,335 --> 00:02:46,177 the case of k small? Well, yeah sure, if you're looking for a small optimal 43 00:02:46,177 --> 00:02:49,764 solution, then brute-force search isn't as bad as usual. 44 00:02:49,764 --> 00:02:54,531 If you want to know whether or not there's a vertex cover comprising only k 45 00:02:54,531 --> 00:02:58,312 vertices, you can just check every subset of k vertices. 46 00:02:58,312 --> 00:03:03,177 The number of subsets of k vertices, assuming that there are n to start with, 47 00:03:03,177 --> 00:03:06,758 would be n choose k, and for small k, that's going to be like theta of n to the 48 00:03:06,758 --> 00:03:09,943 k. So in principle this naive brute-force 49 00:03:09,943 --> 00:03:15,154 search runs in polynomial time, as long as k is constant, as long as k is bigger 50 00:03:15,154 --> 00:03:17,969 one. But practically speaking, I mean you 51 00:03:17,969 --> 00:03:23,474 can't really imagine running this naive brute-force search algorithm and unless k 52 00:03:23,474 --> 00:03:25,962 was super small. You know, say k=3, 53 00:03:25,962 --> 00:03:29,357 maybe k=4, depending on the size of your graph. 54 00:03:29,357 --> 00:03:34,052 In any case, this naive search is limited to very small values of k. 55 00:03:34,052 --> 00:03:39,272 And it's then natural to ask, as we always do, can we do better? Is there a 56 00:03:39,272 --> 00:03:44,972 smarter type of search? So, the answer is yes, there is a smarter way to go about 57 00:03:44,972 --> 00:03:50,274 this search that's going to allow us to solve the problem for qualitatively 58 00:03:50,274 --> 00:03:53,229 larger values of k. The search algorithm is going to be 59 00:03:53,229 --> 00:03:57,228 driven by a lemma which is similar in spirit to the kind of reasoning we did 60 00:03:57,228 --> 00:04:00,614 about optimal solutions in our dynamic programming algorithms. 61 00:04:00,614 --> 00:04:03,973 And I think it will be a little misleading to call it an optimal 62 00:04:03,973 --> 00:04:07,622 substructure lemma, so let me just call it a substructure lemma. 63 00:04:07,622 --> 00:04:13,120 So consider it some input, so our job, the job of our algorithm is to check 64 00:04:13,120 --> 00:04:18,390 whether some undirected graph G has a vertex cover of size k or not. 65 00:04:18,390 --> 00:04:23,482 Consider also, some edge, let's say between u and v of this graph. 66 00:04:23,482 --> 00:04:27,924 In the same way as in all our dynamic programming optimal substructure lemmas, 67 00:04:27,924 --> 00:04:32,143 we thought about taking the original instance and reducing it by one in some 68 00:04:32,143 --> 00:04:34,687 sense. Removing an item, removing a vertex, 69 00:04:34,687 --> 00:04:37,355 removing a position of the alignment, and so on. 70 00:04:37,355 --> 00:04:41,437 We're going to be thinking about this graph G, but with one fewer vertex. 71 00:04:41,437 --> 00:04:43,912 Actually, two different graphs of that form. 72 00:04:43,912 --> 00:04:48,482 One that we get by deleting u, that's one endpoint of the edge that we chose, and 73 00:04:48,482 --> 00:04:51,407 in addition to u, all of the edges incident to it. 74 00:04:51,407 --> 00:04:56,172 And we'll also look at the graph that we get, by deleting v, the other endpoint of 75 00:04:56,172 --> 00:05:00,937 the edge, and all of its incident edges. I'll use the notation Gu for the graph we 76 00:05:00,937 --> 00:05:04,967 get by deleting u, and G sub v for the graph we get by deleting v. 77 00:05:04,967 --> 00:05:09,982 The lemma asserts that we can reduce the question that we care about, 78 00:05:09,982 --> 00:05:15,947 namely does or does not G have a vertex cover of size k to analagous questions on 79 00:05:15,947 --> 00:05:20,857 the smaller graph, Gu and Gv. Specifically, G does have a vertex cover 80 00:05:20,857 --> 00:05:26,700 of size k if and only if at least one of Gu or Gv has a vertex cover of size k-1. 81 00:05:26,700 --> 00:05:30,966 So the proof is not too hard. I'll be able to squeeze it in on this 82 00:05:30,966 --> 00:05:34,866 slide, and then, once we know the substructure lemma is true, 83 00:05:34,866 --> 00:05:39,942 I'll show you how that leads to a smarter search algorithm for vertex cover. 84 00:05:39,942 --> 00:05:44,174 So this is an if and only if statement, so we'd have to have two different parts 85 00:05:44,174 --> 00:05:49,152 of the proof, assuming the left prove the right, assuming the right prove the left. 86 00:05:49,152 --> 00:05:51,634 Let's start by assuming the right-hand side. 87 00:05:51,634 --> 00:05:55,769 Let's assume that Gu or Gv or both, does in fact have a vertex cover of k-1. 88 00:05:55,769 --> 00:05:59,592 Let's show that then G must have a vertex cover of size k as well. 89 00:05:59,592 --> 00:06:04,707 It doesn't matter which of Gu or Gv has the good vertex cover. Let's just say Gu. 90 00:06:04,707 --> 00:06:09,534 So let's think for a second about which edges of the original input graph G 91 00:06:09,534 --> 00:06:14,240 survive into the smaller graph Gu. For the purpose of this proof, there's 92 00:06:14,240 --> 00:06:16,783 basically two types of edges in the world. 93 00:06:16,783 --> 00:06:21,564 There's ones where one of the endpoints is u, that is edges incident to u, and 94 00:06:21,564 --> 00:06:27,247 then there is edges where neither endpoint is the vertex u. So the edges in 95 00:06:27,247 --> 00:06:32,499 the graph Gu are precisely those where neither of the endpoints is u. 96 00:06:32,499 --> 00:06:38,281 So let's call those edges E sub u, those are edges not incident to u, the edges in 97 00:06:38,281 --> 00:06:41,713 G sub u. And then we'll call the edges incident to 98 00:06:41,713 --> 00:06:46,474 u that got deleted, Fu. So let's call k-1 vertices that the 99 00:06:46,474 --> 00:06:48,848 vertex cover in G sub u, let's call it capital S. 100 00:06:48,848 --> 00:06:53,275 What does it mean to be a vertex cover? It means you include at least one 101 00:06:53,275 --> 00:06:57,169 endpoint of every edge. So S, by virtue of being a vertex cover 102 00:06:57,169 --> 00:07:02,135 for G sub u, it means for every edge of E sub u, that is every edge of the original 103 00:07:02,135 --> 00:07:07,164 graph, neither of whose endpoint is u, S contains at least one of the endpoints. 104 00:07:07,164 --> 00:07:12,030 So, if we think about capital S in the original graph capital G, the only edges 105 00:07:12,030 --> 00:07:16,930 that its missing are those of F sub u or those where one of the two endpoints was 106 00:07:16,930 --> 00:07:19,843 the vertex u. Well, that means that we just take 107 00:07:19,843 --> 00:07:24,472 capital S and we throw in the vertex u, we get a bonafide vertex cover of the 108 00:07:24,472 --> 00:07:28,515 original graph capital G. S takes care of all of the edges of E sub 109 00:07:28,515 --> 00:07:32,759 u by definition, and then u single-handedly takes care of all of the 110 00:07:32,759 --> 00:07:36,166 edges of F sub u, because all of those edges are incident 111 00:07:36,166 --> 00:07:39,424 to u. So that's why if we assume the right-hand 112 00:07:39,424 --> 00:07:44,772 side, we can conclude the left, if either Gu or Gv, Let's say Gu has a small vertex 113 00:07:44,772 --> 00:07:50,511 cover S of size k-1. All we need to do is to augment it by the missing vertex u, 114 00:07:50,511 --> 00:07:53,461 and boom, we get a vertex cover of desired size k back from the original 115 00:07:53,461 --> 00:07:55,968 graph G. So, what about the other direction of the 116 00:07:55,968 --> 00:07:58,400 substructure lemma? Now we have, now we assume that the 117 00:07:58,400 --> 00:08:02,086 original graph G does in fact have a vertex cover of size only k. 118 00:08:02,086 --> 00:08:05,809 And we show that has to be reflected in one of these two subgraphs, 119 00:08:05,809 --> 00:08:10,100 either Gu or Gv must itself have a small vertex cover, size only k-1. 120 00:08:10,100 --> 00:08:16,204 Let's let capital S denote the small vertex cover of the original graph G. 121 00:08:16,204 --> 00:08:22,609 So S has k vertices, and every edge has at least one of its endpoints represented 122 00:08:22,609 --> 00:08:26,798 among this set. In particular, the edge uv that we 123 00:08:26,798 --> 00:08:32,132 singled out at the beginning, one of its end points, either u or v, may be both, 124 00:08:32,132 --> 00:08:34,812 but certainly has to be in the set capital S. 125 00:08:34,812 --> 00:08:38,307 Let's say u is in S. So lets think about the pink picture from 126 00:08:38,307 --> 00:08:42,272 the other direction of the proof. Let's think about exactly the same 127 00:08:42,272 --> 00:08:45,850 decomposition of the original edge set capital E into two sets. 128 00:08:46,940 --> 00:08:51,682 Now, this set S, remember, it's a vertex cover of the original graph G. 129 00:08:51,682 --> 00:08:54,843 So S contains at least one endpoint of every edge. 130 00:08:54,843 --> 00:08:59,783 But now, u, which is one of the vertices in S, sure it, is an endpoint of every 131 00:08:59,783 --> 00:09:04,871 edge of F sub u, every edge incident to u. But u is not an endpoint of any of the 132 00:09:04,871 --> 00:09:09,435 edges in E sub u. So what that means is the other k-1 133 00:09:09,435 --> 00:09:15,329 vertices of S have to be responsible for containing endpoints of all of the edges 134 00:09:15,329 --> 00:09:20,321 that survive in the G sub u. The vertex u takes care of edges incident 135 00:09:20,321 --> 00:09:23,466 to it. S with u removed, those k-1 vertices 136 00:09:23,466 --> 00:09:29,694 include an endpoint from all of the other edges, all of the edges in E sub u. 137 00:09:29,694 --> 00:09:34,106 So that completes the proof of the forward direction and therefore of the 138 00:09:34,106 --> 00:09:34,106 substructure lemma.