I'll prove this performance guarantee for you on the next slide. But let me first make a couple of comments. So first of all, the analysis of this algorithm stated in the theorem is tight. The 50% cannot be improved without making additional assumptions about the input. Indeed, here's an example, which is, itself, even a bipartite graph. So it's a tractable special case of maximum cut. But even on bipartite graphs, the local search heuristic might return to you a cut which has merely 50% of the edges of a globally optimal, maximum cut.

The example is drawn here in green. There are just 4 vertices and 4 edges. As I said, it is a bipartite graph, so the best cut is to put u and w in one group and v and x in the other group. That cuts all 4 of the edges. On the other hand, one possible output of the local search algorithm is the cut where u and v are in one of the groups, A, and w and x are in the other group, B. So this cut has only 2 crossing edges, only 50% of the maximum possible, yet it is locally optimal. If you take any of the 4 vertices and you switch it to the other group, you get a cut with 3 vertices on one side and one vertex by itself. Since every vertex in this graph has degree 2, all of those cuts are also only going to have 2 crossing edges.
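The stuck example on the slide is easy to reproduce in code. Here is a minimal sketch of the local search heuristic from the lecture on the green 4-cycle; the function and variable names are my own choices, not from the lecture.

```python
def local_search_max_cut(vertices, edges, side):
    """Repeatedly move any vertex whose non-crossing incident edges
    outnumber its crossing ones, until no such vertex exists.
    side maps each vertex to group 0 (A) or group 1 (B)."""
    side = dict(side)
    improved = True
    while improved:
        improved = False
        for v in vertices:
            crossing = sum(1 for a, b in edges
                           if v in (a, b) and side[a] != side[b])
            non_crossing = sum(1 for a, b in edges
                               if v in (a, b) and side[a] == side[b])
            if non_crossing > crossing:   # swapping v strictly improves the cut
                side[v] = 1 - side[v]
                improved = True
    return side

def cut_value(edges, side):
    return sum(1 for a, b in edges if side[a] != side[b])

# The 4-cycle from the slide: bipartite, so the optimum cuts all 4 edges.
V = ["u", "v", "w", "x"]
E = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "u")]

# Starting from {u, v} vs {w, x}, local search is already stuck at 2 edges...
bad = local_search_max_cut(V, E, {"u": 0, "v": 0, "w": 1, "x": 1})
print(cut_value(E, bad))   # 2 crossing edges, yet locally optimal

# ...while the bipartition {u, w} vs {v, x} cuts all 4.
print(cut_value(E, {"u": 0, "w": 0, "v": 1, "x": 1}))  # 4
```

Every vertex in the bad cut has exactly one crossing and one non-crossing incident edge, so no single swap strictly improves, and the loop terminates immediately.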
The second cautionary tale that I need to tell you is that you maybe shouldn't be super impressed with the performance guarantee of 50% for the maximum cut problem. Indeed, even if I just took a uniformly random cut, by which I mean for each of the n vertices I independently flip a fair coin. If the coin comes up heads, I put that vertex in A. If the coin comes up tails, I put that vertex in B. The expected number of edges that cross a random cut chosen in this way is already 50% of the edges of the graph.

Just for kicks, let me sketch a quick proof of this fact. Some of you may remember from Part 1, I introduced a decomposition principle for analyzing the expected value of complicated random variables. You have a complicated random variable that you care about; what do you do? You express it as the sum of indicator random variables, random variables that only take on the values 0 and 1. When you apply linearity of expectation, that reduces computing the complicated expectation to a sum of simple expectations. So it turns out that decomposition principle works beautifully to prove this fact. The complicated random variable that we care about is the number of edges that cross a random cut. The constituents that are indicator 0-1 random variables are just whether or not a given edge crosses a random cut.
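The coin-flipping experiment just described can be simulated directly as a sanity check. This is a rough sketch with names and the test graph of my own choosing; any graph would do.

```python
import random

def random_cut_fraction(vertices, edges, trials=100_000, seed=0):
    """Empirically estimate the expected fraction of edges crossing a
    uniformly random cut: one independent fair coin flip per vertex."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        side = {v: rng.randint(0, 1) for v in vertices}  # heads -> A, tails -> B
        total += sum(1 for a, b in edges if side[a] != side[b])
    return total / (trials * len(edges))

# Using the 4-cycle again as an illustrative input.
V = ["u", "v", "w", "x"]
E = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "u")]
print(random_cut_fraction(V, E))   # close to 0.5
```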
That is, for an edge e of this input graph G, I'm going to define the random variable X_e to be 1 if its 2 endpoints wind up in different groups, and 0 if its 2 endpoints wind up in the same group. So what's the expected value of one of these indicator random variables X_e? Well, as with any indicator random variable, that's just the probability that X_e = 1, the probability that this edge e is cut by a random cut. What's the chance of that? Well, let's say the endpoints of edge e are u and v. There are 4 possibilities: u and v could both be in A, u and v could both be in B, u could be in A and v could be in B, or u could be in B and v could be in A. Each of those 4 outcomes is equally likely, probability 1/4. In the first 2 cases, this edge is not cut by the random cut; the 2 endpoints are on the same side. In the other 2 cases, it is cut; they're on different sides. So there's a 1/2 probability that this edge is cut in a random cut, therefore the expected value of X_e is 1/2.

And now we're good to go just by applying linearity of expectation. So, precisely, what do we care about? We care about the expected number of edges crossing a random cut. Well, the number of edges crossing a cut is just, by definition, the sum of the X_e's over all of the edges e. X_e is just whether or not the given edge crosses the cut.
So the expected value of the random variable we care about, the number of crossing edges, is, by linearity of expectation, just the sum of the expected values of these indicator random variables, the X_e's. Each of those has expected value 1/2, and we're summing one term for each of the edges. So the total of the sum is just the number of edges divided by 2, as claimed. Thanks for indulging my little digression. Again, the point of which is that just taking a random cut already gives a performance guarantee of 50% for the maximum cut problem.

But, in defense of local search, which also only gets a 50% performance guarantee, it took a while, and in fact you have to work pretty hard, to get a performance guarantee better than 50% with a polynomial-time algorithm for the max cut problem. The most famous such algorithm is by Goemans and Williamson. That took until 1994, and it requires a tool called semidefinite programming, something that's even more powerful than linear programming.

Let's now prove that local search is guaranteed to output a cut whose number of crossing edges is at least half the total number of edges in the graph. So pick your favorite locally optimal cut (A, B).
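Since the expectation is over only 2^n equally likely cuts, on a small graph we can also verify the linearity-of-expectation conclusion exactly by brute force. A sketch, again on the 4-cycle as an illustrative input:

```python
from itertools import product

def exact_expected_crossing(vertices, edges):
    """Average the number of crossing edges over all 2^n equally likely
    cuts -- exactly the expectation computed in the proof."""
    total, n = 0, len(vertices)
    for bits in product([0, 1], repeat=n):
        side = dict(zip(vertices, bits))
        total += sum(1 for a, b in edges if side[a] != side[b])
    return total / 2 ** n

V = ["u", "v", "w", "x"]
E = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "u")]
print(exact_expected_crossing(V, E), len(E) / 2)   # both equal 2.0
```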
And here by locally optimal, I just mean something that the algorithm might return. That is, a cut for which it is impossible to swap a single vertex from one side to the other and improve the value of the cut. By virtue of being locally optimal, it must be the case that, for this cut (A, B) and for every single vertex v of V, the number of edges incident to this vertex that are crossing the cut is at least as large as the number of edges incident to this vertex that do not cross the cut. In the previous notation, that is, c_v is at least as big as d_v. If not, swapping v would give us a better cut.

So we have n of these inequalities, one for each vertex v, so we can legitimately sum up those inequalities, combining them into one. Now, let's focus first on the right-hand side of this summed-up inequality, so the sum over all of the vertices in the graph of the number of edges incident to the vertex that are crossing the cut. Now here's the main point: what is this sum on the right-hand side counting? It's counting each edge that crosses the cut (A, B) exactly twice. Consider an edge, say (u, w), which is crossing the cut (A, B). It gets counted twice in the right-hand side, once when v = u and once when v = w. We can apply exactly the same reasoning to the left-hand side. What is this sum counting? It's counting each non-crossing edge exactly twice.
Consider an edge, say (u, x), whose endpoints are both on the same side. It's going to be counted once in this sum when v = u, and then again when v = x. Now we want to compare the number of crossing edges of this locally optimal cut to the total number of edges. So on the left-hand side we're missing the crossing edges; to complete that into all of the edges, let's just add double the number of crossing edges to both sides of this inequality. On the left-hand side, we get double all of the edges, and now on the right-hand side we have 4 times the number of crossing edges. Dividing both sides of this inequality by 4, we see we've proved the theorem: the number of edges that cross (A, B) is indeed a full 50% or more of the total number of edges in the graph.

It's interesting to discuss briefly how the conclusions we've reached for local search for max cut extend to a natural weighted version of maximum cut. So which facts about the unweighted special case extend to the weighted case, and which facts do not? Well, it still makes perfect sense to take a local search approach to the weighted version of the maximum cut problem. Now, for each vertex in a given cut, you just look at the total weight of the incident edges that are crossing the cut and the total weight of the incident edges that are not crossing the cut.
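The counting argument in the proof can be checked mechanically on the locally optimal cut from the earlier example. This sketch verifies the two "counted exactly twice" identities and the resulting 50% bound; the helper name is mine.

```python
def check_local_optimality_bound(vertices, edges, side):
    """For a locally optimal cut, verify the counting identities from the
    proof: sum of c_v double-counts crossing edges, sum of d_v double-counts
    non-crossing edges, and the crossing edges are at least half of all edges."""
    c = {v: 0 for v in vertices}   # c_v: crossing edges incident to v
    d = {v: 0 for v in vertices}   # d_v: non-crossing edges incident to v
    for a, b in edges:
        if side[a] != side[b]:
            c[a] += 1; c[b] += 1
        else:
            d[a] += 1; d[b] += 1
    crossing = sum(1 for a, b in edges if side[a] != side[b])
    assert all(d[v] <= c[v] for v in vertices)       # local optimality, per vertex
    assert sum(c.values()) == 2 * crossing           # each crossing edge counted twice
    assert sum(d.values()) == 2 * (len(edges) - crossing)
    assert crossing >= len(edges) / 2                # the 50% guarantee
    return crossing

# The locally optimal cut {u, v} vs {w, x} of the 4-cycle passes every check.
V = ["u", "v", "w", "x"]
E = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "u")]
print(check_local_optimality_bound(V, E, {"u": 0, "v": 0, "w": 1, "x": 1}))  # 2
```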
And whenever the weight of the edges not crossing the cut is strictly bigger, that's an opportunity to improve the current cut by moving the vertex to the opposite side. One cool thing is that the performance guarantee that we just established of 50% for the output of local search carries over to the weighted case, and the proof remains essentially the same; I'll leave it for you to check that in the privacy of your own home. Now, it's also true that in the weighted case a random cut still gets 50% of the total weight of the graph. So, perhaps this performance guarantee is nothing really to write home about.

What breaks down is the running time analysis. Remember how this went for the unweighted case. We argued that there can be at most n choose 2 edges crossing any cut, and since every iteration of local search increases the number of crossing edges, it has to halt within a quadratic number of iterations. And that means it's easy to implement the algorithm in polynomial time. One way to think about what's special about the unweighted version of the maximum cut problem is that even though, as we've seen, the graph has an exponential number of different cuts, all of those exponentially many cuts only take on a polynomial number of different objective function values.
The number of crossing edges is somewhere between 0 and n choose 2. By contrast, once the edges can just have any old weights, you can have an exponential number of cuts with an exponential number of distinct objective function values, of distinct values for their total weight. That means, just because you're strictly improving the total weight crossing the cut in each iteration, it does not imply that you're going to converge in a polynomial number of iterations. And in fact, it's a highly non-trivial exercise to prove that there are instances of weighted maximum cut in which local search takes an exponential number of iterations before converging.
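The weighted improving-move rule described above can be sketched as follows. The weights here are powers of 2, a hypothetical choice of mine that illustrates why the unweighted counting argument breaks down: distinct sets of crossing edges then have distinct total weights, so the objective can take many more values than the n-choose-2-plus-1 possible in the unweighted case.

```python
def weighted_improving_vertex(vertices, wedges, side):
    """Return a vertex whose incident non-crossing weight strictly exceeds
    its incident crossing weight (an improving move), or None if the cut
    is locally optimal. wedges is a list of (a, b, weight) triples."""
    for v in vertices:
        crossing = sum(w for a, b, w in wedges
                       if v in (a, b) and side[a] != side[b])
        non_crossing = sum(w for a, b, w in wedges
                           if v in (a, b) and side[a] == side[b])
        if non_crossing > crossing:
            return v
    return None

# The 4-cycle with power-of-2 weights: every subset of edges has a
# distinct total weight, unlike the unweighted objective.
V = ["u", "v", "w", "x"]
WE = [("u", "v", 1), ("v", "w", 2), ("w", "x", 4), ("x", "u", 8)]
side = {"u": 0, "v": 0, "w": 0, "x": 0}
while (v := weighted_improving_vertex(V, WE, side)) is not None:
    side[v] = 1 - side[v]   # each flip strictly increases the crossing weight
print(sum(w for a, b, w in WE if side[a] != side[b]))
```

On this tiny instance local search still converges quickly; the lecture's point is that, with adversarially chosen weights on larger graphs, the number of strict improvements can be exponential.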