I'll prove this performance guarantee for you on the next slide. But let me first make a couple of comments. First of all, the analysis of this algorithm stated in the theorem is tight: the 50% cannot be improved without making additional assumptions about the input. Indeed, here's an example which is itself a bipartite graph, so it's a tractable special case of maximum cut. But even on bipartite graphs, the local search heuristic might return to you a cut which has merely 50% of the edges of a globally optimal maximum cut.

The example is drawn here in green. There are just 4 vertices and 4 edges. As I said, it is a bipartite graph, so the best cut is to put u and w in one group and v and x in the other group; that cuts all 4 of the edges. On the other hand, one possible output of the local search algorithm is the cut where u and v are in one of the groups, A, and w and x are in the other group, B. This cut has only 2 crossing edges, only 50% of the maximum possible, yet it is locally optimal. If you take any of the 4 vertices and switch it to the other group, you get a cut with 3 vertices on one side and the remaining vertex by itself, and since every vertex in this graph has degree 2, all of those cuts also have only 2 crossing edges.

The second cautionary tale I need to tell you is that you maybe shouldn't be super impressed with a performance guarantee of 50% for the maximum cut problem. Indeed, even if I just took a uniformly random cut, by which I mean for each of the n vertices I independently flip a fair coin (if the coin comes up heads, I put that vertex in A; if the coin comes up tails, I put that vertex in B), the expected number of edges that cross a random cut chosen in this way is already 50% of the edges of the graph.

Just for kicks, let me sketch a quick proof of this fact. Some of you may remember that in Part 1 I introduced a decomposition principle for analyzing the expected value of complicated random variables. You have a complicated random variable that you care about; what do you do? You express it as a sum of indicator random variables, that is, random variables that only take on the values 0 and 1. Then applying linearity of expectation reduces computing the complicated expectation to a sum of simple expectations.

It turns out that decomposition principle works beautifully to prove this fact. The complicated random variable that we care about is the number of edges that cross a random cut. The constituent indicator 0-1 random variables are just whether or not a given edge crosses the random cut. That is, for an edge e of the input graph G, I'm going to define the random variable X_e to be 1 if its two endpoints wind up in different groups, and 0 if its two endpoints wind up in the same group.

So what's the expected value of one of these indicator random variables X_e? Well, as with any indicator random variable, it's just the probability that X_e = 1, that is, the probability that the edge e is cut by a random cut. What's the chance of that? Let's say the endpoints of the edge e are u and v. There are 4 possibilities: u and v could both be in A, u and v could both be in B, u could be in A and v in B, or u could be in B and v in A. Each of those 4 outcomes is equally likely, with probability 1/4. In the first 2 cases, the edge is not cut by the random cut; its 2 endpoints are on the same side. In the other 2 cases, it is cut; they're on different sides.
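Written as a displayed calculation, this is just the case analysis above, with u and v the two endpoints of the edge e:

$$
\Pr[X_e = 1] \;=\; \Pr[u \in A,\ v \in B] \;+\; \Pr[u \in B,\ v \in A] \;=\; \frac{1}{4} + \frac{1}{4} \;=\; \frac{1}{2}.
$$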
So it's a 1/2 probability that this edge is cut by a random cut, and therefore the expected value of X_e is 1/2. And now we're good to go just by applying linearity of expectation. Precisely, what do we care about? We care about the expected number of edges crossing a random cut. Well, the number of edges crossing a cut is, by definition, just the sum of the X_e's over all of the edges e; X_e records whether or not the given edge crosses the cut. So the expected value of the random variable we care about, the number of crossing edges, is, by linearity of expectation, just the sum of the expected values of these indicator random variables, the X_e's. Each of those is 1/2, and we're summing one term for each of the edges, so the total is just the number of edges divided by 2, as claimed.

Thanks for indulging my little digression. Again, the point is that just taking a random cut already gives a performance guarantee of 50% for the maximum cut problem. But in defense of local search, which also only gets a 50% performance guarantee: it took a while, and in fact you have to work pretty hard, to get a performance guarantee better than 50% with a polynomial-time algorithm for the maximum cut problem. The most famous such algorithm is due to Goemans and Williamson. That took until 1994, and it requires a tool called semidefinite programming, something even more powerful than linear programming.

Let's now prove that local search is guaranteed to output a cut whose number of crossing edges is at least half the total number of edges in the graph. So pick your favorite locally optimal cut (A, B). Here, by locally optimal, I just mean something that the algorithm might return, that is, a cut for which it is impossible to swap a single vertex from one side to the other and improve the value of the cut. By virtue of being locally optimal, it must be the case that, for this cut (A, B) and for every single vertex v, the number of edges incident to v that cross the cut is at least as large as the number of edges incident to v that do not cross the cut; in the notation from before, c_v is at least as big as d_v. If not, swapping v would give us a better cut.

So we have n of these inequalities, one for each vertex v, and we can legitimately sum them up, combining them into a single inequality whose left-hand side is the sum of the d_v's (the non-crossing counts) and whose right-hand side is the sum of the c_v's (the crossing counts). Now, let's focus first on the right-hand side of this summed-up inequality, the sum over all vertices of the number of incident edges that cross the cut. Here's the main point: what is this sum counting? It's counting each edge that crosses the cut (A, B) exactly twice. Consider an edge, say (u, w), that crosses the cut (A, B). It gets counted twice in the right-hand side: once when v = u and once when v = w. We can apply exactly the same reasoning to the left-hand side. What is it counting? It's counting each non-crossing edge exactly twice. Consider an edge, say (u, x), whose endpoints are both on the same side; it's going to be counted once in this sum when v = u and then again when v = x.

Now we want to compare the number of crossing edges of this locally optimal cut to the total number of edges. On the left-hand side we're missing the crossing edges, so to complete that into all of the edges, let's just add double the number of crossing edges to both sides of this inequality.
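For reference, here is the chain we're assembling, written in symbols, with E the edge set of the graph and c_v, d_v as above:

$$
\sum_{v \in V} d_v \;\le\; \sum_{v \in V} c_v
\;\;\Longleftrightarrow\;\;
2 \cdot \#\{\text{non-crossing edges}\} \;\le\; 2 \cdot \#\{\text{crossing edges}\},
$$

and, after adding $2 \cdot \#\{\text{crossing edges}\}$ to both sides,

$$
2\,|E| \;\le\; 4 \cdot \#\{\text{crossing edges}\}.
$$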
On the left-hand side, we now have double the total number of edges, and on the right-hand side we have 4 times the number of crossing edges. Dividing both sides of this inequality by 4, we see we've proved the theorem: the number of edges crossing (A, B) is indeed a full 50% or more of the total number of edges in the graph.

It's interesting to discuss briefly how the conclusions we've reached about local search for maximum cut extend to a natural weighted version of the problem. Which facts about the unweighted special case extend to the weighted case, and which do not? Well, it still makes perfect sense to take a local search approach to the weighted version of the maximum cut problem. Now, for each vertex in a given cut, you just look at the total weight of the incident edges that cross the cut and the total weight of the incident edges that do not cross the cut. Whenever the weight of the edges not crossing the cut is strictly bigger, that's an opportunity to improve the current cut by moving the vertex to the opposite side.

One cool thing is that the 50% performance guarantee we just established for the output of local search carries over to the weighted case; the proof remains essentially the same, and I'll leave it for you to check in the privacy of your own home. It's also true that in the weighted case a random cut still gets 50% of the total edge weight in expectation, so perhaps this performance guarantee is nothing to write home about.

What breaks down is the running time analysis. Remember how this went in the unweighted case: we argued that there can be at most n choose 2 edges crossing any cut, and since every iteration of local search increases the number of crossing edges, it has to halt within a quadratic number of iterations. And that means it's easy to implement the algorithm in polynomial time. One way to think about what's special about the unweighted version of the maximum cut problem is that, even though, as we've seen, the graph has an exponential number of different cuts, all of those exponentially many cuts take on only a polynomial number of distinct objective function values: the number of crossing edges is somewhere between 0 and n choose 2. By contrast, once the edges can have arbitrary weights, you can have an exponential number of cuts with an exponential number of distinct objective function values, that is, of distinct values for their total weight. That means that just because you're strictly improving the total weight crossing the cut in each iteration, it does not follow that you'll converge in a polynomial number of iterations. And in fact, it's a highly non-trivial exercise to prove that there are instances of weighted maximum cut in which local search takes an exponential number of iterations before converging.
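To make the weighted heuristic concrete, here is a minimal sketch in Python of the single-swap local search just described. The graph representation (a list of (u, v, weight) triples) and the function names are illustrative choices of mine, not anything fixed by the lecture.

```python
import random

def local_search_max_cut(vertices, weighted_edges, seed=0):
    """Single-swap local search for the (weighted) maximum cut problem.

    vertices: iterable of hashable vertex names.
    weighted_edges: list of (u, v, weight) triples with positive weights.
    Returns a dict mapping each vertex to 'A' or 'B' describing a locally
    optimal cut: no single vertex can switch sides and strictly increase
    the total weight of the crossing edges.
    """
    vertices = list(vertices)
    rng = random.Random(seed)
    side = {v: rng.choice('AB') for v in vertices}      # arbitrary initial cut

    # Weighted adjacency lists, so each swap check only looks at incident edges.
    incident = {v: [] for v in vertices}
    for u, v, w in weighted_edges:
        incident[u].append((v, w))
        incident[v].append((u, w))

    improved = True
    while improved:
        improved = False
        for v in vertices:
            # c_v: weight of incident crossing edges; d_v: weight of incident non-crossing edges.
            c_v = sum(w for u, w in incident[v] if side[u] != side[v])
            d_v = sum(w for u, w in incident[v] if side[u] == side[v])
            if d_v > c_v:                               # swapping v strictly improves the cut
                side[v] = 'A' if side[v] == 'B' else 'B'
                improved = True
    return side

def cut_weight(side, weighted_edges):
    """Total weight of the edges crossing the cut described by `side`."""
    return sum(w for u, v, w in weighted_edges if side[u] != side[v])

# The four-cycle from the tightness example, with unit weights.
edges = [('u', 'v', 1), ('v', 'w', 1), ('w', 'x', 1), ('x', 'u', 1)]
cut = local_search_max_cut('uvwx', edges)
print(cut, cut_weight(cut, edges))   # crossing weight is guaranteed to be >= 2, half of 4
```

With unit weights, every improving swap increases the number of crossing edges, so the loop performs at most n choose 2 improvements and runs in polynomial time. With arbitrary weights, the same loop still terminates (each swap strictly increases the cut weight, and there are finitely many cuts), but, as discussed above, in the worst case it can need exponentially many improving swaps before reaching a locally optimal cut.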