I'll prove this performance guarantee for you on the next slide. But let me first make a couple of comments. First of all, the analysis of this algorithm stated in the theorem is tight: the 50% cannot be improved without making additional assumptions about the input. Indeed, here's an example which is itself a bipartite graph, so it's a tractable special case of maximum cut. But even on bipartite graphs, the local search heuristic might return to you a cut which has merely 50% of the edges of a globally optimal maximum cut.

The example is drawn here in green. There are just 4 vertices and 4 edges. As I said, it is a bipartite graph, so the best cut is to put u and w in one group and v and x in the other group; that cuts all 4 of the edges. On the other hand, one possible output of the local search algorithm is the cut where u and v are in one of the groups, A, and w and x are in the other group, B. This cut has only 2 crossing edges, only 50% of the maximum possible, yet it is locally optimal. If you take any of the 4 vertices and switch it to the other group, you get a cut with 3 vertices on one side and the remaining vertex by itself, and since every vertex in this graph has degree 2, all of those cuts also have only 2 crossing edges.

The second cautionary tale I need to tell you is that you maybe shouldn't be super impressed with a performance guarantee of 50% for the maximum cut problem. Indeed, even if I just took a uniformly random cut, by which I mean for each of the n vertices I independently flip a fair coin (if the coin comes up heads, I put that vertex in A; if the coin comes up tails, I put that vertex in B), the expected number of edges that cross a random cut chosen in this way is already 50% of the edges of the graph.

Just for kicks, let me sketch a quick proof of this fact. Some of you may remember that in Part 1 I introduced a decomposition principle for analyzing the expected value of complicated random variables. You have a complicated random variable that you care about; what do you do? You express it as a sum of indicator random variables, that is, random variables that only take on the values 0 and 1. Then applying linearity of expectation reduces computing the complicated expectation to a sum of simple expectations.

It turns out that decomposition principle works beautifully to prove this fact. The complicated random variable that we care about is the number of edges that cross a random cut. The constituent indicator 0-1 random variables are just whether or not a given edge crosses the random cut. That is, for an edge e of the input graph G, I'm going to define the random variable X_e to be 1 if its two endpoints wind up in different groups, and 0 if its two endpoints wind up in the same group.

So what's the expected value of one of these indicator random variables X_e? Well, as with any indicator random variable, it's just the probability that X_e = 1, that is, the probability that the edge e is cut by a random cut. What's the chance of that? Let's say the endpoints of the edge e are u and v. There are 4 possibilities: u and v could both be in A, u and v could both be in B, u could be in A and v in B, or u could be in B and v in A. Each of those 4 outcomes is equally likely, with probability 1/4. In the first 2 cases, the edge is not cut by the random cut; its 2 endpoints are on the same side. In the other 2 cases, it is cut; they're on different sides.
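Written as a displayed calculation, this is just the case analysis above, with u and v the two endpoints of the edge e:

$$
\Pr[X_e = 1] \;=\; \Pr[u \in A,\ v \in B] \;+\; \Pr[u \in B,\ v \in A] \;=\; \frac{1}{4} + \frac{1}{4} \;=\; \frac{1}{2}.
$$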
So it's a 1/2 probability that this edge is cut by a random cut, and therefore the expected value of X_e is 1/2. And now we're good to go just by applying linearity of expectation. Precisely, what do we care about? We care about the expected number of edges crossing a random cut. Well, the number of edges crossing a cut is, by definition, just the sum of the X_e's over all of the edges e; X_e records whether or not the given edge crosses the cut. So the expected value of the random variable we care about, the number of crossing edges, is, by linearity of expectation, just the sum of the expected values of these indicator random variables, the X_e's. Each of those is 1/2, and we're summing one term for each of the edges, so the total is just the number of edges divided by 2, as claimed.

Thanks for indulging my little digression. Again, the point is that just taking a random cut already gives a performance guarantee of 50% for the maximum cut problem. But in defense of local search, which also only gets a 50% performance guarantee: it took a while, and in fact you have to work pretty hard, to get a performance guarantee better than 50% with a polynomial-time algorithm for the maximum cut problem. The most famous such algorithm is due to Goemans and Williamson. That took until 1994, and it requires a tool called semidefinite programming, something even more powerful than linear programming.

Let's now prove that local search is guaranteed to output a cut whose number of crossing edges is at least half the total number of edges in the graph. So pick your favorite locally optimal cut (A, B). Here, by locally optimal, I just mean something that the algorithm might return, that is, a cut for which it is impossible to swap a single vertex from one side to the other and improve the value of the cut. By virtue of being locally optimal, it must be the case that, for this cut (A, B) and for every single vertex v, the number of edges incident to v that cross the cut is at least as large as the number of edges incident to v that do not cross the cut; in the notation from before, c_v is at least as big as d_v. If not, swapping v would give us a better cut.

So we have n of these inequalities, one for each vertex v, and we can legitimately sum them up, combining them into a single inequality whose left-hand side is the sum of the d_v's (the non-crossing counts) and whose right-hand side is the sum of the c_v's (the crossing counts). Now, let's focus first on the right-hand side of this summed-up inequality, the sum over all vertices of the number of incident edges that cross the cut. Here's the main point: what is this sum counting? It's counting each edge that crosses the cut (A, B) exactly twice. Consider an edge, say (u, w), that crosses the cut (A, B). It gets counted twice in the right-hand side: once when v = u and once when v = w. We can apply exactly the same reasoning to the left-hand side. What is it counting? It's counting each non-crossing edge exactly twice. Consider an edge, say (u, x), whose endpoints are both on the same side; it's going to be counted once in this sum when v = u and then again when v = x.

Now we want to compare the number of crossing edges of this locally optimal cut to the total number of edges. On the left-hand side we're missing the crossing edges, so to complete that into all of the edges, let's just add double the number of crossing edges to both sides of this inequality.
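For reference, here is the chain we're assembling, written in symbols, with E the edge set of the graph and c_v, d_v as above:

$$
\sum_{v \in V} d_v \;\le\; \sum_{v \in V} c_v
\;\;\Longleftrightarrow\;\;
2 \cdot \#\{\text{non-crossing edges}\} \;\le\; 2 \cdot \#\{\text{crossing edges}\},
$$

and, after adding $2 \cdot \#\{\text{crossing edges}\}$ to both sides,

$$
2\,|E| \;\le\; 4 \cdot \#\{\text{crossing edges}\}.
$$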
On the left-hand side, we now have double the total number of edges, and on the right-hand side we have 4 times the number of crossing edges. Dividing both sides of this inequality by 4, we see we've proved the theorem: the number of edges crossing (A, B) is indeed a full 50% or more of the total number of edges in the graph.

It's interesting to discuss briefly how the conclusions we've reached about local search for maximum cut extend to a natural weighted version of the problem. Which facts about the unweighted special case extend to the weighted case, and which do not? Well, it still makes perfect sense to take a local search approach to the weighted version of the maximum cut problem. Now, for each vertex in a given cut, you just look at the total weight of the incident edges that cross the cut and the total weight of the incident edges that do not cross the cut. Whenever the weight of the edges not crossing the cut is strictly bigger, that's an opportunity to improve the current cut by moving the vertex to the opposite side.

One cool thing is that the 50% performance guarantee we just established for the output of local search carries over to the weighted case; the proof remains essentially the same, and I'll leave it for you to check in the privacy of your own home. It's also true that in the weighted case a random cut still gets 50% of the total edge weight in expectation, so perhaps this performance guarantee is nothing to write home about.

What breaks down is the running time analysis. Remember how this went in the unweighted case: we argued that there can be at most n choose 2 edges crossing any cut, and since every iteration of local search increases the number of crossing edges, it has to halt within a quadratic number of iterations. And that means it's easy to implement the algorithm in polynomial time. One way to think about what's special about the unweighted version of the maximum cut problem is that, even though, as we've seen, the graph has an exponential number of different cuts, all of those exponentially many cuts take on only a polynomial number of distinct objective function values: the number of crossing edges is somewhere between 0 and n choose 2. By contrast, once the edges can have arbitrary weights, you can have an exponential number of cuts with an exponential number of distinct objective function values, that is, of distinct values for their total weight. That means that just because you're strictly improving the total weight crossing the cut in each iteration, it does not follow that you'll converge in a polynomial number of iterations. And in fact, it's a highly non-trivial exercise to prove that there are instances of weighted maximum cut in which local search takes an exponential number of iterations before converging.
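To make the weighted heuristic concrete, here is a minimal sketch in Python of the single-swap local search just described. The graph representation (a list of (u, v, weight) triples) and the function names are illustrative choices of mine, not anything fixed by the lecture.

```python
import random

def local_search_max_cut(vertices, weighted_edges, seed=0):
    """Single-swap local search for the (weighted) maximum cut problem.

    vertices: iterable of hashable vertex names.
    weighted_edges: list of (u, v, weight) triples with positive weights.
    Returns a dict mapping each vertex to 'A' or 'B' describing a locally
    optimal cut: no single vertex can switch sides and strictly increase
    the total weight of the crossing edges.
    """
    vertices = list(vertices)
    rng = random.Random(seed)
    side = {v: rng.choice('AB') for v in vertices}      # arbitrary initial cut

    # Weighted adjacency lists, so each swap check only looks at incident edges.
    incident = {v: [] for v in vertices}
    for u, v, w in weighted_edges:
        incident[u].append((v, w))
        incident[v].append((u, w))

    improved = True
    while improved:
        improved = False
        for v in vertices:
            # c_v: weight of incident crossing edges; d_v: weight of incident non-crossing edges.
            c_v = sum(w for u, w in incident[v] if side[u] != side[v])
            d_v = sum(w for u, w in incident[v] if side[u] == side[v])
            if d_v > c_v:                               # swapping v strictly improves the cut
                side[v] = 'A' if side[v] == 'B' else 'B'
                improved = True
    return side

def cut_weight(side, weighted_edges):
    """Total weight of the edges crossing the cut described by `side`."""
    return sum(w for u, v, w in weighted_edges if side[u] != side[v])

# The four-cycle from the tightness example, with unit weights.
edges = [('u', 'v', 1), ('v', 'w', 1), ('w', 'x', 1), ('x', 'u', 1)]
cut = local_search_max_cut('uvwx', edges)
print(cut, cut_weight(cut, edges))   # crossing weight is guaranteed to be >= 2, half of 4
```

With unit weights, every improving swap increases the number of crossing edges, so the loop performs at most n choose 2 improvements and runs in polynomial time. With arbitrary weights, the same loop still terminates (each swap strictly increases the cut weight, and there are finitely many cuts), but, as discussed above, in the worst case it can need exponentially many improving swaps before reaching a locally optimal cut.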