I'll prove this performance guarantee for you on the next slide. But let me first make a couple of comments. So first of all, the analysis of this algorithm stated in the theorem is tight. The 50% cannot be improved without making additional assumptions about the input. Indeed, here's an example, which is, itself, even a bipartite graph. So it's a tractable special case of maximum cut. But even on bipartite graphs, the local search heuristic might return to you a cut which has merely 50% of the edges of a globally optimal, maximum cut.

The example is drawn here in green. There are just 4 vertices and 4 edges. As I said, it is a bipartite graph, so the best cut is to put u and w in one group and v and x in the other group. That cuts all 4 of the edges. On the other hand, one possible output of the local search algorithm is the cut where u and v are in one of the groups, A, and w and x are in the other group, B. So this cut has only 2 crossing edges, only 50% of the maximum possible, yet it is locally optimal. If you take any of the 4 vertices and you switch it to the other group, you get a cut with 3 vertices on one side and one vertex by itself. Since every vertex in this graph has degree 2, all of those cuts are also only going to have 2 crossing edges.
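The stuck example on the slide is easy to reproduce in code. Here is a minimal sketch of the local search heuristic from the lecture on the green 4-cycle; the function and variable names are my own choices, not from the lecture.

```python
def local_search_max_cut(vertices, edges, side):
    """Repeatedly move any vertex whose non-crossing incident edges
    outnumber its crossing ones, until no such vertex exists.
    side maps each vertex to group 0 (A) or group 1 (B)."""
    side = dict(side)
    improved = True
    while improved:
        improved = False
        for v in vertices:
            crossing = sum(1 for a, b in edges
                           if v in (a, b) and side[a] != side[b])
            non_crossing = sum(1 for a, b in edges
                               if v in (a, b) and side[a] == side[b])
            if non_crossing > crossing:   # swapping v strictly improves the cut
                side[v] = 1 - side[v]
                improved = True
    return side

def cut_value(edges, side):
    return sum(1 for a, b in edges if side[a] != side[b])

# The 4-cycle from the slide: bipartite, so the optimum cuts all 4 edges.
V = ["u", "v", "w", "x"]
E = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "u")]

# Starting from {u, v} vs {w, x}, local search is already stuck at 2 edges...
bad = local_search_max_cut(V, E, {"u": 0, "v": 0, "w": 1, "x": 1})
print(cut_value(E, bad))   # 2 crossing edges, yet locally optimal

# ...while the bipartition {u, w} vs {v, x} cuts all 4.
print(cut_value(E, {"u": 0, "w": 0, "v": 1, "x": 1}))  # 4
```

Every vertex in the bad cut has exactly one crossing and one non-crossing incident edge, so no single swap strictly improves, and the loop terminates immediately.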
The second cautionary tale that I need to tell you is that you maybe shouldn't be super impressed with the performance guarantee of 50% for the maximum cut problem. Indeed, even if I just took a uniformly random cut, by which I mean for each of the n vertices I independently flip a fair coin. If the coin comes up heads, I put that vertex in A. If the coin comes up tails, I put that vertex in B. The expected number of edges that cross a random cut chosen in this way is already 50% of the edges of the graph.

Just for kicks, let me sketch a quick proof of this fact. Some of you may remember from Part 1, I introduced a decomposition principle for analyzing the expected value of complicated random variables. You have a complicated random variable that you care about; what do you do? You express it as the sum of indicator random variables, random variables that only take on the values 0 and 1. When you apply linearity of expectation, that reduces computing the complicated expectation to a sum of simple expectations. So it turns out that decomposition principle works beautifully to prove this fact. The complicated random variable that we care about is the number of edges that cross a random cut. The constituents that are indicator 0-1 random variables are just whether or not a given edge crosses a random cut.
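The coin-flipping experiment just described can be simulated directly as a sanity check. This is a rough sketch with names and the test graph of my own choosing; any graph would do.

```python
import random

def random_cut_fraction(vertices, edges, trials=100_000, seed=0):
    """Empirically estimate the expected fraction of edges crossing a
    uniformly random cut: one independent fair coin flip per vertex."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        side = {v: rng.randint(0, 1) for v in vertices}  # heads -> A, tails -> B
        total += sum(1 for a, b in edges if side[a] != side[b])
    return total / (trials * len(edges))

# Using the 4-cycle again as an illustrative input.
V = ["u", "v", "w", "x"]
E = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "u")]
print(random_cut_fraction(V, E))   # close to 0.5
```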
That is, for an edge e of this input graph G, I'm going to define the random variable X_e to be 1 if its 2 endpoints wind up in different groups, and 0 if its 2 endpoints wind up in the same group. So what's the expected value of one of these indicator random variables X_e? Well, as with any indicator random variable, that's just the probability that X_e = 1, the probability that this edge e is cut by a random cut. What's the chance of that? Well, let's say the endpoints of edge e are u and v. There are 4 possibilities: u and v could both be in A, u and v could both be in B, u could be in A and v could be in B, or u could be in B and v could be in A. Each of those 4 outcomes is equally likely, probability 1/4. In the first 2 cases, this edge is not cut by the random cut; the 2 endpoints are on the same side. In the other 2 cases, it is cut; they're on different sides. So there's a 1/2 probability that this edge is cut in a random cut, therefore the expected value of X_e is 1/2.

And now we're good to go just by applying linearity of expectation. So, precisely, what do we care about? We care about the expected number of edges crossing a random cut. Well, the number of edges crossing a cut is just, by definition, the sum of the X_e's over all of the edges e. X_e is just whether or not the given edge crosses the cut.
So the expected value of the random variable we care about, the number of crossing edges, is, by linearity of expectation, just the sum of the expected values of these indicator random variables, the X_e's. Each of those has expected value 1/2, and we're summing one term for each of the edges. So the total of the sum is just the number of edges divided by 2, as claimed. Thanks for indulging my little digression. Again, the point of which is that just taking a random cut already gives a performance guarantee of 50% for the maximum cut problem.

But, in defense of local search, which also only gets a 50% performance guarantee, it took a while, and in fact you have to work pretty hard, to get a performance guarantee better than 50% with a polynomial-time algorithm for the max cut problem. The most famous such algorithm is by Goemans and Williamson. That took until 1994, and it requires a tool called semidefinite programming, something that's even more powerful than linear programming.

Let's now prove that local search is guaranteed to output a cut whose number of crossing edges is at least half the total number of edges in the graph. So pick your favorite locally optimal cut (A, B).
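Since the expectation is over only 2^n equally likely cuts, on a small graph we can also verify the linearity-of-expectation conclusion exactly by brute force. A sketch, again on the 4-cycle as an illustrative input:

```python
from itertools import product

def exact_expected_crossing(vertices, edges):
    """Average the number of crossing edges over all 2^n equally likely
    cuts -- exactly the expectation computed in the proof."""
    total, n = 0, len(vertices)
    for bits in product([0, 1], repeat=n):
        side = dict(zip(vertices, bits))
        total += sum(1 for a, b in edges if side[a] != side[b])
    return total / 2 ** n

V = ["u", "v", "w", "x"]
E = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "u")]
print(exact_expected_crossing(V, E), len(E) / 2)   # both equal 2.0
```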
And here by locally optimal, I just mean something that the algorithm might return. That is, a cut for which it is impossible to swap a single vertex from one side to the other and improve the value of the cut. By virtue of being locally optimal, it must be the case that, for this cut (A, B) and for every single vertex v of V, the number of edges incident to this vertex that are crossing the cut is at least as large as the number of edges incident to this vertex that do not cross the cut. In the previous notation, that is, c_v is at least as big as d_v. If not, swapping v would give us a better cut.

So we have n of these inequalities, one for each vertex v, so we can legitimately sum up those inequalities, combining them into one. Now, let's focus first on the right-hand side of this summed-up inequality, so the sum over all of the vertices in the graph of the number of edges incident to the vertex that are crossing the cut. Now here's the main point: what is this sum on the right-hand side counting? It's counting each edge that crosses the cut (A, B) exactly twice. Consider an edge, say (u, w), which is crossing the cut (A, B). It gets counted twice in the right-hand side, once when v = u and once when v = w. We can apply exactly the same reasoning to the left-hand side. What is this sum counting? It's counting each non-crossing edge exactly twice.
Consider an edge, say (u, x), whose endpoints are both on the same side. It's going to be counted once in this sum when v = u, and then again when v = x. Now we want to compare the number of crossing edges of this locally optimal cut to the total number of edges. So on the left-hand side we're missing the crossing edges; to complete that into all of the edges, let's just add double the number of crossing edges to both sides of this inequality. On the left-hand side, we get double all of the edges, and now on the right-hand side we have 4 times the number of crossing edges. Dividing both sides of this inequality by 4, we see we've proved the theorem: the number of edges that cross (A, B) is indeed a full 50% or more of the total number of edges in the graph.

It's interesting to discuss briefly how the conclusions we've reached for local search for max cut extend to a natural weighted version of maximum cut. So which facts about the unweighted special case extend to the weighted case, and which facts do not? Well, it still makes perfect sense to take a local search approach to the weighted version of the maximum cut problem. Now, for each vertex in a given cut, you just look at the total weight of the incident edges that are crossing the cut and the total weight of the incident edges that are not crossing the cut.
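The counting argument in the proof can be checked mechanically on the locally optimal cut from the earlier example. This sketch verifies the two "counted exactly twice" identities and the resulting 50% bound; the helper name is mine.

```python
def check_local_optimality_bound(vertices, edges, side):
    """For a locally optimal cut, verify the counting identities from the
    proof: sum of c_v double-counts crossing edges, sum of d_v double-counts
    non-crossing edges, and the crossing edges are at least half of all edges."""
    c = {v: 0 for v in vertices}   # c_v: crossing edges incident to v
    d = {v: 0 for v in vertices}   # d_v: non-crossing edges incident to v
    for a, b in edges:
        if side[a] != side[b]:
            c[a] += 1; c[b] += 1
        else:
            d[a] += 1; d[b] += 1
    crossing = sum(1 for a, b in edges if side[a] != side[b])
    assert all(d[v] <= c[v] for v in vertices)       # local optimality, per vertex
    assert sum(c.values()) == 2 * crossing           # each crossing edge counted twice
    assert sum(d.values()) == 2 * (len(edges) - crossing)
    assert crossing >= len(edges) / 2                # the 50% guarantee
    return crossing

# The locally optimal cut {u, v} vs {w, x} of the 4-cycle passes every check.
V = ["u", "v", "w", "x"]
E = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "u")]
print(check_local_optimality_bound(V, E, {"u": 0, "v": 0, "w": 1, "x": 1}))  # 2
```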
And whenever the weight of the edges not crossing the cut is strictly bigger, that's an opportunity to improve the current cut by moving the vertex to the opposite side. One cool thing is that the performance guarantee that we just established of 50% for the output of local search carries over to the weighted case, and the proof remains essentially the same; I'll leave it for you to check that in the privacy of your own home. Now, it's also true that in the weighted case a random cut still gets 50% of the total weight of the graph. So, perhaps this performance guarantee is nothing really to write home about.

What breaks down is the running time analysis. Remember how this went for the unweighted case. We argued that there can be at most n choose 2 edges crossing any cut, and since every iteration of local search increases the number of crossing edges, it has to halt within a quadratic number of iterations. And that means it's easy to implement the algorithm in polynomial time. One way to think about what's special about the unweighted version of the maximum cut problem is that even though, as we've seen, the graph has an exponential number of different cuts, all of those exponentially many cuts only take on a polynomial number of different objective function values.
The number of crossing edges is somewhere between 0 and n choose 2. By contrast, once the edges can just have any old weights, you can have an exponential number of cuts with an exponential number of distinct objective function values, of distinct values for their total weight. That means, just because you're strictly improving the total weight crossing the cut in each iteration, it does not imply that you're going to converge in a polynomial number of iterations. And in fact, it's a highly non-trivial exercise to prove that there are instances of weighted maximum cut in which local search takes an exponential number of iterations before converging.
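The weighted improving-move rule described above can be sketched as follows. The weights here are powers of 2, a hypothetical choice of mine that illustrates why the unweighted counting argument breaks down: distinct sets of crossing edges then have distinct total weights, so the objective can take many more values than the n-choose-2-plus-1 possible in the unweighted case.

```python
def weighted_improving_vertex(vertices, wedges, side):
    """Return a vertex whose incident non-crossing weight strictly exceeds
    its incident crossing weight (an improving move), or None if the cut
    is locally optimal. wedges is a list of (a, b, weight) triples."""
    for v in vertices:
        crossing = sum(w for a, b, w in wedges
                       if v in (a, b) and side[a] != side[b])
        non_crossing = sum(w for a, b, w in wedges
                           if v in (a, b) and side[a] == side[b])
        if non_crossing > crossing:
            return v
    return None

# The 4-cycle with power-of-2 weights: every subset of edges has a
# distinct total weight, unlike the unweighted objective.
V = ["u", "v", "w", "x"]
WE = [("u", "v", 1), ("v", "w", 2), ("w", "x", 4), ("x", "u", 8)]
side = {"u": 0, "v": 0, "w": 0, "x": 0}
while (v := weighted_improving_vertex(V, WE, side)) is not None:
    side[v] = 1 - side[v]   # each flip strictly increases the crossing weight
print(sum(w for a, b, w in WE if side[a] != side[b]))
```

On this tiny instance local search still converges quickly; the lecture's point is that, with adversarially chosen weights on larger graphs, the number of strict improvements can be exponential.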