So in the last video I left you with a
cliffhanger. I introduced you to the
minimum cut problem. I introduced you to a
very simple algorithm, randomized
algorithm, in the form of contraction
algorithm. We observed that sometimes it
finds the min cut and sometimes it
doesn't. And so the $64000 question is
just how frequently does it succeed and
just how frequently does it fail. So now
that I hope you've brushed up on conditional probability and independence,
we are gonna give a very precise answer to
that question in this lecture. >>So
recalling this problem: we are given as input an undirected graph, possibly with parallel edges, and the goal is to compute, among the exponentially many possible cuts, the one with the fewest number of crossing edges. So, for
example in this graph here, which you've
seen in a previous video, the goal is to
compute the cut (A, B) here, because there are only two crossing edges, and that's as
small as it gets. That's the minimum cut
problem and Karger proposed the following
random contraction algorithm based on
random sampling, so we have N-2
iterations, and the number of vertices
gets decremented by 1 in each iteration.
So we start with N vertices, we get down
to 2. And how do we decrease the number
of vertices? We do it by contracting or
fusing two vertices together. How do we
pick which pair of vertices to fuse? Well, we pick one of the
remaining edges, uniformly at random. So
however many edges there are remaining, we pick each one equally likely. If the endpoints of that
edge are (u, v), then we collapse u and v
together into a single super node. So
that's what we mean by contracting two
nodes into a single vertex and then if
that causes any self-loops, and as we saw
in the examples, we will in general have
self-loops, then we delete them before
proceeding. After the N-2 iterations, only two vertices remain.
You'll recall that these two vertices naturally correspond to a cut. The first
group of the cut A corresponds to the
vertices that were fused into one of the
super vertices remaining at the end. The
other super vertex corresponds to the set B, the other original vertices of the graph.
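To make the procedure concrete, here is a minimal sketch of one run of the contraction algorithm in Python. The edge-list representation and the names (karger_contract, leader, find) are my own choices for illustration, not anything from the lecture, and the sketch assumes the input graph is connected.

```python
import random

def karger_contract(num_vertices, edges):
    """One run of the random contraction algorithm on a connected multigraph.

    edges: list of (u, v) pairs with vertices labeled 0..num_vertices-1;
    parallel edges are allowed.  Returns the edges crossing the cut that
    this particular run ends up with.
    """
    # leader[v] points toward the super node that v has been fused into.
    leader = list(range(num_vertices))

    def find(v):
        # Follow (and compress) pointers to reach v's current super node.
        while leader[v] != v:
            leader[v] = leader[leader[v]]
            v = leader[v]
        return v

    alive = [(u, v) for (u, v) in edges if u != v]  # drop any self-loops
    remaining = num_vertices
    while remaining > 2:
        u, v = random.choice(alive)        # uniform over the remaining edges
        leader[find(v)] = find(u)          # contract: fuse the two super nodes
        remaining -= 1
        # Delete edges that have just become self-loops.
        alive = [(a, b) for (a, b) in alive if find(a) != find(b)]

    # With two super nodes left, the surviving edges are exactly the edges
    # of the original graph that cross the output cut (A, B).
    return alive
```

For example, karger_contract(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]) returns the crossing edges of one candidate cut; how often that candidate is actually a minimum cut is exactly the question analyzed below.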
>>So the goal of this video is to give an answer to the following question: what is the probability of success? Where by success, we mean it outputs a particular min cut (A, B).
So let's set up the basic notation. We're
gonna fix an arbitrary input graph, an undirected
graph. As usual we use N to denote the
number of vertices and M to denote the
number of edges. We're also going to fix a
minimum cut (A, B). If a graph has
only one minimum cut, then it's clear what
I'm talking about here. If a graph has
multiple minimum cuts, I'm actually
selecting just one of them. Because I'm gonna
focus on a distinguished minimum cut (A, B),
and we're only gonna define the
algorithm as successful if it outputs this
particular minimum cut (A, B). If it outputs
some other minimum cut, we don't count it.
We don't count it as successful. Okay. So,
we really want this distinguished minimum
cut (A, B). In addition to N and M,
we're gonna have a parameter K, which is
the size of the minimum cut. That is, it's
the number of crossing edges of a minimum
cut, for example the number of edges that cross (A, B). The K edges that cross the minimum cut (A, B), we're going to call capital F. So the picture
you wanna have in mind is, there is, out
there in the world, this minimum cut (A, B).
There's lots of edges with both end points
in A, lots of edges possibly with both
endpoints in B. But, there's not a whole
lot of edges with one endpoint in A and one endpoint in B. So the edges F,
would be precisely, these three crossing
edges here. >>So our next step is to get a
very clear understanding of exactly when
the execution of the contraction algorithm
can go wrong, and exactly when it's gonna
work, exactly when we're going to succeed.
So let me redraw the same picture from the
previous slide. So, given that we're hoping that the contraction algorithm outputs this cut (A, B) at the end of the day, what
could possibly go wrong? Well, to see what
could go wrong, suppose, at some iteration,
one of the edges in capital F, remember F
are the edges crossing the min cut (A, B), so
it's these three magenta edges in the
picture. Suppose at some iteration one of
the edges of F gets chosen for
contraction. Well because this edge of F
has one endpoint in A and one endpoint in
B, when it gets chosen for contraction, it
causes this node from A and this node from
B to be fused together. What does that
mean? That means, in the cut that the
contraction algorithm finally outputs, this
node from A and this node from B will be
on the same side of the output cut. Okay,
so the cut output by the contraction
algorithm will have on one side both the
node from A and the node from B.
Therefore, the output of the contraction
algorithm if this happens will be a
different cut than (A, B), okay? It will not
output (A, B) if some edge of F is contracted.
>>And if you think about it, the converse is
also true. So let's assume now, that in
each of the N-2 iterations, the contraction
algorithm never contracts an edge from
capital F. Remember capital F are exactly
the edges with one endpoint in A and one
endpoint in B. So if it never contracts
any edge of F, then it only contracts
edges where both endpoints lie in capital
A or both endpoints lie in capital B.
Well, if this is the case then, vertices
from A always stick together in the fused
nodes, and vertices from B always stick
together in the fused nodes. There is
never an iteration where a node from A and
a node from B are fused together. What
does that mean? That means that when the
algorithm outputs its cut, all of the
nodes in A have been grouped together, all
of the nodes in B have been grouped
together, one in each of the two super nodes,
which means that the output of the
algorithm is indeed the desired
cut (A, B). Summarizing, the
contraction algorithm will do what we
want. It will succeed and output the cut
(A, B), if and only if it never chooses an
edge from capital F for contraction.
Therefore, the probability of success,
that is, the probability that the output
is the distinguished min cut (A, B), is
exactly the probability that it never
contracts an edge of capital F. >>So, this
is what we're gonna be interested in here.
This really is the object of our mathematical
analysis, the probability that in all of
the N-2 iterations we never
contract an edge of capital F. So, to think
about that, let's think about each
iteration in isolation, and actually define
some events describing that. So for an
iteration I, let Si denote the event that we screw up in iteration I. With this
notation, we can succinctly say what our
goal is: to compute the probability of success. What we wanna do is compute the probability that none of the events S1, S2, up to S(N-2), ever occurs. So, I'm gonna use this NOT(¬)
symbol to say that S1 does not happen. So
we don't screw up in iteration 1, we
don't screw up in iteration 2, we don't
screw up in iteration 3, and so on. All
the way up to, we don't screw up, we don't
contract anything from capital F, in the
final iteration, either. So summarizing,
analyzing the success probability of the
contraction algorithm boils down to
analyzing the probability of this event,
the intersection of the NOT Si's, with I ranging from iteration 1 to iteration N-2.
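In symbols (writing n for the number of vertices), the quantity we want to lower bound for the rest of the lecture is:

```latex
\Pr[\text{output is } (A,B)]
  \;=\; \Pr\big[\neg S_1 \cap \neg S_2 \cap \cdots \cap \neg S_{n-2}\big],
```

where, as above, S_i is the event that some edge of F gets contracted in iteration i.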
>>So we're gonna take this in baby steps, and the next quiz will
lead you through the first one, which is,
let's have a more modest goal. Let's just
think about iteration 1. Let's
try and understand, what's the chance we
screw up, what's the chance we don't screw
up, just in the first iteration? So the
answer to this quiz is the second option.
The probability is K/M, where K is the number of edges crossing the cut (A, B),
and M is the total number of edges. And
that's just because the probability of S1, the probability that we screw up, is just the number of crossing edges, that's the number of bad outcomes, the ones which trigger S1, divided by the total number of edges, that's the total number of things that could happen. And since all
edges are equally likely, it just boils
down to this. And by the definition of our
notation, this is exactly K/M. So this
gives us an exact calculation of the
failure probability in the first
iteration, as a function of the number of
crossing edges, and the overall number of edges.
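Writing m for the total number of edges, this is just the number of bad outcomes divided by the total number of equally likely outcomes:

```latex
\Pr[S_1] \;=\; \frac{|F|}{m} \;=\; \frac{k}{m}.
```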
Now, it turns out it's gonna be more useful for us to have a bound not
quite as exact, an inequality. That's in
terms of the number of vertices N, rather
than the number of edges, M. The reason
for that is, it's a little hard to
understand how the number of edges is
changing in each iteration. It's certainly
going down by at least 1 in each iteration, because we contract one edge in each iteration, but it
might go down by more than 1 when we
delete self-loops. By contrast the number
of vertices is this very steady obvious
process. One less vertex with each
successive iteration. >>So, let's rewrite
this bound in terms of the number of
vertices N. To do that in a useful way,
we make the following key observation. I
claim that, in the original graph G,
we are given as input, every vertex has at
least K edges incident on it, that is in
graph-theoretic terminology, every vertex has degree at least K. Where, recall, K is
the number of edges crossing our favorite
min cut (A, B). So why is that true? Why
must every vertex have a decent number of
neighbors, a decent number of edges
incident to it? Well, it's because, if you
think about it, each vertex defines a cut
by itself. Remember, a cut is just any
grouping of the vertices into two groups that are not empty and that don't
overlap. So one cut is to take a single
vertex, and make that the first group, A,
and take the other N-1 vertices,
and make that the second group, capital B.
So how many edges cross this cut? Well,
it's exactly the edges that are incident on the first node, the node on the left side. So if every single cut, of the exponentially many cuts, has at least K crossing edges, then certainly the N cuts defined by single vertices have at least K crossing edges, and therefore the degree of every vertex is at least K. So the fact that every single cut in the graph has at least K crossing edges gives us a lower bound on the number of edges incident on each vertex.
>>So, why is that useful? Well, let's recall the following general fact about any graph, which
is that if you sum up over the degrees of
the nodes, so if you go node by node, look
at how many edges are incident on that
node, that's the degree of V, and then sum
them up over all vertices. What will you
get? You'll get exactly twice the number of
edges, okay? So this is true for any
undirected graph, that the sum of the
degrees of the vertices is exactly double the number of edges. To see this, you
might think about taking a graph, starting
with the empty set of edges, and then
adding the edges of the graph one at a
time. Each time you add a new edge to a
graph, obviously the number of edges goes
up by 1, and the degree of each of the
endpoints of that edge also goes up by 1,
and there are, of course, two endpoints.
So every time you add an edge, the number
of edges goes up by 1, the sum of those
degrees goes up by 2. Therefore, when
you've added all the edges, the sum of the
degrees is double the number of edges that
you've added. That's why this is true.
Now, in the graph that we have at hand here, every degree is at least K, and there are N nodes. So this left-hand side, of course,
is at least KN for us. So therefore if we
just divide through by 2, and flip the
inequality around, we have that the number of edges has to be at least K, our lower bound on the degree of every vertex, times the number of vertices, divided by 2. So this is just primitive inequality rearranging. Putting
this together with your answer on the quiz,
since the probability of S1 is exactly K/M,
and M is at least KN/2, if
we substitute, we get that the probability
of S1 is at worst 2/N, 2 over
the number of vertices, and the K cancels out.
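Written as a chain of inequalities, the argument of the last two slides is simply:

```latex
2m \;=\; \sum_{v \in V} \mathrm{degree}(v) \;\ge\; kn
\quad\Longrightarrow\quad
m \;\ge\; \frac{kn}{2}
\quad\Longrightarrow\quad
\Pr[S_1] \;=\; \frac{k}{m} \;\le\; \frac{2}{n}.
```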
So that's, sort of, our first
milestone. We've figured out the chance
that we screw up in the first iteration,
that we pick some edge that crosses the cut (A, B). And things look good.
This is a, this is a small number, right?
So, in general, the number of vertices
might be quite big. And this says that
the probability we screw up is only 2
over the number of vertices. So, so far,
so good. Of course, this was only the
first iteration. Who knows what happens
later? >>So now that we understand the
chances of screwing up in the first
iteration, let's take our next baby step,
and understand the probability that we
don't screw up in either of the first two
iterations. That is, we're gonna be
interested in the following probability: the probability that we don't screw up in the first iteration nor in the second iteration. Now, if you go back to the definition of conditional probability, you'll realize that we can rewrite an
intersection like this in terms of
conditional probabilities. Namely, as the
probability that we don't screw up in the
second iteration, given that we didn't do
it already, times the probability that we
didn't screw up in the first iteration.
Okay? So this is the probability that we miss all of these K vulnerable edges in the second iteration, given that we didn't contract any of them in the first iteration. Now
notice, this second factor we already understand well from the previous slide. We gave a nice lower bound for it: there's a good chance that we don't screw up, with probability at least 1-2/N. And in some sense we also have a
very good understanding of this
probability. We know this is 1 minus the
chance that we do screw up. And what's the
chance that we do screw up? Well, these K
edges are still hanging out in the graph.
Remember, we didn't contract any of them in the first iteration; that's what's given. So
there are K ways to screw up, and we choose
an edge to contract uniformly at random, so
the total number of choices is the number
of remaining edges. >>Now, what's nice is that we have an exact understanding of this probability; this is an equality. The problem is we don't have
a good understanding of this denominator.
How many remaining edges are there? We
have an upper bound on this: we know this is at most M-1, since we got rid of at least one edge in the previous iteration,
but if you think about it, what we actually need for this quantity is a lower bound, and that's a little unclear, because in addition to contracting the one edge and getting that out of the graph, we might have created a bunch of self-loops and deleted all of those as well. So it's hard to
understand exactly what this quantity is.
So instead we're gonna rewrite this bound
in terms of the number of remaining vertices, and of course we know there are exactly N-1 vertices remaining: in the last iteration we took two vertices and contracted them down to one. So how do we
relate the number of edges to the number of
vertices? Well we do it just in exactly
the same way as in the first iteration. We'll
make a more general observation. In the first iteration, we observed that every node in the original graph induces a cut, with that node on one side and the other N-1 vertices on the other side. But in fact a more general statement holds: even after we've done a bunch
of contractions, any single node in the
contracted graph, even if it represents a
union of a bunch of nodes in the original
graph, we can still think of that as a cut
in the original graph. Right? So if
there's some super node in the contracted
graph, which is the result of fusing
twelve different things together, that
corresponds to a cut where those twelve
nodes in the original graph are on the one
side A, and the other N-12
vertices are on the other side of the cut,
B. So, even after contractions, as long as
we have at least two nodes in our
contracted graph, you can take any node
and think of it as half of a cut, one side
of a cut in the original graph.
>>Now remember, K is the number of edges
crossing our minimum cut (A, B), so
any cut in the original graph G has to have at least K crossing edges. So, since every
node in the contracted graph naturally
maps over to a cut in the original graph
with at least K edges crossing it, that
means, in the contracted graph, all of the
degrees have to be at least K. If you ever
had a node in the contracted graph that
had only say K-1 incident edges,
well then you'd have a cut in the original
graph with only K-1 crossing edges, a contradiction. So just like in the first
iteration, now that we have a lower bound
on the degree of every single vertex, we
can derive a lower bound on the number of
edges that remain in the graph. The number of remaining edges is at least 1/2 (that's because when you sum over the degrees of the vertices, you double-count the edges) times the degree of each vertex, which we just argued is at least K in this contracted graph, times the number of vertices, of which we know there are exactly N-1 left in the graph at this point. So now what we do
is plug in this inequality, this lower bound on the number of remaining edges, substituting it for the denominator. By lower bounding the denominator, we upper bound this fraction, which gives us a lower bound on 1 minus that fraction, and that's what we want. So
what we find is that the probability that we don't screw up in the second iteration, given that we didn't screw up in the first iteration (where again, screwing up means picking one of these K edges crossing (A, B) to contract), is at least 1-(2/(N-1)). So, that's pretty cool. We
took the first iteration, we analyzed it,
we showed the probability that we screw up
is pretty low, we succeed with probability at least 1-(2/N). In the
second iteration, our success probability
has dropped a little bit, but it's still
looking pretty good for reasonable values of N: 1-(2/(N-1)).
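Putting the two iterations together symbolically, this is just the conditional-probability decomposition combined with the two degree arguments:

```latex
\Pr[\neg S_1 \cap \neg S_2]
  \;=\; \Pr[\neg S_2 \mid \neg S_1] \cdot \Pr[\neg S_1]
  \;\ge\; \Big(1 - \frac{2}{n-1}\Big)\Big(1 - \frac{2}{n}\Big),
```

where the first factor uses the fact that, conditioned on not screwing up in the first iteration, the contracted graph still has minimum degree at least k and exactly n-1 vertices, hence at least k(n-1)/2 remaining edges.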
>>Now, as I hope you've picked up, we can generalize this pattern to any number of iterations, since the degree of every node of the contracted graph remains at least K.
The only thing which is changing is the
number of vertices is dropping by 1. So,
extending this pattern to its logical
conclusion, we get the following lower
bound on the probability that the
contraction algorithm succeeds. The
probability that the contraction algorithm
outputs the cut (A, B), you recall we argued,
is exactly the same thing as the
probability that it doesn't contract any of the K crossing edges, any edge of the set F, in the first iteration, nor in
the second iteration, nor in the third
iteration, and then so on, all the way up to
the final (N-2)th iteration. Using the
definition of conditional probability,
this is just the probability that we
don't screw up in the first iteration,
times the probability that we don't screw
up in the second iteration given that we
didn't screw up in the first iteration,
and so on. In the previous two slides, we
showed that we don't screw up in the first iteration with probability at
least 1-(2/N). In the second
iteration, with probability at least 1-(2/(N-1)). And of course,
you can guess what that pattern looks
like. And that results in the following
product. Now, because we stop when we get
down to two nodes remaining, in the last iteration in which we actually make a contraction there are three nodes, and in the second-to-last iteration in which we make a contraction there are four nodes. So that's where these last two
terms come from. Rewriting, this is just
(N-2)/N times (N-3)/(N-1), and so on. And now something
very cool happens, which is massive
cancellation, and to this day, this is
always just incredibly satisfying to be
able to cross out so many terms. So you get an N-2 here and an N-2 there that cross out, there's going to be a pair of N-3s that get crossed out, and N-4s, and so on. On the other side, there's going to be a pair of 4s that get crossed out, and a pair of 3s that get crossed out. And we'll be left with only the two largest terms in the denominator and the two smallest terms in the numerator, which is exactly 2/(N(N-1)). And to keep things
simple among friends, let's just be sloppy
and lower bound this by 1/(N^2).
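Here is the whole calculation in one line, the same product and cancellation just described:

```latex
\Pr[\text{output is } (A,B)]
  \;\ge\; \prod_{i=1}^{n-2}\Big(1 - \frac{2}{n-i+1}\Big)
  \;=\; \frac{n-2}{n}\cdot\frac{n-3}{n-1}\cdot\frac{n-4}{n-2}\cdots\frac{2}{4}\cdot\frac{1}{3}
  \;=\; \frac{2}{n(n-1)}
  \;\ge\; \frac{1}{n^2}.
```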
So that's it. That's our analysis
of the success probability of Karger's
contraction algorithm. Pretty cool, pretty
slick, huh? >>Okay, I'll concede, probably
you're thinking "Hey, wait a minute. We're
analyzing the probability that the
algorithm succeeds, and we're thinking of
the number of vertices N as being big, so
what we see here is a success probability of only 1/(N^2), and that kinda sucks."
So that's a good point. Let me
address that problem. This is a low
success probability. So that's
disappointing. So why are we talking about
this algorithm, or this analysis? Well,
here's something I want to point out.
Maybe this is not so good, only a 1/(N^2) chance that you're going to succeed, but this is still actually shockingly high for a brute-force algorithm which honestly seems to be doing almost nothing. This is
a nontrivial lower bound, a nontrivial success probability, because don't forget,
there's an exponential number of cuts in
the graph. So if you try to just pick a
random cut, i.e., you put every vertex 50/50 left or right, you'll be doing way worse
than this. You'll have a success
probability of like 1/(2^N).
So this is way, way better than that. And
the fact that it's inverse polynomial means that, using repeated trials, we can turn a success probability that's small but nontrivial into a failure probability that's incredibly small. So
lemme show you how to do that next. >>So,
we're gonna boost the success probability
of the contraction algorithm in, if you think about it, a totally obvious way.
We're gonna run it a bunch of times, each
one independently using a fresh batch of
random coins. And we're just going to
remember the smallest cut that we ever see,
and that's what we're gonna return, the
best of a bunch of repeated trials. Now
the question is, how many trials are we
gonna need before we're pretty confident
that we actually find the min cut that we're
looking for? To answer this question
rigorously, let's define some suitable events. So by Ti, I mean the event that the i-th trial succeeds, that is, the i-th time we run the contraction algorithm, it does output the desired min cut (A, B). For those of you who watched part II of the probability review, I said a rule of thumb for dealing with independence is that you should maybe, as a working hypothesis, assume random variables are dependent, unless they're explicitly constructed to be independent.
So here's a case where we're just gonna
define the random variables to be
independent. We're just gonna say that we run the contraction algorithm over and over again with fresh randomness, so that they're gonna be independent trials.
Now, we know that the probability that a single trial fails can be pretty big, as big as 1-1/(n^2). But here, with repeated trials, we're only in trouble if every single trial fails. If even one succeeds, then we find the min cut. So a different way of saying
that is, we're interested in the intersection of NOT T1, NOT T2, and so on; that's the event that every single trial fails. And now we use the fact that the
trials are independent. So, the
probability that all of these things
happen is just the product of the relevant
probabilities. So, the product from I=1
to capital N of the probability of
not Ti. Recall that we argued that the success probability of a single trial was bounded below by 1/(n^2). So the failure probability is bounded above by 1-1/(n^2). So since that's true for each of the capital N terms, you get an overall failure probability for all capital N trials of (1-1/(n^2)) raised to the power capital N. Alright, so that's a
little calculation. Don't lose sight of
why we're doing the calculation. We want
to answer this question, how many trials
do we need? How big does capital N need to
be before we're confident that we get the
answer that we want? >>Okay, so to answer that
question I need a quick calculus fact,
which is both very simple and very useful.
So for all real numbers X, we have the
following inequality: 1+x is bounded above by e^x. So I'll just
give you a quick proof via picture. So
first think about the line 1+x. What
does that cross through? Well, that
crosses through the points when x is -1,
y is 0, and when x is 0, y is 1.
And it's a line, so this looks like this
blue line here. What does e^x look
like? Well, if you substitute x = 0,
it's gonna be 1. So in fact the two curves kiss each other at x = 0. But
exponentials grow really quickly, so as you
jack up x to higher positive numbers, it
becomes very, very steep. And for negative x it stays non-negative the
whole way. So this sort of flattens out
for the negative numbers. So, pictorially,
and I encourage you to, you know, type
this into your own favorite graphing
program. You'll see that e^x bounds the line 1+x from above, everywhere.
For those of you who want
something more rigorous, there's a bunch
of ways to do it. For example, you can look at the Taylor expansion of e^x around the point 0. >>What's the
point? The point is this allows us to do
some very simple calculations on our
previous upper bound on the failure
probability by working with exponentials
instead of working with these ugly one
minus whatevers raised to the whatever
term. So, let's combine our upper bound
from the previous slide with the upper
bound provided by the calculus fact. And
to be concrete, let's substitute in some particular value of capital N. So, let's
use little n^2 trials, where little
n is the number of vertices of the graph.
In which case, the probability that every
single trial fails to recover the cut (A, B)
is bounded above by e to the -1/(n^2), that's using the calculus fact applied with x = -1/(n^2), and then we inherit the capital N in the exponent, which we just instantiated to little n^2. So of
course the n^2's are gonna cancel, and this is gonna give us e^(-1), also known as 1/e. So if we're
willing to do little n^2 trials,
then our failure probability has gone
from something very close to 1, to
something which is more like, say, 30-some percent. Now, once you get to a
constant success probability, it's very
easy to boost it further by just doing a
few more trials. So if we just add a
natural log factor, so instead of a little
n^2 trials, we do little n^2
times the natural log of little n.
Now, the probability that everything fails
is bounded above by the 1/e that we had last time, but now with the residual natural log of n up in the exponent, and this is merely 1/n.
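Spelling that out, these are the same steps as in the prose, using the calculus fact with x = -1/n^2:

```latex
\Pr[\text{all } N \text{ trials fail}]
  \;\le\; \Big(1 - \frac{1}{n^2}\Big)^{N}
  \;\le\; \big(e^{-1/n^2}\big)^{N}
  \;=\; e^{-N/n^2},
```

which is e^(-1) when N = n^2 and e^(-ln n) = 1/n when N = n^2 ln n.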
So I hope it's clear what happened. We took a very simple, very
elegant algorithm, that almost always
didn't do what we want. It almost always
failed to output the cut (A, B); it output it with probability only 1/(n^2).
But, 1/(n^2) is
still big enough that we can boost it, so
that it almost always succeeds just by
doing repeated trials. And the number of
repeated trials that we need is the
reciprocal of its original success
probability, boosted by a logarithmic factor. So that transformed
this almost always failing algorithm into
an almost always succeeding algorithm. And
that's a more general algorithmic technique, which is certainly worth remembering.
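As a sketch of the boosting step, here is how the repeated trials could be wrapped around the karger_contract routine sketched earlier; the function name min_cut and the default trial count are illustrative choices, with n^2 ln n matching the bound from the analysis above.

```python
import math

def min_cut(num_vertices, edges, trials=None):
    """Run the contraction algorithm many times and keep the smallest cut seen.

    With n^2 * ln(n) independent trials, the analysis above says the
    probability that every trial misses the distinguished min cut is
    at most 1/n.
    """
    n = num_vertices
    if trials is None:
        trials = int(n * n * math.log(n)) + 1
    best = None
    for _ in range(trials):
        crossing = karger_contract(n, edges)   # fresh randomness each trial
        if best is None or len(crossing) < len(best):
            best = crossing                    # remember the smallest cut so far
    return best
```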
>>Let me conclude with a couple of comments about the running time.
This is probably the first algorithm of the course where we haven't obsessed over just what the running time is. And as I said, it's simple enough; it's
not hard to figure out what it is, but
it's actually not that impressive. And
that's why I haven't been obsessing over
it. This is not almost linear. This is not
a for-free primitive, as I've described it
here. So it's certainly a polynomial-time
algorithm; its running time is bounded
above by some polynomial in n and m. So
it's way better than the exponential time
you get from brute-force search through
all 2^n possible cuts. But it is
certainly, the way I've described it, we've got to do n^2 trials, plus a log
factor, which I'm not even going to bother
writing down. And also, in each trial, at the very least you look at all the edges, so that's going to be another
factor of M. So this is a bigger
polynomial than in almost any of the
algorithms that we're going to see. Now, I
don't wanna undersell this application of
random sampling in computing cuts because
I've just shown you the simplest, most
elegant, most basic, but therefore also
the slowest implementation of using
contractions to compute cuts. There's been
follow-up work with a lot of extra
optimizations, in particular, doing stuff
much more clever than just repeated
trials, so basically using work that you
did in previous trials to inform how you
look for cuts in subsequent trials. And
you can shave large factors off of the
running time. So there are much better
implementations of this randomized
contraction algorithm than what I'm
showing you here. Those are, however,
outside the scope of this course.