As our final digraph processing algorithm,
we'll take a look at computing strong
components.
So the definition two vertices, v and w,
in a digraph are strongly connected if
there's a directed path from v to w and
another directed path from w to v.
And the thing about strongly connected is
that it's unequivalence relation. That is
each vertex is strongly connected to
itself.
If v is connected to w, strongly, and w is
strongly connected to, to, then w is
strongly connected to v.
That just means that paths, directed paths
connecting them.
And it's also transitive.
If v is connected to w and w to x,, then v
is strongly connected to x.
How to get from v to x? You go from w, v
to w and w to x,
And get from x to v, you go from x to w
and w to v.
So, that means that since it's an
equivalence relation, it divides the
digraph up into components, into sets
called strongly connected components that
have the property that there's directed
paths connecting each pair of vertices in
the set.
So, for example nine and twelve are
strongly connected,
There's a path from twelve to nine,
Another one from nine to twelve and so
forth.
So the,
Our challenge is to compute the strong
components in diagraphs.
And it's worth comparing this to what we
did for connected components in undirected
graphs.
So, this is just a quick review.
In an undirected graph, if there's a path
between two vertices that are connected,
that's an equivalence relation, divides
them up into connective components.
And what we did was, our design pattern is
to build our graph processing algorithm as
a constructor that takes a graph and does
the pre-processing to create a table like
this one which assigns a unique ID to all
the vertices in each given component.
So, that we can have a constant time
client query to check whether two given
vertices are in the same connected
component or not.
So, linear time processing, pre-processing
in the constructor to build the table,
constant time client queries.
And that's as good performance as we can
expect to have for a graph for a graph
processing algorithm.
Remarkably, now, we're able to do the same
thing for strong connectivity.
It's a, it's a much more sophisticated
algorithm but the design pattern and the
bottom line for the client is the same.
There's a constructor, it processes the
graph in linear time.
And it assigns a unique ID to each one of
the strongly connected components in the
graph.
So, once in the component by itself,
Zero, two, three, four, and five six, and
eight, seven, and nine, ten, and twelve,
There's five different strongly connected
components in this graph.
The constructor builds that array and the
client gets to in constant time, know
whether two vertices are strongly
connected or not.
It's quite amazing that we can solve this
problem in this way.
And as I'd mentioned it's got an
interesting history that we'll talk about
in a second.
So, here's an example of strong component
application.
So, the food web graph,
This is a very small subset of that graph.
The vertices are all the, species and an
edge goes from producer to consumer.
So, if animal a eats animal b, there's an
edge from a to b.
And what a strong component corresponds to
in this graph is a kind of a subset of
species that, have a common energy flow.
They naturally eat each other.
And that's extremely important in
ecological studies.
Another example is again, in processing
software big software is made up of
modules.
And the vertices are, are, are, are the
modules.
And edges is if one depends on another.
And so a strong component in this graph is
a subset of mutually interacting modules.
And in a huge program like Internet
Explorer you want to know what the strong
components are so that you can package
them together and maybe improve the
design.
So again, software you know, can, these
graphs can be huge.
And this kind of graph processing can be
extremely important in improving the
design of software.
Now again this algorithm has an
interesting history,, it along with
scheduling and other things that's a core
problem in operations research that was
widely studied, but really not understood
how difficult it was.
In Tarjan's paper in 1972 was a big
surprise that this could be solved in
linear time with depth research.
Now this algorithm had a couple of other
data structures and this I guess,
A diligent student in this class could
understand it with quite a bit of work.
But it really demonstrated the importance
of depth-first search in great graph
processing.
Now, the algorithm that we're going to
talk about today actually was invented in
the 80s' by Kosaraju and, and
independently by Sharir.
The story goes that Kosaraju had to go
lecture about Tarjan's algorithms and he
forgot his notes.
And he had taught it a number of times, he
was trying to figure out what Tarjan's
algorithm did and he developed this other
algorithm that's extremely simple ro
implement.
That's what we're going to look at today,
today.
And actually in the 1990s Gabow and
Mehlhorn and particularly, Melhorn had to
implement this algorithm for a big
software package and he found another
simple linear time algorithm.
Now, so, this story indicates even from
fundamental problems in graph processing
there's algorithms out there still waiting
to be discovered and this, this algorithm
is a good example of that.
So we give the intuition in the code and
you see how the algorithm works fully
convincing yourself or proving why it
works is a bit more of a challenge.
Now, we'll leave that mostly for the book.
But let's describe what it is.
So, the first idea is to think about the
reverse graph.
So if we take a graph and we reverse the
sense of all the edges we're going to get
the same, strong components.
We need edges in both directions.
So, if we switch the directions, we're
still going to have edges in both
directions.
The, the second concept is called the,
the, the kernel DAG.
And what that does, is it kind of, you
think about contracting each strong
component into a single vertex.
And then, just to worry about the edges
that go from one strong component to
another.
So the digraph that you get that way,
turns out to be a cyclic.
If it was cyclic, it would involve, it
would create a bigger strong component, a
couple of strong components into one.
So if we think about that processing that
kernel DAG, that's the edges that go
between strong components and we get a DAG
and we know how to deal with a DAG So, the
idea of the algorithm is to go ahead and
compute the topological order or the
reverse post order in the kernel DAG.
At least put up the edges of the original
digraph in order so that all the edges in
the kernel DAG point from right to left,
That's like a topological sort.
And then we run DFS but instead of
considering the vertices in, in numerical
order for the DFS, we consider them in
that reverse topological order.
So it's extremely easy to implement this
algorithm.
Of course, we're going to use DFS.
So let's look at a demo at how the
algorithm works., okay, so, this is a
digraph, and our goal is to compute the
strong components.
And we're going to do it in two phases,
two DFSs.
One is to compute the reverse post order
and the reverse of the graph.
And the other is to run the DFS but for
the other in which we visit the vertices,
we use that reverse post order that we've
computed. so here goes.
So, first, we'll do the DFS and the
reverse graph.
So that's the graph, that's the reverse
graph.
Remember, these two has the same strong
component.
So, there's our marked array.
We will do the DFS, and reverse post order
means that when we are done with the
vertex, we'll put it out.
So checked six unmarked, checked eight
unmarked checked six, it's marked. so
we're done with a answer, that's the
reverse post order. and again, as before,
put it on a stack and but we'll just list
them in reverse order in this demo.
So, eight is done.
So, six lots of places to go from six.
So, let's check seven.
That's unmarked, so we go to seven.
No place to go from seven, so we're done
with seven.
We put it on the reverse post order list.
We're also done with six.
Cuz four and nine are coming in, in this
example, so I put it on the list.
Next place to go from zero is two, so we
checked two
That's unmarked so we mark it and recurse
and we checked the vertices adjacent to
two.
First, four, that's unmarked.
So, then we recurse and go to four.
We gotta got to, to eleven first and
recurse so now we're at eleven.
And, and from eleven, we go to nine, which
is unmarked.
So we have a pretty long recursive stack
right here.
So, from nine, there's we have to check a
bunch of things, we'll check twelve first
and then visit twelve.
It's unmarked from twelve, we checked
eleven which is marked, nowhere to go.
Then, we checked ten. That's unmarked, so
we go to ten.
Visit ten it's, it's unmarked so we
recurse and then go to nine.
And that's marked and that's everywhere we
get from ten.
So, we're done with ten, so we put it on
the list and return. And now, where done
with twelve, so we put it on the list and
return.
And now, at nine, we have to check seven,
which is marked, and six, which is marked.
And then, we're done so we put it on the
list.
Then, we're done with eleven we put it on
the list.
Then we're finished with four, so we
checked six which is marked, then we
checked five, which is unmarked so we
recurse to five.
From five, we checked three which is
unmarked so we recurse.
Then, we checked four which is mark, we
checked two which is mark.
And then we're done with three, so we put
it on the list.
And then from five we checked zero it's
marked so we're done with five, we put it
on the list.
And that means that we're now done with
four.
And then, we're also done with two, after
checking three.
And then, we go to zero and put it on the
list.
So, that's all the vertices that, you
know, you get through from zero.
So, we look for more vertices and it's one
and it's marked so that's the last one in
the reverse post order.
So, that's a reverse post order of the
reverse graph.
And all we're going to do is take that
order and use that order to check vertices
at the top level in the depth-first search
of our regular graph.
We have to check all the other vertices to
make sure we are done.
So, that's phase one.
So now, we'll just do a DFS in the
original graph, using that order that we
just computed. So, we don't start with
zero, the way we always have, now we start
with one. we visit one, its unmarked and
now all the vertices that we visit during
that DFS are going to be in that same
strong component, that's the theorem that
makes this algorithm work, in this case,
this is only the one, so vertex one is its
own, is in its own strong component with
label zero.
So now, we've got to start, once all done,
so now, we have to start with looking for
another vertex to search from.
In this case, it's zero that's second on
the list.
So that's where we start,
With zero.
And now, all the vertices that we can
reach from zero are going to have strong,
in this graph, are going to have strong
component label one.
So, we do the DFS.
So, first, we get to five.
That's in the same strong component we
checked four.
And it's unmarked.
So, we label it.
We checked three, it's unmarked.
We checked five, which is marked.
We checked two, which is unmarked.
So now we have shown that zero, two,
three, four, and five are all in the same
strong component.
And now we're going to find both vertices
we can get three from two or mark.
So we're done with two then we're done
with three.
Four, we have to check two, which is
marked.
And then, we're done with four and five.
From zero we can check one but that's
already marked so that's not relevant for
this search and then we're done with zero.
And we've established that zero, two,
three, four, and five are all in the same
strong component.
, So that's the second one.
So now, we continue and we check all of
those and they're all marked. so, the next
vertex in the reverse post order of the
reverse graph is eleven so we visit
eleven.
Checked four,
It's already marked.
Checked twelve,
It's not so aj. we mark it, and these are
the third strong component.
They get labeled with two.
Then, we get to nine. from nine, we check
eleven and nowhere to go. Then, we checked
ten, and so that's seems a strong
component. from ten, we checked twelve,
which is marked. We're done with ten, were
done with nine, were done with twelve, and
were done with eleven. so, that's our
second strong or third strong component
nine, ten, eleven and twelve.
So we're done with those and now we go
through the list to find another unmarked
vertex.
Nine is marked, twelve is marked, ten is
marked. we get to six from six nine is
already marked. four is four is already
marked.
Eight is not marked so we go there. from
eight, we get to six and that's it.
Zero is already marked so we can only get
to eight from six at this point, and so
that's a strong component.
And then finally, we finish up by doing
seven.
So the end of the computation gives us the
strong component array, which is a unique
ID for each of the vertices so that two
vertices in the same strong component have
the same ID.
And that supports constant time strong
connectivity checks for clients.
And so, the bottom line is that this
algorithm is a very simple even though it
might be mysterious algorithm for
computing the strong component.
First, run DFS on the reverse graph to
compute the reverse post order.
Then, run DFS on the original graph
considering the vertices in the order
given by the first DFS.
And so these, these diagrams give a
summary of the computation.
I'm not going to spend too much time
explaining why and how it works in this
lecture.
But these kinds of diagrams give the
detail that can give some intuition about
how and why the algorithm works.
The for the proof it requires some
mathematical sophistication and find it in
the book.
But the programming of it is quite simple
and proved that it's efficient is and all
it does is run DFS twice but to really see
the correctness proof you want to look a
the second printing of the textbook. we
got it wrong in the first printing.
And, but look at the implementation so,
this is connected components and in the
undirected graph with DFS that we did last
time and I'm sure many of you thought that
it was one of the simplest algorithms that
we looked at.
We just marked the vertices and do
recursive and that's the end of it.
If you take that code and just had I'll
change the names.
And then just instead of going for the
vertices from zero through v minus one if
you first compute the depth-first order
that you get by running doing that first
search of the reverse graph and then you
iterate through the vertices in that
order, you get an algorithm for strong
connectivity.
It's really remarkable that just changing
that one line will perform this
computation that was thought to be
difficult for,, for many years, and, and
people were learning quite difficult codes
for, for many, many years.
So, that's a quick summary of digraph
processing.
We talked about single source reachability
getting the paths from a vertex to any
vertex that can be reached from that
vertex with a directed path.
We talked about topological sort in graphs
that have no cycles and that uses DFS.
And we talked about computing strong
components in a digraph with two DFSs.
Digraph processing is really h, a
testimony to the ingenuity that's possible
in algorithm and graph processing.