So let me begin by telling you what a topological ordering of a directed graph is. Essentially, it's an ordering of the vertices of the graph so that all of the arcs, the directed edges of the graph, only go forward in the ordering. So let me encode an ordering by a labeling of the vertices with the numbers one through n; this just encodes the position of each vertex in the ordering. Formally, there's gonna be a function f which takes vertices of G and maps them to integers between one and n, where n is the number of vertices of G, and each of the numbers one through n is taken on by exactly one vertex. That's just a way to encode an ordering, and here's the really important property: every directed edge of G goes forward in the ordering. That is, if (u, v) is a directed edge of the directed graph G, then the f-value of the tail should be less than the f-value of the head, f(u) < f(v); the f-value increases as you traverse the edge in its correct direction. Let me give you an example just to make this clearer. Suppose we have this very simple directed graph with four vertices. Let me show you two different, totally legitimate topological orderings of this graph. The first thing you could do is label S 1, V 2, W 3, and T 4. Another option would be to label them the same way except you swap the labels of V and W; so if you want, you can label V 3 and W 2. Again, what these labelings are meant to encode is an ordering of the vertices. The blue labeling you can think of as encoding the ordering in which we put S first, then V, then W, and then T, whereas the green labeling can be thought of as the same ordering of the nodes except with W coming before V. What's important is that the pattern of the edges is exactly the same in both cases, and in particular, all of the edges go forward in the ordering. In either case, we have edges from S to V and from S to W, so that looks the same pictorially whichever order V and W are in, and then, symmetrically, there are edges from V and W to T. So you'll notice that no matter which order we put V and W in, all four of these edges go forward in each of these orderings. Now, if you tried to put V before S, it wouldn't work, because the edge from S to V would go backward if V preceded S. Similarly, if you put T anywhere other than the final position, you would not have a topological ordering. So in fact, these are the only two topological orderings of this directed graph; I encourage you to convince yourself of that. Now, who cares about topological orderings? Well, this is actually a very useful subroutine that comes up in all kinds of applications, really whenever you want to sequence a bunch of tasks when there are precedence constraints among them. By precedence constraints I mean one task has to be finished before another. You can think, for example, about the courses in some kind of undergraduate major, like a computer science major. Here the vertices correspond to all of the courses, and there's a directed edge from course A to course B if course A is a prerequisite for course B, that is, if you have to take it first. So then, of course, you'd like to know a sequence in which you can take these courses so that you always take a course after you've taken its prerequisites. And that's exactly what a topological ordering will accomplish.
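To make the definition concrete, here's a minimal sketch in Python of a checker for this property; the adjacency-list encoding of the example graph and the helper name is_topological_ordering are my own illustrative choices, not anything from the lecture.

```python
# The four-node example as an adjacency list: vertex -> heads of its outgoing edges.
graph = {
    's': ['v', 'w'],
    'v': ['t'],
    'w': ['t'],
    't': [],
}

def is_topological_ordering(graph, f):
    """Return True if the labeling f (vertex -> position 1..n) sends every
    directed edge (u, v) forward, i.e. f[u] < f[v]."""
    return all(f[u] < f[v] for u in graph for v in graph[u])

# The two orderings from the example: both are valid.
blue  = {'s': 1, 'v': 2, 'w': 3, 't': 4}
green = {'s': 1, 'v': 3, 'w': 2, 't': 4}
print(is_topological_ordering(graph, blue))   # True
print(is_topological_ordering(graph, green))  # True

# Putting t anywhere other than last breaks it.
bad = {'s': 1, 't': 2, 'v': 3, 'w': 4}
print(is_topological_ordering(graph, bad))    # False
```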
So it's reasonable to ask: when does a directed graph have a topological ordering, and when a graph does have such an ordering, how do we get our grubby little hands on it? Well, there's a very clear necessary condition for a graph to have a topological ordering, which is that it had better be acyclic. Put differently, if a directed graph has a directed cycle, then there's certainly no way there's going to be a topological ordering. I hope the reason for this is fairly clear. Consider any directed graph which does have a directed cycle, and consider any purported ordering of the vertices. Now just traverse the edges of the cycle one by one. You start somewhere on the cycle, and if the first edge goes backward, well, you're already screwed: you already know that this ordering is not topological, because no edge is allowed to go backward. So evidently the first edge of this cycle has to go forward. But now you have to traverse the rest of the edges on this cycle, and eventually you come back to where you started. So if you started out by going forward, at some point you have to go backward. That edge goes backward in the ordering, violating the property of a topological ordering. That's true for every ordering, so directed cycles exclude the possibility of topological orderings. Now the question is, what if you don't have a cycle? Is that a strong enough condition that you're guaranteed to have a topological ordering? Is the only obstruction to sequencing jobs without conflicts the obvious one of having circular precedence constraints? It turns out that not only is the answer yes, as long as you don't have any directed cycles you're guaranteed a topological ordering, but we can even compute one, in linear time no less, via depth-first search. So before I show you this super slick and super efficient reduction of computing topological orderings to depth-first search, let me first go over a pretty good but slightly less slick and slightly less efficient solution, to help build up your intuition about directed acyclic graphs and their topological orderings. For the straightforward solution, we're going to begin with a simple observation: every directed acyclic graph has what I'm gonna call a sink vertex, that is, a vertex without any outgoing arcs. So in the four-node directed acyclic graph we were exploring on the last slide, there is exactly one sink vertex, and that's this rightmost vertex here. Right? It has no outgoing arcs; the other three vertices all have at least one outgoing arc. Now, why is it the case that a directed acyclic graph has to have a sink vertex? Well, suppose it didn't. Suppose it had no sink vertex. That would mean every single vertex has at least one outgoing arc. So what can we do if every vertex has an outgoing arc? Well, we can start at an arbitrary node. We know it's not a sink vertex because we're assuming there aren't any, so there's an outgoing arc; let's follow it. We get to some other node; by assumption there's no sink vertex, so this isn't a sink vertex either, so there's an outgoing arc; let's follow it. We get to some other node that also has an outgoing arc; let's follow that, and so on. So we just keep following outgoing arcs, and we can do this as long as we want, because every vertex has at least one outgoing arc. But there's a finite number of vertices, right? This graph has, say, n vertices. So if we follow n arcs, we're gonna see n plus one vertices.
So, by the pigeonhole principle, we're gonna have to see a repeat. Right? If n plus one vertex visits involve only n distinct vertices, we're gonna see some vertex twice. So, for example, maybe after I take the outgoing arc from this vertex, I get back to this one that I saw previously. Well, what have we done? What happens when we get a repeated vertex? By tracing these outgoing arcs and repeating a vertex, we have exhibited a directed cycle, and that's exactly what we're assuming doesn't exist; we're talking about directed acyclic graphs. So, put differently, we just proved that a graph with no sink vertex has to have a directed cycle, and a directed acyclic graph therefore has to have at least one sink vertex. So here's how we use this very simple observation to compute a topological ordering of a directed acyclic graph. Let's do a little thought experiment. Suppose, in fact, this graph did have a topological ordering, and let's think about the vertex that goes last in this topological ordering. Remember, any arc which goes backward in the ordering is a violation; we have to make sure that every arc goes forward. Now, any vertex which has an outgoing arc, we'd better put somewhere other than in the final position, right? Because for the node that we put in the final position, all of its outgoing arcs are gonna wind up going backward in the topological ordering; there's nowhere else they can go, this vertex is last. So in other words, if we're planning to successfully compute a topological ordering, the only candidate vertices for that final position in the ordering are the sink vertices; that's all that's gonna work. If we put a non-sink vertex there, we're toast; that's not gonna happen. Fortunately, if the graph is directed acyclic, we know there is a sink vertex. So let v be a sink vertex of G; if there are many sink vertices, we pick one arbitrarily. We set v's label to be the maximum possible: there are n vertices, so we put it in the nth position. And now we just recurse on the rest of the graph, which has only n minus one vertices. So how would this work in the example on the right? Well, in the first iteration, or the first, outermost recursive call, the only sink vertex is this rightmost one, circled in green. There are four vertices, so we're gonna give that the label four. Then, having labeled it four, we delete that vertex and all the edges incident to it, and we recurse on what's left of the graph. So that will be the leftmost three vertices plus the leftmost two edges. Now, this graph has two sink vertices after we've deleted the vertex labeled four and everything incident to it: both this top vertex and this bottom vertex are sinks in the residual graph. So in the next recursive call, we can choose either of those as our sink vertex. Because we have two choices, that generates two topological orderings, and those are exactly the ones we saw in the example. If, for example, we choose this one to be our sink vertex, then that gets the label three. Then we recurse on just the northwestern-most two vertices and the edge between them. This vertex is the unique sink in that graph, so it gets the label two. And then we recurse on the one-node graph, and that gets the label one. So why does this algorithm work? Well, there are two quick observations we need.
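To make this concrete, here's a minimal sketch in Python of the sink-plucking solution, reusing the adjacency-list style from the earlier snippet; the function name and the simple "rescan for a sink every round" implementation are my own choices (the lecture describes the algorithm only at the slide level, and notes later that a more careful implementation can be made to run in linear time).

```python
def topo_order_by_sinks(graph):
    """Straightforward sink-plucking: repeatedly find a sink in the remaining
    graph, assign it the largest unused position, then delete it and repeat
    (the recursion from the lecture written as a loop). Assumes the input is a DAG."""
    remaining = {u: set(vs) for u, vs in graph.items()}   # mutable copy of the graph
    f = {}
    position = len(remaining)            # hand out labels n down to 1
    while remaining:
        # A sink is a vertex with no outgoing arcs in the residual graph.
        sink = next(u for u, outs in remaining.items() if not outs)
        f[sink] = position
        position -= 1
        # Delete the sink and all arcs incident to it.
        del remaining[sink]
        for outs in remaining.values():
            outs.discard(sink)
    return f

print(topo_order_by_sinks(graph))   # one of the two valid orderings of the example
```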
So first of all, we need to argue that it makes sense: that in every iteration, or in every recursive call, we can indeed find a sink vertex to assign to the final position that's still unfilled. And the reason for that is just that if you take a directed acyclic graph and you delete one or more vertices from it, you still have a directed acyclic graph. Right, you can't create cycles by getting rid of stuff; you can only destroy cycles. And we started with no cycles, so through all the intermediate recursive calls we have no cycles, and by our first observation there's always a sink. The second thing we have to argue is that we really do produce a topological ordering. So remember what that means: every edge of the graph goes forward in the ordering; that is, the head of the arc is given a position later than the tail of the arc. And this simply follows because we always use sink vertices. Consider the vertex v which is assigned to position i. This means that when we were down to a graph with only i vertices remaining, v was a sink vertex. And if v is a sink vertex when only those i vertices remain, what property does it have in the original graph? Well, it means all of its outgoing arcs have to go to vertices that were already deleted and assigned higher positions. So for every vertex, by the time it actually gets assigned a position, it's a sink: it only has incoming arcs from the as-yet-unassigned vertices, and its outgoing arcs all go forward to vertices that were already assigned higher positions and got deleted previously from the graph. So now we have under our belt a pretty reasonable solution for computing a topological ordering of a directed acyclic graph. In particular, remember we observed that if a graph does have a directed cycle, then of course there's no way there's a topological ordering: however you order the vertices, some edge of the cycle is going to have to go backward. And the solution on the previous slide shows that as long as you don't have a cycle, a topological ordering is guaranteed to exist; in fact it's a constructive proof, a constructive argument that gives an algorithm. What you do is you just keep plucking off sink vertices one at a time and populating the ordering from right to left as you keep peeling off these sinks. So that's a pretty good algorithm; it's not too slow, and actually, if you implement it just so, you can even get it to run in linear time. But I wanna conclude this video with an application of depth-first search, which is a very slick, very efficient computation of a topological ordering of a directed acyclic graph. So we're just going to make two really quite minor modifications to our previous depth-first search subroutine. The first thing is we have to embed it in a for loop, just like we did with breadth-first search when we were computing the connected components of an undirected graph. That's because in computing a topological ordering, we'd better give every single vertex a label; we'd better look at every vertex at least once. So to do that, we'll just make sure there's an outer for loop, and then if we have multiple components, we'll just make sure to invoke DFS as often as we need to. The second thing we'll do is add a little bit of bookkeeping, and this will make sure that every node gets a label; in fact, these labels will define the topological ordering. So let's not forget the code for depth-first search.
This is where you're given a graph G, in this case we're assuming a directed acyclic graph, and you're given a start vertex S. And what you do is, as soon as you get to S, you very aggressively start trying to explore its neighbors. Of course, you don't visit any vertex you've already been to; you keep track of who you've visited. And if you find any vertex that you haven't seen before, you immediately start recursing on that node. So I said the first modification we need is to embed this in an outer for loop to ensure that every single node gets labeled; I'm gonna call that subroutine DFS-Loop. It does not take a start vertex. In the initialization, all nodes start out unexplored, of course, and we're also going to keep track of a global variable which I'll call the current label. This is going to be initialized to n, and we're gonna count it down each time we finish exploring a node. These will be precisely the f-values, exactly the positions of the vertices in the topological ordering that we output. In the main loop, we're gonna iterate over all of the nodes of the graph; so, for example, we just do a scan through the node array. As usual, we don't wanna do any work twice, so if a vertex has already been explored in some previous invocation of DFS, we don't search from it. This should all be familiar from embedding breadth-first search in a for loop when we computed the connected components of an undirected graph. And if we get to a vertex v of the graph that we haven't explored yet, then we just invoke DFS on the graph with that vertex as the starting point. So the final thing I need to add is to tell you what the f-values, the actual assignments of vertices to positions, are. As I foreshadowed, we're going to use this global current-label variable, and that will have us assign vertices to positions from right to left, very much mimicking what was going on in our recursive solution, where we plucked off sink vertices one at a time. So when is the right time to assign a vertex its position? Well, it turns out the right time is when we've completely finished with that vertex, when we're about to pop the recursive call corresponding to that vertex off the stack. So after we've gone through the for loop over all the edges outgoing from a given vertex S, we set f(S) equal to whatever the current label is, and then we decrement the current label. And that's it; that is the entire algorithm. So here is the claim. The f-values produced are going to be the integers from n down through one: DFS will eventually be invoked once on every vertex, every vertex gets some integer assigned at the end, everybody gets a distinct value, and the largest one is n and the smallest one is one. The claim is that this labeling is a topological ordering. Clearly this algorithm is just as blazingly fast as DFS itself, with just a trivial amount of extra bookkeeping. Let's see how it works on our running example. So let's just say we have this four-node directed graph, which we're getting quite used to. It has four vertices, so we initialize the current-label variable to be equal to four. Let's say that in the outer DFS-Loop we start somewhere like the vertex V; notice that in this outer for loop, we wind up considering the vertices in a totally arbitrary order. So let's say we first call DFS from this vertex V. So what happens?
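Here is one way the DFS-Loop algorithm on the slide might look written out in Python; treat it as a sketch, with the names topological_ordering, dfs, explored, and current_label being my own rendering rather than code from the course.

```python
def topological_ordering(graph):
    """DFS-based topological ordering of a DAG given as an adjacency list
    (vertex -> list of heads of its outgoing edges). Returns f: vertex -> position 1..n.
    Note: plain recursion can exceed Python's default recursion limit on deep graphs;
    an explicit stack would avoid that, but this mirrors the recursive description."""
    explored = set()
    f = {}
    current_label = len(graph)           # count down from n as vertices finish

    def dfs(s):
        nonlocal current_label
        explored.add(s)
        for v in graph[s]:
            if v not in explored:
                dfs(v)
        # s is completely finished: assign it the current label, then decrement.
        f[s] = current_label
        current_label -= 1

    # DFS-Loop: make sure every vertex gets labeled, whatever order we scan them in.
    for v in graph:
        if v not in explored:
            dfs(v)
    return f
```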
Well, the only place you can go from V is to T, so we recursively call DFS at T, and at T there's nowhere to go. There are no edges to go through, we finish the for loop, and so T is assigned an f-value equal to the current label, which is n, and here n, the number of vertices, is four. So T gets the label four. Now we're done with T; we decrement the current label as we finish up with T, and we backtrack to V. Back at V, there are no more outgoing arcs to explore, so its for loop is finished and we're done with its depth-first search. So V gets what's now the current label, which is three, and again, having finished with V, we decrement the current label, which is now down to two. Now we go back to the outer for loop. Maybe the next vertex we consider is the vertex T, but we've already been there, so we don't bother to DFS from T. Then maybe after that we try S; maybe S is the third vertex that the for loop considers. We haven't seen S yet, so we invoke DFS starting from the vertex S. From S there are two arcs to explore. V we've already seen, so nothing is gonna happen with the arc (S, V), but the arc (S, W) will cause us to recursively call DFS at W. From W we try to look at the arc from W to T, but we've already been to T, so we don't do anything. That finishes up W, so depth-first search finishes at the vertex W, and W gets assigned the current label: f(W) equals two. We decrement the current label, now its value is one, and we backtrack to S. We've already considered all of S's outgoing arcs, so we're done with S, and it gets the current label, which is one. And this is indeed one of the two topological orderings of this graph that we exhibited a couple of slides ago. So that's the full description of the algorithm and how it works on a concrete example. Let's just discuss its key properties, its running time and its correctness. As far as the running time goes, it's linear; that's exactly what you'd want it to be. And the reason the running time is linear is the usual reason these graph search algorithms run in linear time: you're explicitly keeping track of which nodes you've been to so that you don't visit them twice, so you only do a constant amount of work for each of the n nodes, and for each edge of a directed graph you only look at it once, when you visit the tail of that edge, so you only do a constant amount of work per edge as well. Of course, the other key property is correctness; that is, we need to show that we're guaranteed to get a topological ordering. So what does that mean? It means every edge, every arc, travels forward in the ordering: if (U, V) is an edge, then f(U), the label assigned to U by this algorithm, is less than the label assigned to V. The proof of correctness splits into two cases, depending on which of the vertices U or V is visited first by depth-first search. Because of our for loop, which iterates over all of the vertices of the graph G, depth-first search is going to be invoked exactly once from each of the vertices. Either U or V could be first; both are possible. So first, let's assume that U is visited by DFS before V. Then what happens? Well, remember what depth-first search does when you invoke it from a node: it's going to find everything findable from that node.
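For what it's worth, running the sketch above on the running example, with the outer loop scanning the vertices in the same order as this walkthrough (V first, then T, S, W), reproduces exactly this labeling; the adjacency list below is again just my illustrative encoding, and it reuses the checker from the first snippet.

```python
# The running example, with the vertices listed in the order the outer loop
# happens to scan them in the walkthrough: v first, then t, s, w.
example = {
    'v': ['t'],
    't': [],
    's': ['v', 'w'],
    'w': ['t'],
}
f = topological_ordering(example)
print(f)                                      # {'t': 4, 'v': 3, 'w': 2, 's': 1}
print(is_topological_ordering(example, f))    # True
```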
So, if U is visited before V, that means that when DFS reaches U, V has not yet been explored, so it's a candidate for being discovered. Moreover, there's an arc straight from U to V, so certainly DFS invoked at U is going to discover V. Furthermore, the recursive call corresponding to the node V is going to finish, is going to get popped off the program stack, before that of U. The easiest way to see this is just to think about the recursive structure of depth-first search. When you call depth-first search from U, that recursive call is gonna make further recursive calls to all of the relevant neighbors, including V, and U's call is not gonna get popped off the stack until V's has already been popped; that's because of the last-in, first-out nature of a stack, or of a recursive algorithm. So, because V's recursive call finishes before that of U, it will be assigned a larger label than U. Remember, the labels keep decreasing as more and more recursive calls get popped off the stack. So that's exactly what we wanted. Now, what's up in the second case, case two? This is where V is visited before U, and here's where we use the fact that the graph has no cycles. There's a directed arc from U to V, so there cannot be any directed path from V all the way back to U; that would create a directed cycle. Therefore DFS invoked from V is not going to discover U. There's no directed path from V to U (again, if there were, there would be a directed cycle), so it doesn't find U at all. So V's recursive call is going to get popped before U's is even pushed onto the stack; we're totally done with V before we even start to consider U. Therefore, for the same reason, since V's recursive call finishes first, its label is going to be larger, which is exactly what we wanted to prove. So that concludes the first, quite interesting application of depth-first search. In the next video we'll look at an even more interesting one, which computes the strongly connected components of a directed graph. This time we can't do it with one depth-first search; we'll need two.
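As a purely illustrative sanity check on this argument (not something from the lecture), you can record the order in which the recursive calls finish and verify that, in a directed acyclic graph, the head of every arc finishes before its tail; the labels the algorithm hands out are then just n, n minus one, and so on, in finishing order. This sketch reuses the example adjacency list from above.

```python
def finishing_order(graph):
    """Run DFS-Loop and return the vertices in the order their recursive calls
    finish (get popped off the stack). Purely illustrative helper."""
    explored, finished = set(), []

    def dfs(s):
        explored.add(s)
        for v in graph[s]:
            if v not in explored:
                dfs(v)
        finished.append(s)          # s is completely done at this point

    for v in graph:
        if v not in explored:
            dfs(v)
    return finished

# The property behind the proof: in a DAG, for every arc (u, v),
# v finishes before u, so v ends up with the larger label (later position).
order = finishing_order(example)
finish_time = {v: i for i, v in enumerate(order)}
assert all(finish_time[v] < finish_time[u] for u in example for v in example[u])
```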