Let's explore our second strategy for graph search, namely depth-first search. And again, like with breadth-first search, I'll open by just reminding you what depth-first search is good for, then we'll trace through it in a particular example, and then we'll tell you what the actual code is. So if breadth-first search is the cautious and tentative exploration strategy, then depth-first search, or DFS for short, is its more aggressive cousin. The plan is to explore aggressively and only backtrack when necessary. This is very much the strategy one often uses when trying to solve a maze.

>>To explain what I mean, let me show you how this would work in the same running example we used when we discussed breadth-first search. So suppose we invoke depth-first search from the node S. Here's what's going to happen. Obviously we start at S, and obviously there are two places we can go next: we can go to A or to B. Depth-first search is underdetermined, just like breadth-first search, so we can pick either one. As in the breadth-first search example, let's go to A first, so A will be the second node we explore. But now, unlike breadth-first search, where we automatically went to node B next since it was the other layer-one node, here the only rule is that we go next to one of A's immediate neighbors. So we might well go to B, but we wouldn't be going there because it's a neighbor of S; we'd go because it's a neighbor of A. And actually, to make sure the difference is clear, let's assume we aggressively pursue deeper and go from A to C. Now the depth-first search strategy is, again, just to push deeper, so we go to one of C's immediate neighbors. Maybe we go to E next, so E is the fourth node visited. From E there's only one neighbor, not counting the one we came in on, so from E we go to D, and D is the fifth node we see. From D we have a choice: we can go either to B or to C. Suppose we go to C from D. Then we land back on C, the third node we visited, somewhere we've been before; and as usual, we keep track of where we've already been. So at this point we have to backtrack: we retreat from C back to D. Now there's still another outgoing edge from D to explore, namely the one to B, and so we actually wind up wrapping all the way around this outer cycle, and we hit B sixth.
And now, of course, anywhere we try to explore from B, we see somewhere we've already been. From B we try to go to S, but we've been there, so we retreat to B; we can try to go to A, but we've been there, so we retreat to B again. Now we've explored all of the options out of B, so we have to retreat from B back to D. From D we've explored both B and C, so we have to retreat back to E. From E we've explored its only other arc, the one to D, so we retreat to C, and from C we retreat to A. From A we actually haven't yet looked along one more arc, but that just sends us to B, where we've been before, so we retreat back to A, and finally we retreat back to S. Even at S there's still an extra edge to explore: we say, oh, we haven't tried this S-B edge yet, but of course when we look across it we get to B, where we've been before, and we backtrack to S. At that point we've looked at every edge once, and so we stop. That's how depth-first search works: you keep pursuing a path, moving to an immediate neighbor, as long as you can, and when you hit somewhere you've been before, you retreat.

>>So you might be wondering why bother with another graph search strategy. After all, we have breadth-first search, which seems pretty awesome, right? It runs in linear time. It's guaranteed to find everything you might want to find. It computes shortest paths. It computes connected components if you embed it in a for loop. It kind of seems like, what else would you want? Well, it turns out depth-first search is going to have its own impressive catalog of applications, which you can't necessarily replicate with breadth-first search, and I'm going to focus on applications in directed graphs. There's a simple one that we discuss in this video, and a more complicated one that has a separate video devoted to it. In this video, we're going to discuss computing topological orderings of directed acyclic graphs, that is, directed graphs that have no directed cycle. The more complicated application is computing strongly connected components of directed graphs. The running time will be essentially the same as it was for breadth-first search, and the best we could hope for, which is linear time. And again, we're not assuming that there are necessarily that many edges; there may be many fewer edges than vertices. So linear time in these connectivity applications means O(m+n).

>>So let's now talk about the actual code of depth-first search. There are a couple of ways to do it. One way is to just make some minor modifications to the code for breadth-first search, the primary difference being that instead of using a queue, with its first-in first-out behavior, you swap in a stack, with its last-in first-out behavior. Again, if you don't know what a stack is, you should read about it in a programming textbook or on the web. It's a data structure that supports constant-time insertions at the front and constant-time deletions from the front, unlike a queue, which supports constant-time insertions at the back and deletions from the front. So a stack operates just like those spring-loaded cafeteria trays: the tray you take out is the last one that got pushed in. These operations are called push and pop, and in a stack both take constant time. So if you swap out the queue, swap in the stack, and make a couple of other minor modifications, breadth-first search turns into depth-first search.
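Here is a minimal sketch of that stack-based variant in Python, assuming the graph is given as a dictionary mapping each vertex to a list of its neighbors; the names dfs_iterative, graph, and start are illustrative and not from the lecture.

    def dfs_iterative(graph, start):
        """Explore everything reachable from start, using an explicit LIFO stack."""
        explored = set()
        stack = [start]                  # a Python list used as a stack
        while stack:
            v = stack.pop()              # the most recently pushed vertex comes off first
            if v in explored:
                continue                 # a vertex may be pushed more than once; skip repeats
            explored.add(v)
            for w in graph[v]:           # neighbors of v (out-neighbors if the graph is directed)
                if w not in explored:
                    stack.append(w)
        return explored

One reasonable reading of the "couple of other minor modifications" is deferring the explored check until a vertex is popped, rather than marking it when it is pushed as BFS does; that is what the sketch above does, so its visit order matches the recursive version described next.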
For the sake of both variety and elegance, I'm instead going to show you a recursive version. Depth-first search is very naturally phrased as a recursive algorithm, and that's how we'll discuss it here. So depth-first search takes as input a graph G, and again it can be undirected or directed; it doesn't matter. Just, with a directed graph, be sure that you only follow arcs in the appropriate direction, which should be handled automatically by the adjacency lists of your graph data structure anyway. As always, we keep a boolean local to each vertex of the graph, remembering whether or not we've been there before, and of course, as soon as we start exploring from S, we'd better make a note that we have now been there; we'd better plant a flag, as it were. And remember that depth-first search is an aggressive search, so we immediately try to recursively search from any of S's neighbors that we haven't already been to. If we find such a vertex, somewhere we've never been, we recursively call depth-first search from that node.
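Here is a minimal sketch, in Python, of the recursive version just described. The adjacency lists below are my reconstruction of the lecture's example graph, with neighbors ordered so that the narrated visit order S, A, C, E, D, B comes out; the actual figure in the video may list the edges differently.

    def dfs(graph, s, explored=None, order=None):
        """Recursive DFS: mark s explored, then recurse on any unexplored neighbor of s."""
        if explored is None:
            explored = set()
        if order is None:
            order = []
        explored.add(s)                      # plant the flag at s
        order.append(s)
        for v in graph[s]:                   # every edge (s, v); out-edges if directed
            if v not in explored:
                dfs(graph, v, explored, order)
        return order

    # Assumed reconstruction of the lecture's undirected example graph: each edge
    # appears in both endpoints' lists, and neighbor order is chosen to match the trace.
    example = {
        'S': ['A', 'B'],
        'A': ['C', 'B', 'S'],
        'B': ['S', 'A', 'D'],
        'C': ['E', 'D', 'A'],
        'D': ['C', 'B', 'E'],
        'E': ['D', 'C'],
    }

    print(dfs(example, 'S'))   # ['S', 'A', 'C', 'E', 'D', 'B']

Note that Python's default recursion limit is roughly a thousand frames, so on very deep graphs you would either raise it or fall back to the explicit-stack version sketched earlier.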
The basic guarantees of depth-first search are exactly the same as they were for breadth-first search: we find everything we could possibly hope to find, and we do it in linear time. And once again, the reason is that this is simply a special case of the generic search procedure with which we started this sequence of videos. It just corresponds to a particular way of choosing among multiple crossing edges between the region of explored nodes and the region of unexplored nodes: essentially, we're always biased toward edges incident to the most recently discovered explored nodes. And just like breadth-first search, the running time is going to be proportional to the size of the component that you're discovering. The basic reason is that each node is looked at only once, right? This boolean makes sure that we don't ever explore a node more than once, and each edge is looked at at most twice, once from each endpoint. And given that these exact same two claims hold for depth-first search as for breadth-first search, if we wanted to compute connected components in an undirected graph, we could equally well use an outer for loop with depth-first search as our workhorse in the inner loop (see the sketch at the end of this section). It wouldn't matter: for undirected graphs, either depth-first search or breadth-first search is going to find all of the connected components in O(m+n) time, in linear time. So instead, I want to look at an application particular to depth-first search, and this is about finding a topological ordering of a directed acyclic graph.
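Before moving on to that, here is the connected-components point from a moment ago in code form: a minimal sketch, again assuming an adjacency-list dictionary with every vertex present as a key, and with the stack-based DFS inlined so the sketch stays self-contained; the name connected_components is illustrative.

    def connected_components(graph):
        """Group the vertices of an undirected graph into connected components,
        using an outer loop over all vertices with DFS as the inner workhorse."""
        explored = set()
        components = []
        for u in graph:                        # consider every vertex as a potential start
            if u not in explored:
                stack, component = [u], []
                while stack:                   # DFS from u, restricted to new vertices
                    v = stack.pop()
                    if v in explored:
                        continue
                    explored.add(v)
                    component.append(v)
                    stack.extend(w for w in graph[v] if w not in explored)
                components.append(component)
        return components

Each edge can cause at most two pushes, one from each endpoint, and each vertex is marked once, so the whole loop still runs in O(m+n) time, matching the claim above.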