Let's explore our second strategy for graph search, namely depth-first search. And again, like with breadth-first search, I'll open by just reminding you what depth-first search is good for, then we'll trace through it in a particular example, and then we'll tell you what the actual code is. So if breadth-first search is the cautious and tentative exploration strategy, then depth-first search, or DFS for short, is its more aggressive cousin. The plan is to explore aggressively and only backtrack when necessary. This is very much the strategy one often uses when trying to solve a maze.

>>To explain what I mean, let me show you how this would work in the same running example we used when we discussed breadth-first search. So suppose we invoke depth-first search from the node S. Here's what's going to happen. Obviously we start at S, and obviously there are two places we can go next: we can go to A or to B. Depth-first search is underdetermined, just like breadth-first search, so we can pick either one. As in the breadth-first search example, let's go to A first, so A will be the second node we explore. But now, unlike breadth-first search, where we automatically went to node B next since it was the other layer-one node, here the only rule is that we go next to one of A's immediate neighbors. So we might well go to B, but we wouldn't be going there because it's a neighbor of S; we'd go because it's a neighbor of A. And actually, to make sure the difference is clear, let's assume we aggressively pursue deeper and go from A to C. Now the depth-first search strategy is, again, just to push deeper, so we go to one of C's immediate neighbors. Maybe we go to E next, so E is the fourth node visited. From E there's only one neighbor, not counting the one we came in on, so from E we go to D, and D is the fifth node we see. From D we have a choice: we can go either to B or to C. Suppose we go to C from D. Then we land back on C, the third node we visited, somewhere we've been before; and as usual, we keep track of where we've already been. So at this point we have to backtrack: we retreat from C back to D. Now there's still another outgoing edge from D to explore, namely the one to B, and so we actually wind up wrapping all the way around this outer cycle, and we hit B sixth.
And now, of course, anywhere we try to explore from B, we see somewhere we've already been. From B we try to go to S, but we've been there, so we retreat to B; we can try to go to A, but we've been there, so we retreat to B again. Now we've explored all of the options out of B, so we have to retreat from B back to D. From D we've explored both B and C, so we have to retreat back to E. From E we've explored its only other arc, the one to D, so we retreat to C, and from C we retreat to A. From A we actually haven't yet looked along one more arc, but that just sends us to B, where we've been before, so we retreat back to A, and finally we retreat back to S. Even at S there's still an extra edge to explore: we say, oh, we haven't tried this S-B edge yet, but of course when we look across it we get to B, where we've been before, and we backtrack to S. At that point we've looked at every edge once, and so we stop. That's how depth-first search works: you keep pursuing a path, moving to an immediate neighbor, as long as you can, and when you hit somewhere you've been before, you retreat.

>>So you might be wondering why bother with another graph search strategy. After all, we have breadth-first search, which seems pretty awesome, right? It runs in linear time. It's guaranteed to find everything you might want to find. It computes shortest paths. It computes connected components if you embed it in a for loop. It kind of seems like, what else would you want? Well, it turns out depth-first search is going to have its own impressive catalog of applications, which you can't necessarily replicate with breadth-first search, and I'm going to focus on applications in directed graphs. There's a simple one that we discuss in this video, and a more complicated one that has a separate video devoted to it. In this video, we're going to discuss computing topological orderings of directed acyclic graphs, that is, directed graphs that have no directed cycle. The more complicated application is computing strongly connected components of directed graphs. The running time will be essentially the same as it was for breadth-first search, and the best we could hope for, which is linear time. And again, we're not assuming that there are necessarily that many edges; there may be many fewer edges than vertices. So linear time in these connectivity applications means O(m+n).

>>So let's now talk about the actual code of depth-first search. There are a couple of ways to do it. One way is to just make some minor modifications to the code for breadth-first search, the primary difference being that instead of using a queue, with its first-in first-out behavior, you swap in a stack, with its last-in first-out behavior. Again, if you don't know what a stack is, you should read about it in a programming textbook or on the web. It's a data structure that supports constant-time insertions at the front and constant-time deletions from the front, unlike a queue, which supports constant-time insertions at the back and deletions from the front. So a stack operates just like those spring-loaded cafeteria trays: the tray you take out is the last one that got pushed in. These operations are called push and pop, and in a stack both take constant time. So if you swap out the queue, swap in the stack, and make a couple of other minor modifications, breadth-first search turns into depth-first search.
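Here is a minimal sketch of that stack-based variant in Python, assuming the graph is given as a dictionary mapping each vertex to a list of its neighbors; the names dfs_iterative, graph, and start are illustrative and not from the lecture.

    def dfs_iterative(graph, start):
        """Explore everything reachable from start, using an explicit LIFO stack."""
        explored = set()
        stack = [start]                  # a Python list used as a stack
        while stack:
            v = stack.pop()              # the most recently pushed vertex comes off first
            if v in explored:
                continue                 # a vertex may be pushed more than once; skip repeats
            explored.add(v)
            for w in graph[v]:           # neighbors of v (out-neighbors if the graph is directed)
                if w not in explored:
                    stack.append(w)
        return explored

One reasonable reading of the "couple of other minor modifications" is deferring the explored check until a vertex is popped, rather than marking it when it is pushed as BFS does; that is what the sketch above does, so its visit order matches the recursive version described next.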
For the sake of both variety and elegance, I'm instead going to show you a recursive version. Depth-first search is very naturally phrased as a recursive algorithm, and that's how we'll discuss it here. So depth-first search takes as input a graph G, and again it can be undirected or directed; it doesn't matter. Just, with a directed graph, be sure that you only follow arcs in the appropriate direction, which should be handled automatically by the adjacency lists of your graph data structure anyway. As always, we keep a boolean local to each vertex of the graph, remembering whether or not we've been there before, and of course, as soon as we start exploring from S, we'd better make a note that we have now been there; we'd better plant a flag, as it were. And remember that depth-first search is an aggressive search, so we immediately try to recursively search from any of S's neighbors that we haven't already been to. If we find such a vertex, somewhere we've never been, we recursively call depth-first search from that node.
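Here is a minimal sketch, in Python, of the recursive version just described. The adjacency lists below are my reconstruction of the lecture's example graph, with neighbors ordered so that the narrated visit order S, A, C, E, D, B comes out; the actual figure in the video may list the edges differently.

    def dfs(graph, s, explored=None, order=None):
        """Recursive DFS: mark s explored, then recurse on any unexplored neighbor of s."""
        if explored is None:
            explored = set()
        if order is None:
            order = []
        explored.add(s)                      # plant the flag at s
        order.append(s)
        for v in graph[s]:                   # every edge (s, v); out-edges if directed
            if v not in explored:
                dfs(graph, v, explored, order)
        return order

    # Assumed reconstruction of the lecture's undirected example graph: each edge
    # appears in both endpoints' lists, and neighbor order is chosen to match the trace.
    example = {
        'S': ['A', 'B'],
        'A': ['C', 'B', 'S'],
        'B': ['S', 'A', 'D'],
        'C': ['E', 'D', 'A'],
        'D': ['C', 'B', 'E'],
        'E': ['D', 'C'],
    }

    print(dfs(example, 'S'))   # ['S', 'A', 'C', 'E', 'D', 'B']

Note that Python's default recursion limit is roughly a thousand frames, so on very deep graphs you would either raise it or fall back to the explicit-stack version sketched earlier.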
The basic guarantees of depth-first search are exactly the same as they were for breadth-first search: we find everything we could possibly hope to find, and we do it in linear time. And once again, the reason is that this is simply a special case of the generic search procedure with which we started this sequence of videos. It just corresponds to a particular way of choosing among multiple crossing edges between the region of explored nodes and the region of unexplored nodes: essentially, we're always biased toward edges incident to the most recently discovered explored nodes. And just like breadth-first search, the running time is going to be proportional to the size of the component that you're discovering. The basic reason is that each node is looked at only once, right? This boolean makes sure that we don't ever explore a node more than once, and each edge is looked at at most twice, once from each endpoint. And given that these exact same two claims hold for depth-first search as for breadth-first search, if we wanted to compute connected components in an undirected graph, we could equally well use an outer for loop with depth-first search as our workhorse in the inner loop (see the sketch at the end of this section). It wouldn't matter: for undirected graphs, either depth-first search or breadth-first search is going to find all of the connected components in O(m+n) time, in linear time. So instead, I want to look at an application particular to depth-first search, and this is about finding a topological ordering of a directed acyclic graph.
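Before moving on to that, here is the connected-components point from a moment ago in code form: a minimal sketch, again assuming an adjacency-list dictionary with every vertex present as a key, and with the stack-based DFS inlined so the sketch stays self-contained; the name connected_components is illustrative.

    def connected_components(graph):
        """Group the vertices of an undirected graph into connected components,
        using an outer loop over all vertices with DFS as the inner workhorse."""
        explored = set()
        components = []
        for u in graph:                        # consider every vertex as a potential start
            if u not in explored:
                stack, component = [u], []
                while stack:                   # DFS from u, restricted to new vertices
                    v = stack.pop()
                    if v in explored:
                        continue
                    explored.add(v)
                    component.append(v)
                    stack.extend(w for w in graph[v] if w not in explored)
                components.append(component)
        return components

Each edge can cause at most two pushes, one from each endpoint, and each vertex is marked once, so the whole loop still runs in O(m+n) time, matching the claim above.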