1 00:00:00,220 --> 00:00:04,460 Let's explore our second strategy for graph search, namely depth-first search. 2 00:00:04,460 --> 00:00:08,355 And again, like with breadth-first search, I'll open by just reminding you what 3 00:00:08,355 --> 00:00:12,930 depth-first search is good for and we'll trace through it in a particular example, 4 00:00:12,930 --> 00:00:14,716 and then we'll tell you what the actual code is. 5 00:00:14,716 --> 00:00:19,840 So if breadth-first search is the cautious and tentative exploration strategy, 6 00:00:19,840 --> 00:00:24,810 then depth-first search or DFS for short is its more aggressive cousin. 7 00:00:24,810 --> 00:00:29,810 So the plan is to explore aggressively and only back track when necessary. 8 00:00:29,810 --> 00:00:34,560 And this is very much the strategy one often uses when trying to solve a maze. 9 00:00:34,560 --> 00:00:37,530 To explain what I mean let me show you how this would 10 00:00:37,530 --> 00:00:41,850 work in the same running example we used when we discussed breadth-first search. 11 00:00:41,850 --> 00:00:44,230 So here if we invoke depth-first search from the node number S, 12 00:00:44,230 --> 00:00:46,680 here's what's going to happen. 13 00:00:46,680 --> 00:00:50,500 So obviously we start at S and obviously there's two places where we can go next. 14 00:00:50,500 --> 00:00:52,080 We can go to A or to B. 15 00:00:52,080 --> 00:00:55,120 And depth-first search is under determine like breadth-first search we can 16 00:00:55,120 --> 00:00:55,900 pick either one. 17 00:00:55,900 --> 00:00:59,005 Select with a breadth-first search example let's go to A first. 18 00:00:59,005 --> 00:01:02,520 So A will be the second one that we explore. 19 00:01:03,735 --> 00:01:07,760 But now, unlike breadth first search where we automatically went to node B next, 20 00:01:07,760 --> 00:01:10,340 since that was the other layer one node. 21 00:01:10,340 --> 00:01:16,040 Here, the only rule is that we have to go next to one of A's immediate neighbors. 22 00:01:16,040 --> 00:01:19,240 We might go to B, but we're not going to B because it is one of the neighbor's of S, 23 00:01:19,240 --> 00:01:21,340 we go because it is one of the neighbors of A. 24 00:01:21,340 --> 00:01:24,775 And actually to make sure the difference is clear let's assume that we aggressively 25 00:01:24,775 --> 00:01:29,408 pursue deeper when we go from A to C and now the depth first search strategy is 26 00:01:29,408 --> 00:01:33,070 again to just pursue deeper, so you go to one of C's immediate neighbors, so 27 00:01:33,070 --> 00:01:37,420 maybe we go to E next, so E is going to be the fourth one visited. 28 00:01:37,420 --> 00:01:41,510 Now from E there's only one neighbor not counting the one that we came in on so 29 00:01:41,510 --> 00:01:43,050 from E we go to D. 30 00:01:43,050 --> 00:01:44,910 And D is the fifth one we see. 31 00:01:44,910 --> 00:01:48,130 Now from D we have a choice, we can either go to B or we could go to C. 32 00:01:48,130 --> 00:01:51,130 So let's suppose we go to C from D. 33 00:01:51,130 --> 00:01:54,190 Well then we get to a node number three where we've been before. 34 00:01:54,190 --> 00:01:56,980 And as usual we're going to keep track of where we've already been. 35 00:01:56,980 --> 00:02:00,030 So at this point we have to back track from C back to D. 36 00:02:00,030 --> 00:02:01,490 We retreat to D. 37 00:02:01,490 --> 00:02:04,633 Now, there's still another outgoing edge from D to explore and 38 00:02:04,633 --> 00:02:06,115 then they'll be the one to B. 39 00:02:06,115 --> 00:02:09,690 And so what happens is we actually wind up wrapping all the way around this outer 40 00:02:09,690 --> 00:02:13,090 cycle and we could B sixth. 41 00:02:13,090 --> 00:02:13,700 And now, of course, 42 00:02:13,700 --> 00:02:15,900 anywhere we try to explore we see somewhere we've already been. 43 00:02:15,900 --> 00:02:19,340 So, from B we try to go to S, but we've been there so we retreat to B. 44 00:02:19,340 --> 00:02:22,520 We can try to go to A but we've been there so we retreat to B. 45 00:02:22,520 --> 00:02:25,070 Now we've explored all of the options out of B. 46 00:02:25,070 --> 00:02:27,710 So we have to retreat from B, we have to go back to D. 47 00:02:27,710 --> 00:02:31,960 Now from D we've explored both B and C so we have to retreat back to E. 48 00:02:31,960 --> 00:02:35,710 We've explored the only outgoing arc D, so we have to retreat to C. 49 00:02:35,710 --> 00:02:37,010 We retreat to A. 50 00:02:37,010 --> 00:02:41,430 From A we actually haven't yet looked along this arc, but 51 00:02:41,430 --> 00:02:43,740 that just sends us to B where we have been before. 52 00:02:43,740 --> 00:02:45,410 So then we retreat back to A. 53 00:02:45,410 --> 00:02:47,990 Finally we retreat back to S and S, 54 00:02:47,990 --> 00:02:51,060 even at S there's still an extra edge to explore. 55 00:02:51,060 --> 00:02:54,200 At S we say, we haven't tried this as B-edge yet. 56 00:02:54,200 --> 00:02:57,151 But of course, when we look across we get the B where we've been before and 57 00:02:57,151 --> 00:02:58,112 then we backtrack to S. 58 00:02:58,112 --> 00:03:01,177 Then we've looked at every edge once, and so we stop. 59 00:03:01,177 --> 00:03:02,950 So that's how depth-first search works. 60 00:03:02,950 --> 00:03:05,590 You just pursue your path, you go to an immediate 61 00:03:05,590 --> 00:03:08,540 neighbor as long as you can until you hit somewhere you've been before. 62 00:03:08,540 --> 00:03:10,050 And then you retreat. 63 00:03:10,050 --> 00:03:13,737 So you might be wondering why bother with another graph search strategy? 64 00:03:13,737 --> 00:03:16,442 After all we have breadth-first search, which seemed pretty awesome, right. 65 00:03:16,442 --> 00:03:18,140 It runs in linear time. 66 00:03:18,140 --> 00:03:22,600 It's guaranteed to find everything you might want to find, it computes shortest 67 00:03:22,600 --> 00:03:26,286 paths, it computes connected components if you embed it in a foreloop. 68 00:03:26,286 --> 00:03:28,533 It kind of seems like, what else would you want? 69 00:03:28,533 --> 00:03:29,225 Well, it turns out, 70 00:03:29,225 --> 00:03:32,520 depth-first search is going to have its own impressive catalogue of applications, 71 00:03:32,520 --> 00:03:35,190 which you can't necessarily replicate with breadth-first search. 72 00:03:35,190 --> 00:03:38,090 And I'm going to focus on applications in directed graphs. 73 00:03:38,090 --> 00:03:41,720 So there's going to be a simple one that we discuss in this video, and then there's 74 00:03:41,720 --> 00:03:45,140 going to be a more complicated one that has a separate video devoted to it. 75 00:03:45,140 --> 00:03:46,895 So in this video we're going to be discussing, 76 00:03:46,895 --> 00:03:50,330 computing topological orderings of directed acyclic graphs. 77 00:03:50,330 --> 00:03:53,590 That is, directed graphs that have no directed cycle. 78 00:03:53,590 --> 00:03:57,250 The more complicated application is computing strongly connected components in 79 00:03:57,250 --> 00:03:58,550 directed graphs. 80 00:03:58,550 --> 00:04:01,015 The run time will be, essentially, the same as it was for 81 00:04:01,015 --> 00:04:05,200 breadth-first search, and the best we could hope for, which is linear time. 82 00:04:05,200 --> 00:04:08,683 And again, we're not assuming that there's necessarily that many edges. 83 00:04:08,683 --> 00:04:11,108 There may be much fewer edges than vertices. 84 00:04:11,108 --> 00:04:16,020 So linear time and these connectivity applications means O of M plus n. 85 00:04:17,500 --> 00:04:19,630 So let's not talk about the actual code of depth for search. 86 00:04:19,630 --> 00:04:21,350 There's a couple of ways to do it. 87 00:04:21,350 --> 00:04:25,075 One way to do it is to just make some minor modifications to the code for 88 00:04:25,075 --> 00:04:26,430 breadth for a search. 89 00:04:26,430 --> 00:04:29,690 The primary difference being instead of using a queue in its 90 00:04:29,690 --> 00:04:34,640 first in first out behavior, you swap in a stack with its last in first out behavior. 91 00:04:34,640 --> 00:04:37,150 Again, if you don't know what a stack is you should read about that in the program 92 00:04:37,150 --> 00:04:38,880 textbook or on the web. 93 00:04:38,880 --> 00:04:42,520 It's something that supports constant time insertions to the front and 94 00:04:42,520 --> 00:04:45,260 constant time deletions from the front, 95 00:04:45,260 --> 00:04:48,990 unlike a queue which is meant to support constant time deletions to the back. 96 00:04:48,990 --> 00:04:52,750 Okay so stack that operates just like those cafeteria trays that you know where 97 00:04:52,750 --> 00:04:55,780 you put in a tray and the last one that got pushed in when 98 00:04:55,780 --> 00:04:58,120 you take the first one out that's the last one that got put in. 99 00:04:58,120 --> 00:05:01,840 So these are called push and pop, in a stack context both are constant time. 100 00:05:01,840 --> 00:05:04,287 So if you swap out the queue you swap in the stack, 101 00:05:04,287 --> 00:05:06,447 make a couple other minor modifications. 102 00:05:06,447 --> 00:05:10,152 Breadth-first search turns into depth-first search. 103 00:05:10,152 --> 00:05:12,321 For the sake of both variety and elegance, 104 00:05:12,321 --> 00:05:15,040 I'm instead going to show you a recursive version. 105 00:05:15,040 --> 00:05:18,680 So depth-first search is very naturally phrased as a recursive algorithm, 106 00:05:18,680 --> 00:05:20,140 and that's what we'll discuss here. 107 00:05:20,140 --> 00:05:22,190 So, depth-first search, of course, 108 00:05:22,190 --> 00:05:25,750 takes as input a graph g and again it could be undirected or directed. 109 00:05:25,750 --> 00:05:29,060 It doesn't matter, just with a directed graph be sure that you only follow arcs 110 00:05:29,060 --> 00:05:32,040 in the appropriate direction, which should be automatically handled 111 00:05:32,040 --> 00:05:35,440 in the adjacency lists of your graph data structure anyways. 112 00:05:35,440 --> 00:05:38,730 So as always we keep a Boolean local to each vertex of the graph, 113 00:05:38,730 --> 00:05:42,680 remembering whether we've been there before or not. 114 00:05:42,680 --> 00:05:45,650 And of course, as soon as we start exploring from S we better make a note 115 00:05:45,650 --> 00:05:47,160 that now we have been there. 116 00:05:47,160 --> 00:05:48,716 We better plant a flag, as it were. 117 00:05:48,716 --> 00:05:53,180 And remember, depth-first search is an aggressive search, so we immediately 118 00:05:53,180 --> 00:05:57,260 try to recursively search from any of S's neighbors that we haven't already been to. 119 00:05:57,260 --> 00:06:02,160 And now, if we find such a vertex, if we find somewhere we've never been, 120 00:06:02,160 --> 00:06:05,500 we recursively call depth-first search from that node. 121 00:06:06,510 --> 00:06:10,072 The basic guarantees of depth-first search are exactly the same as they were for 122 00:06:10,072 --> 00:06:10,890 breath-first search. 123 00:06:10,890 --> 00:06:15,210 We find everything we could possibly hope to find and we do it in linear time. 124 00:06:15,210 --> 00:06:18,410 And once again, the reason is this is simply a special case of the generic 125 00:06:18,410 --> 00:06:21,930 stretch procedure that we started this sequence of videos about. 126 00:06:21,930 --> 00:06:25,530 So it just corresponds to a particular way of choosing amongst multiple crossing 127 00:06:25,530 --> 00:06:28,250 edges between the region of explored nodes, and 128 00:06:28,250 --> 00:06:29,880 between the region of unexplored nodes. 129 00:06:29,880 --> 00:06:33,920 Essentially always being biased toward the most recently discovered explored nodes. 130 00:06:33,920 --> 00:06:37,345 And just like, breadth for search, the running time is going to be proportional 131 00:06:37,345 --> 00:06:40,945 to the size of the component that you're discovering. 132 00:06:40,945 --> 00:06:44,395 And the basic reason is that each node is looked at only once, right? 133 00:06:44,395 --> 00:06:48,475 This boolean makes sure that we don't ever explore a node more than once, and 134 00:06:48,475 --> 00:06:52,055 then for each edge, we look at it at most twice, once from each endpoint. 135 00:06:53,700 --> 00:06:56,715 And given that these exact same two claims hold for depth-first search as for 136 00:06:56,715 --> 00:07:00,040 breadth-first search, that means if we wanted to compute connected components in 137 00:07:00,040 --> 00:07:03,180 an undirected graph, we could equally well use an outer for 138 00:07:03,180 --> 00:07:06,420 loop with depth-first search as our workhorse in the inner loop. 139 00:07:06,420 --> 00:07:07,290 It wouldn't matter. 140 00:07:07,290 --> 00:07:10,570 Either of those for undirected graphs, depth-first search, breadth-first search, 141 00:07:10,570 --> 00:07:14,290 is going to find all the connected components in O of n plus m time, 142 00:07:14,290 --> 00:07:15,390 in linear time. 143 00:07:15,390 --> 00:07:18,370 So instead, I want to focus on an application in particular to depth-first 144 00:07:18,370 --> 00:07:19,220 search, and 145 00:07:19,220 --> 00:07:23,250 this is about finding a topological ordering of a directed acyclic graph.