1 00:00:00,012 --> 00:00:03,620 Having mastered computing the connecting components of an undirected graph in 2 00:00:03,620 --> 00:00:06,740 linear time, let's now turn our attention to directed graphs, 3 00:00:06,740 --> 00:00:09,780 which also arise in all kinds of different applications. 4 00:00:09,780 --> 00:00:13,340 Now the good news is, is we'll be able to get just a blazingly fast primitive for 5 00:00:13,340 --> 00:00:15,940 computing connectivity information for directed graphs. 6 00:00:15,940 --> 00:00:17,600 The bad news, if you want to call it hat, 7 00:00:17,600 --> 00:00:19,300 is we'll have to think a little bit harder. 8 00:00:19,300 --> 00:00:21,150 So it won't be as obvious how to do it. 9 00:00:21,150 --> 00:00:24,499 But by the end of this lecture will you know linear time algorithms, 10 00:00:24,499 --> 00:00:25,572 a very good concept. 11 00:00:25,572 --> 00:00:28,050 It's really just based on depth first search for 12 00:00:28,050 --> 00:00:30,930 computing all of the pieces of a directed graph. 13 00:00:30,930 --> 00:00:33,480 In fact, it's not even so obvious how to define pieces, 14 00:00:33,480 --> 00:00:35,790 how to define connected components in a directed graph. 15 00:00:35,790 --> 00:00:39,004 Certainly not as obvious as it was with undirected graphs. 16 00:00:39,004 --> 00:00:42,940 So, see what I mean, let's consider the following four node directed graph. 17 00:00:45,070 --> 00:00:48,805 So on the one hand, this graph in some sense in one piece. 18 00:00:48,805 --> 00:00:52,333 If this was an actual physical object, made say of a bunch of strings connected 19 00:00:52,333 --> 00:00:54,471 to each other and we picked it up with our hands, 20 00:00:54,471 --> 00:00:58,007 it wouldn't fall apart into two pieces, it would hang together in one piece. 21 00:00:58,007 --> 00:01:00,995 On the other hand, when you think about moving around this network, 22 00:01:00,995 --> 00:01:03,880 it's not connected in the sense that we might think about. 23 00:01:03,880 --> 00:01:07,280 You cannot get from any one point a, to any other point b. 24 00:01:07,280 --> 00:01:09,990 For example, if you started the right-most node in this graph, 25 00:01:09,990 --> 00:01:13,540 certainly there's no directed path that will get you to the left-most node. 26 00:01:13,540 --> 00:01:14,980 So what's typically studied and 27 00:01:14,980 --> 00:01:18,660 most useful for directed graphs is what you call strong connectivity. 28 00:01:18,660 --> 00:01:22,687 And a graph is strongly connected if you can get from any one point to any 29 00:01:22,687 --> 00:01:24,334 other point and vice versa. 30 00:01:24,334 --> 00:01:28,671 And the components then informally are the maximal portions of this graph, 31 00:01:28,671 --> 00:01:32,480 the maximal regions, which are internally strongly connected. 32 00:01:32,480 --> 00:01:36,086 So the maximum regions from within which you can get from any one 33 00:01:36,086 --> 00:01:39,158 ,point a to any other point b along a directed graph. 34 00:01:39,158 --> 00:01:42,539 For the record, let me go ahead and give you a formal definition, 35 00:01:42,539 --> 00:01:44,950 although this intuition is perfectly valid. 36 00:01:44,950 --> 00:01:47,560 Just regions where you can get from anywhere to anywhere else. 37 00:01:49,970 --> 00:01:53,120 So we say that the strongly connected components of directed graph, or 38 00:01:53,120 --> 00:01:54,600 the SCCs for short. 39 00:01:55,910 --> 00:01:59,129 And as in the undirected case, we're going to give a somewhat slick definition 40 00:01:59,129 --> 00:02:02,061 rather than talking about maximal region satisfying some property. 41 00:02:02,061 --> 00:02:04,424 We're going to talk about the equivalence classes of a particular 42 00:02:04,424 --> 00:02:05,490 equivalence relation. 43 00:02:05,490 --> 00:02:06,860 But really it means the same thing. 44 00:02:06,860 --> 00:02:10,681 This is just sort of the more mathematically more mature way of writing 45 00:02:10,681 --> 00:02:11,206 it down. 46 00:02:11,206 --> 00:02:15,597 So the equivalence relation we're going to use, if it's on nodes of the graph, and 47 00:02:15,597 --> 00:02:19,223 we'll say that u and node is related to a node v if you can get from u to v 48 00:02:19,223 --> 00:02:22,880 via directed path, and also from v to u in some other directed path. 49 00:02:26,150 --> 00:02:28,240 I'm not going to bore you with the verification that this is 50 00:02:28,240 --> 00:02:29,040 an equivalence relation. 51 00:02:29,040 --> 00:02:31,000 That's something you can work out in the privacy of your home. 52 00:02:32,600 --> 00:02:35,290 So remember what it means to be an equivalence relation, that's reflexive, 53 00:02:35,290 --> 00:02:37,520 that is everybody's related to itself. 54 00:02:37,520 --> 00:02:40,190 But of course there is a path from every node to itself. 55 00:02:40,190 --> 00:02:44,494 It also means that it's symmetric, so if u is related to v, then v is related to u. 56 00:02:44,494 --> 00:02:48,484 Well again, by definition we're saying that the vertices are mutually reachable 57 00:02:48,484 --> 00:02:49,430 from each other. 58 00:02:49,430 --> 00:02:51,055 So that's clearly symmetric by definition. 59 00:02:51,055 --> 00:02:52,480 And then it has to be transitive. 60 00:02:52,480 --> 00:02:55,840 And the way you prove it's transitive is you just paste paths together, and 61 00:02:55,840 --> 00:02:58,170 it just works the way you'd expect it to. 62 00:02:58,170 --> 00:03:01,920 So let's illustrate this concretely with a somewhat more complicated directed graph. 63 00:03:03,310 --> 00:03:04,934 So here's a directed graph, and 64 00:03:04,934 --> 00:03:08,620 I claim that it has exactly four strongly connecting components. 65 00:03:08,620 --> 00:03:10,040 There's a triangle on the left. 66 00:03:11,280 --> 00:03:12,392 A triangle on the right. 67 00:03:12,392 --> 00:03:15,439 It has this single node on top. 68 00:03:15,439 --> 00:03:19,320 And then it has this directed forecycle with a diagonal at the bottom. 69 00:03:21,030 --> 00:03:24,540 So what I hope is pretty clear is that each of these circled regions is indeed 70 00:03:24,540 --> 00:03:25,390 strongly connected. 71 00:03:25,390 --> 00:03:29,050 That is, if you start from one node in one of these circled regions, 72 00:03:29,050 --> 00:03:31,610 you can have a directed path to any other node. 73 00:03:31,610 --> 00:03:34,460 So that's symmetric, because on a directed cycle you can get from any 74 00:03:34,460 --> 00:03:36,320 one starting point to anyother place. 75 00:03:36,320 --> 00:03:37,750 And all of these have directed cycles. 76 00:03:37,750 --> 00:03:40,240 And then there's the one strong component that just has the one node, 77 00:03:40,240 --> 00:03:42,130 which is obviously strongly connected. 78 00:03:42,130 --> 00:03:44,880 What I also claim is true is that all of these regions are maximal. 79 00:03:44,880 --> 00:03:46,677 Subject to being strongly connected. 80 00:03:46,677 --> 00:03:48,160 That's why they're the strongly connected components. 81 00:03:48,160 --> 00:03:51,060 That's why they're the equivalence classes of this equivalence relation 82 00:03:51,060 --> 00:03:52,260 we just defined. 83 00:03:52,260 --> 00:03:55,760 So if you take any two pairs of nodes which lie in two different circles, 84 00:03:55,760 --> 00:03:58,470 you either won't have a path from the first one to the second, or 85 00:03:58,470 --> 00:04:01,390 you won't have a directed path from the second one back to the first one. 86 00:04:01,390 --> 00:04:05,780 In fact, the structure of the strong components in this black graph 87 00:04:05,780 --> 00:04:10,440 exactly mirrors the directed acyclic graph that we started in red. 88 00:04:10,440 --> 00:04:14,580 So in the same way in the red four-node graph, you can't move from right to left. 89 00:04:14,580 --> 00:04:18,530 Here in this bigger graph, you can't move from any of the circled SCCs to the right 90 00:04:18,530 --> 00:04:20,374 to any of the circled SCCs to the left. 91 00:04:20,374 --> 00:04:22,519 So for example, from the right-most nodes, 92 00:04:22,519 --> 00:04:25,000 there are no directed paths to the left-most nodes. 93 00:04:26,600 --> 00:04:29,640 So that's a recap of the definition of the strongly connected components. 94 00:04:29,640 --> 00:04:33,083 I've motivated in a separate video some reasons why you might care about computing 95 00:04:33,083 --> 00:04:34,516 strongly connected components. 96 00:04:34,516 --> 00:04:35,537 And in particular, 97 00:04:35,537 --> 00:04:39,861 on extremely large graphs which motivates the need for blazingly fast sub routines, 98 00:04:39,861 --> 00:04:43,840 so four free primitives that will let you compute connectivity information. 99 00:04:43,840 --> 00:04:46,846 So you'd love to be able to just know the pieces of a directed graph. 100 00:04:46,846 --> 00:04:48,958 Maybe you don't even know why they're going to be useful but 101 00:04:48,958 --> 00:04:50,600 you just compute them because why not? 102 00:04:50,600 --> 00:04:51,498 It's four free primitive. 103 00:04:51,498 --> 00:04:53,455 So, that's what I'm going to give you on this lecture. 104 00:04:53,455 --> 00:04:57,390 So the algorithm that we're going to present is based on depth-first search. 105 00:04:57,390 --> 00:05:00,686 And that's going to be the main reason why it's going to be so blazingly fast, 106 00:05:00,686 --> 00:05:03,440 is because depth-first search is blazingly fast. 107 00:05:03,440 --> 00:05:05,230 Now, you might be wondering, 108 00:05:05,230 --> 00:05:09,490 what on earth does graph search have to do with computing components? 109 00:05:09,490 --> 00:05:11,640 They don't seem obviously related. 110 00:05:11,640 --> 00:05:15,470 So let's return to the same directed graph that I shows you on the previous slide. 111 00:05:17,310 --> 00:05:22,040 And to see why something like depth-first search might conceivably have some use for 112 00:05:22,040 --> 00:05:23,610 computing strong components. 113 00:05:23,610 --> 00:05:28,295 Suppose we called depth-first search starting from this red node as 114 00:05:28,295 --> 00:05:29,594 a starting point. 115 00:05:29,594 --> 00:05:30,970 What would it would explore? 116 00:05:30,970 --> 00:05:34,215 So remember what the guarantee of something like depth-first search or 117 00:05:34,215 --> 00:05:36,032 breath first search for that matter is. 118 00:05:36,032 --> 00:05:40,020 You find everything that findable, but naturally nothing else. 119 00:05:40,020 --> 00:05:41,730 So what is findable from this red node? 120 00:05:41,730 --> 00:05:42,800 Where by findable I mean, 121 00:05:42,800 --> 00:05:46,620 you can reach it from a directed path emanating from this red node. 122 00:05:46,620 --> 00:05:48,105 Well, there's not much you can do. 123 00:05:48,105 --> 00:05:50,315 So from here you can explore this arc. 124 00:05:50,315 --> 00:05:51,634 And you can explore this arc. 125 00:05:51,634 --> 00:05:53,246 And then you can go backward. 126 00:05:53,246 --> 00:05:55,825 And so, if you do DFS or BFS from this node, 127 00:05:55,825 --> 00:05:59,520 you're going to find precisely the nodes in this triangle. 128 00:05:59,520 --> 00:06:02,240 All of the other arcs involved go the wrong direction and 129 00:06:02,240 --> 00:06:06,050 they won't be traversed by, say, a depth-first search call. 130 00:06:06,050 --> 00:06:07,620 So, why is that interesting? 131 00:06:07,620 --> 00:06:11,877 What's interesting is that if we invoke DFS from this red node, or 132 00:06:11,877 --> 00:06:17,425 any of the three nodes from this triangle, then it's going to discover precisely this 133 00:06:17,425 --> 00:06:22,612 strongly connected component, precisely the three nodes in this circled SCC. 134 00:06:22,612 --> 00:06:25,590 So that seems really cool, seems like maybe we just do a DFS, and 135 00:06:25,590 --> 00:06:27,010 boom we get an SCC. 136 00:06:27,010 --> 00:06:30,770 So maybe if we can do that over and over again we'll get all the SCCs. 137 00:06:30,770 --> 00:06:34,330 So that's a good initial intuition, but something can go wrong. 138 00:06:34,330 --> 00:06:38,570 Suppose that instead of initiating DFS from one of these three nodes on 139 00:06:38,570 --> 00:06:43,220 the triangle, we say, initiated from this bottom most node in green. 140 00:06:44,310 --> 00:06:47,960 So remember, what is the guarantee of a graph search subroutine like DFS? 141 00:06:47,960 --> 00:06:51,680 It will find everything findable but of course, nothing more. 142 00:06:51,680 --> 00:06:54,550 So what's findable from this green node? 143 00:06:54,550 --> 00:06:58,544 Well, naturally everything in its own SCC, right? 144 00:06:58,544 --> 00:07:01,700 So the four nodes here, it'll certainly find those four nodes. 145 00:07:01,700 --> 00:07:06,610 On the other hand, if we start from this green node, since there are arcs that 146 00:07:06,610 --> 00:07:11,530 go from this bottom-most SCC to the right-most the SCC. 147 00:07:11,530 --> 00:07:15,700 Not only will this DFS call find the four nodes in the green node's strong 148 00:07:15,700 --> 00:07:19,150 component, but it will also traverse these blue arcs and 149 00:07:19,150 --> 00:07:22,530 discover the three nodes in the red triangle. 150 00:07:22,530 --> 00:07:28,032 So, if we call DFS from this green node, we'll capture all seven of these. 151 00:07:28,032 --> 00:07:29,961 So the point is, if we call DFS, 152 00:07:29,961 --> 00:07:34,490 it looks like we're going to get a union of possibly multiple SCCs. 153 00:07:34,490 --> 00:07:36,300 In fact, in the worst case, 154 00:07:36,300 --> 00:07:41,140 if we invoke DFS from the leftmost node, what's it going to discover? 155 00:07:41,140 --> 00:07:44,160 It's going to discover the entire graph. 156 00:07:44,160 --> 00:07:48,584 And that didn't give us any insight into the strong component structure at all. 157 00:07:48,584 --> 00:07:53,342 So, what's the takeaway point is, the takeaway point is if you call 158 00:07:53,342 --> 00:07:57,866 DFS from just the right place, you'll actually uncover an SCC. 159 00:07:57,866 --> 00:08:02,222 If you call it from the wrong place, it will give you no information at all. 160 00:08:02,222 --> 00:08:06,898 So the magic of the algorithm that we're going to discuss next is we'll show having 161 00:08:06,898 --> 00:08:11,301 this super slick pre-processing step which ironically is itself is called 162 00:08:11,301 --> 00:08:13,090 a depth-first search. 163 00:08:13,090 --> 00:08:17,615 We can in linear time compute exactly where we want to start the subsequent 164 00:08:17,615 --> 00:08:20,910 depth-first searches from, so that each indication 165 00:08:20,910 --> 00:08:24,660 gets us exactly one strongly connected component and nothing more. 166 00:08:27,640 --> 00:08:30,580 So the algorithm that I'm going to show you is due to Kosaraju and 167 00:08:32,310 --> 00:08:37,120 it will show the following theorem, that the strongly connected components of 168 00:08:37,120 --> 00:08:39,430 a directed graph can be computed in linear time. 169 00:08:39,430 --> 00:08:42,090 And as we'll see, the constants are also very small. 170 00:08:42,090 --> 00:08:44,280 It's really just going to be two passes of depth first search. 171 00:08:45,330 --> 00:08:47,290 And again I'm going to remind you that for many problems, 172 00:08:47,290 --> 00:08:50,560 it's natural to assume that the number of edges is at least the number of nodes 173 00:08:50,560 --> 00:08:53,100 because you're only interested in cases where the graph is connected. 174 00:08:53,100 --> 00:08:56,400 Of course when you're computing connected components, that's one of the most natural 175 00:08:56,400 --> 00:08:59,540 cases where you might have a super sparse broken up graph. 176 00:08:59,540 --> 00:09:03,540 So we're not going to assume M is at least N so that's why linear time is going to 177 00:09:03,540 --> 00:09:05,519 be M plus N because that's the size of the input. 178 00:09:06,850 --> 00:09:09,800 And we don't know either m could be bigger than n or n could be bigger than m. 179 00:09:09,800 --> 00:09:10,300 We have no idea. 180 00:09:11,950 --> 00:09:16,160 Kosaraju's algorithm is shocking in its simplicity. 181 00:09:16,160 --> 00:09:17,370 It has three steps. 182 00:09:17,370 --> 00:09:18,240 Let me tell you what they are. 183 00:09:20,220 --> 00:09:23,090 First very mysteriously we are going to 184 00:09:23,090 --> 00:09:25,129 reverse all of the archs of the given graph. 185 00:09:26,680 --> 00:09:30,300 Totally not clear why that would be an interesting thing to do yet. 186 00:09:30,300 --> 00:09:33,790 Then we're going to do our first pass, our first depth first search, 187 00:09:33,790 --> 00:09:35,850 and we're going to do it on the reverse graph. 188 00:09:37,710 --> 00:09:40,570 Now the naive way to implement this would be to literally construct 189 00:09:40,570 --> 00:09:43,950 a new copy of the input graph with all the the arcs in the reverse direction, and 190 00:09:43,950 --> 00:09:46,150 then just run depth first search on it. 191 00:09:46,150 --> 00:09:49,040 Of course, the sophisticated, the sort of obvious optimization would be 192 00:09:49,040 --> 00:09:53,770 to just run DFS on the original graph, but going across arcs backwards. 193 00:09:53,770 --> 00:09:56,590 So I'll let you think through the details of how you'd do that, but that just works. 194 00:09:56,590 --> 00:10:00,120 You run DFS, and instead of going forward along edges, you go backward along edges, 195 00:10:00,120 --> 00:10:03,290 that simulates depth first search on the reverse graph. 196 00:10:03,290 --> 00:10:06,460 Now I've written here DFS loop and that just means the user will check more to 197 00:10:06,460 --> 00:10:10,500 make sure that you see all of the nodes of the graph even if it's disconnected 198 00:10:10,500 --> 00:10:14,390 you have an outer loop where you just try each starting point separately. 199 00:10:14,390 --> 00:10:17,640 If you haven't already seen it then you run DFS from that given node. 200 00:10:17,640 --> 00:10:18,970 I'll be more detailed on the next slide. 201 00:10:20,890 --> 00:10:23,774 And the third step is just you run depth-first search again, but 202 00:10:23,774 --> 00:10:25,253 this time on the original graph. 203 00:10:27,238 --> 00:10:30,951 Now at this point you should be thinking that I'm totally crazy, right, so 204 00:10:30,951 --> 00:10:32,180 what are we trying to do? 205 00:10:32,180 --> 00:10:34,360 We're trying to compute these strongly connected components. 206 00:10:34,360 --> 00:10:37,940 We're trying to actually compute real objects, these maximal regions and 207 00:10:37,940 --> 00:10:39,390 all I'm doing is searching the graph. 208 00:10:39,390 --> 00:10:40,170 I do it once forward. 209 00:10:40,170 --> 00:10:41,370 I do it once backward. 210 00:10:41,370 --> 00:10:44,010 I mean it doesn't seem like I'm computing anything. 211 00:10:44,010 --> 00:10:46,680 So here's the catch and it's a very minor catch. 212 00:10:46,680 --> 00:10:49,640 So we're going to get you to do a little bit of bookkeeping, it's going to be very 213 00:10:49,640 --> 00:10:52,500 little overhead so we'll still have a blazingly fast algorithm. 214 00:10:52,500 --> 00:10:55,350 So but with a little bit of bookkeeping, here's what's going to happen. 215 00:10:55,350 --> 00:10:57,690 The second depth first search. 216 00:10:57,690 --> 00:11:02,760 Which searches the graph, will in it's search process discover the components 217 00:11:02,760 --> 00:11:04,810 one at a time in a very natural way. 218 00:11:04,810 --> 00:11:07,610 And that will be really obvious when we do an example 219 00:11:07,610 --> 00:11:09,260 which we'll do in just a second. 220 00:11:09,260 --> 00:11:13,370 Now, for the second depth first search to work as magical way where it 221 00:11:13,370 --> 00:11:18,610 just discovers the connective component one at a time, it's really important that 222 00:11:18,610 --> 00:11:21,760 executes the depth first searches in a particular order, 223 00:11:21,760 --> 00:11:24,150 that it goes to the nodes of the graph in a particular order. 224 00:11:24,150 --> 00:11:27,650 And that is exactly the job of the first pass. 225 00:11:27,650 --> 00:11:32,100 The depth first search on reverse graph is going to compute an ordering of the nodes 226 00:11:32,100 --> 00:11:36,370 which, when the second depth first search goes through them in that order, 227 00:11:36,370 --> 00:11:40,060 it will just discover the SCCs one at a time. 228 00:11:40,060 --> 00:11:40,730 In linear time. 229 00:11:42,330 --> 00:11:45,630 So let me say a little bit more about the form of the bookkeeping and 230 00:11:45,630 --> 00:11:50,030 then I'll show you how that bookkeeping is kept in as we do depth-first search. 231 00:11:51,040 --> 00:11:53,500 So we're going to have a notion of a finishing time of a vertex. 232 00:11:53,500 --> 00:11:56,075 And that's going to be computed in the first pass when we do 233 00:11:56,075 --> 00:11:57,530 depth-first search in the reverse graph. 234 00:11:58,830 --> 00:12:01,780 And we're going to make use of this data in the second pass. 235 00:12:01,780 --> 00:12:05,610 So rather than just going through the nodes of the graph in an arbitrary order, 236 00:12:05,610 --> 00:12:09,140 like we usually do when we sort of have a loop to depth first search. 237 00:12:09,140 --> 00:12:12,520 We're going to make sure that we go through the vertices 238 00:12:12,520 --> 00:12:14,619 in decreasing order of these finishing times. 239 00:12:15,860 --> 00:12:19,825 Now there's still the question of what sense doe this second depth first search 240 00:12:19,825 --> 00:12:23,700 discover and report to the strong connected components that it finds? 241 00:12:23,700 --> 00:12:28,660 So we're going to label each node in the second pass with what we call a leader. 242 00:12:28,660 --> 00:12:32,060 And the idea is that the nodes in the same strong connected component will be 243 00:12:32,060 --> 00:12:33,770 labeled with exactly the same leader node. 244 00:12:33,770 --> 00:12:37,590 And again, all of this will be much more clear once we do a concrete example, but 245 00:12:37,590 --> 00:12:39,510 I want to have it down for the record right now. 246 00:12:41,470 --> 00:12:43,620 So that's the algorithm at a high level. 247 00:12:43,620 --> 00:12:46,560 It's really just two passes of DFS with some bookkeeping, but 248 00:12:46,560 --> 00:12:47,690 this is under specify. 249 00:12:47,690 --> 00:12:51,810 You really shouldn't understand how to implement the algorithm just yet. 250 00:12:51,810 --> 00:12:52,780 So what do I owe you? 251 00:12:52,780 --> 00:12:56,990 I owe you exactly what I mean by the DFS-loop, although this is seen more or 252 00:12:56,990 --> 00:12:57,660 less in the past. 253 00:12:57,660 --> 00:13:00,730 It's just a loop over all the vertices of the graph, and 254 00:13:00,730 --> 00:13:04,070 if you haven't seen something yet in DFS from that starting point, 255 00:13:04,070 --> 00:13:07,270 I need to tell you what finishing times are and how they get computed. 256 00:13:07,270 --> 00:13:08,430 They're just going to be integers 1 to n, 257 00:13:08,430 --> 00:13:12,460 which is basically when depth first search gets finished with one of the nodes, and 258 00:13:12,460 --> 00:13:15,100 then I need to tell you how you compute these leaders. 259 00:13:15,100 --> 00:13:18,080 So, let me tell you all three of those things on the next slot. 260 00:13:19,770 --> 00:13:23,380 So the work course for Kosaraju's strongly connected components algorithm 261 00:13:23,380 --> 00:13:27,780 is this DFS-loop subroutine, and it takes, as input, a graph. 262 00:13:27,780 --> 00:13:30,170 So it does not take as input a starting node, 263 00:13:30,170 --> 00:13:33,030 it's going to loop over possible starting nodes. 264 00:13:33,030 --> 00:13:35,220 Now for the bookkeeping to compute finishing nodes, 265 00:13:35,220 --> 00:13:37,700 we're going to keep track of a global variable of it all called T. 266 00:13:39,330 --> 00:13:40,710 Which we initialize to zero. 267 00:13:42,680 --> 00:13:46,180 The point of T is to count how many nodes we've totally 268 00:13:46,180 --> 00:13:47,840 finished exploring at this point. 269 00:13:49,130 --> 00:13:52,890 So this is the variable we use to compute those finishing times in the first pass, 270 00:13:52,890 --> 00:13:55,330 that magical ordering that I was talking about. 271 00:13:55,330 --> 00:13:58,240 Now we're also going to have a second global variable to compute these things I 272 00:13:58,240 --> 00:14:02,230 was calling leaders, and these are only relevant for the second pass. 273 00:14:02,230 --> 00:14:05,330 So what S is going to keep track of is the most recent 274 00:14:05,330 --> 00:14:08,630 vertex from which a DFS was initiated. 275 00:14:09,930 --> 00:14:13,090 So to keep the code simple, I'm just going to do all of the bookkeeping 276 00:14:13,090 --> 00:14:16,770 in the DFS-Loop, so really DFS-Loop gets called twice, 277 00:14:16,770 --> 00:14:19,500 once in a reverse graph, once in the forward graph. 278 00:14:19,500 --> 00:14:22,900 And we only need to compute these finishing times in the first pass 279 00:14:22,900 --> 00:14:24,210 on the reverse graph and 280 00:14:24,210 --> 00:14:27,770 we only need to compute these leaders on the second pass for the forward graph. 281 00:14:27,770 --> 00:14:30,320 But let's just keep them both in there just for kicks. 282 00:14:31,670 --> 00:14:33,420 Now we're going to need to loop through the vertices. 283 00:14:33,420 --> 00:14:36,750 And so the question is in what order are we going to loop through the vertices? 284 00:14:36,750 --> 00:14:39,250 And that's going to happen differently in the two different passes, but 285 00:14:39,250 --> 00:14:41,140 let me just use some common notation. 286 00:14:41,140 --> 00:14:42,030 Let's just assume, 287 00:14:42,030 --> 00:14:44,770 in this sub-routine, that the nodes are somehow labeled from 1 to n. 288 00:14:46,620 --> 00:14:50,370 In our first depth first search it's going to be labeled totally arbitrary, so 289 00:14:50,370 --> 00:14:54,130 these are basically just the names of the node or their position in the node array, 290 00:14:54,130 --> 00:14:56,290 whatever, you just do it in some arbitrary order. 291 00:14:56,290 --> 00:15:00,670 Now the second time we run DFS loop, as indicated on the previous slide, 292 00:15:00,670 --> 00:15:03,500 we're going to use the finishing times as the labeling. 293 00:15:03,500 --> 00:15:07,138 As we'll see, the finishing times are indeed numbers between 1 in it, so 294 00:15:07,138 --> 00:15:10,737 now what do we do is we just iterate through the nodes in decreasing order. 295 00:15:12,578 --> 00:15:16,363 And if we haven't already seen node i, then we initiate a DFS from it, so 296 00:15:16,363 --> 00:15:20,463 as usual we're going to be maintaining a local boolean to keep track of what we had 297 00:15:20,463 --> 00:15:22,969 already seen a node yet in one of the DFS passes. 298 00:15:24,480 --> 00:15:28,820 Now remember, the global variable s is responsible for keeping track of the most 299 00:15:28,820 --> 00:15:33,650 recent node from which Depth First Search had been initiated, so if i's not explored 300 00:15:33,650 --> 00:15:36,740 and we initiate a Depth First Search from it, we better reset s. 301 00:15:36,740 --> 00:15:43,530 And then we do the usual DFS ng starting from the source node i. 302 00:15:43,530 --> 00:15:47,850 So for completeness let me just remind you what the depth first search sub-routine 303 00:15:47,850 --> 00:15:50,620 looks like, so now we're given a graph and a starting node. 304 00:15:52,120 --> 00:15:56,706 So the first time we see a node we mark it as explored. 305 00:15:56,706 --> 00:16:00,833 And just a side note that once a node is marked explored, it's explored for 306 00:16:00,833 --> 00:16:02,876 this entire indication of DFS loop. 307 00:16:02,876 --> 00:16:06,888 Okay so even if this DFS from a given node i finishes, and then the outer for 308 00:16:06,888 --> 00:16:11,510 loop marches on, and encounters i again, it's still going to be marked as explored. 309 00:16:12,750 --> 00:16:17,070 Now one of our bookkeeping jobs is to keep track of from which 310 00:16:17,070 --> 00:16:20,620 vertex did the DFS that discovered i get called. 311 00:16:20,620 --> 00:16:23,100 So when i is first encountered, 312 00:16:23,100 --> 00:16:27,960 we remember that s was the node from which this DFS originated. 313 00:16:27,960 --> 00:16:31,188 And that by definition is the leader of i. 314 00:16:31,188 --> 00:16:35,350 And then we do what we always do with depth first search, we immediately look at 315 00:16:35,350 --> 00:16:38,770 the arcs going our of i and we try recursively DFS on any of those. 316 00:16:38,770 --> 00:16:41,380 Although we don't bother to do it if we've already seen those nodes. 317 00:16:42,660 --> 00:16:44,460 Now once this for loop has completed, 318 00:16:44,460 --> 00:16:47,710 once we've examined every outgoing arc from i and for each node j. 319 00:16:47,710 --> 00:16:51,588 Either we already saw it in the past or we've recursively explored from j and 320 00:16:51,588 --> 00:16:52,445 have returned. 321 00:16:52,445 --> 00:16:55,496 At the point, we call ourselves done with node i, 322 00:16:55,496 --> 00:16:58,120 there's no more outgoing arc to explore. 323 00:16:58,120 --> 00:17:02,670 We think of it being finished, remember t is the global variable that's keeping 324 00:17:02,670 --> 00:17:04,790 track of how many nodes we're done with, so 325 00:17:04,790 --> 00:17:07,520 we increment t because now we're done with i. 326 00:17:07,520 --> 00:17:13,000 And we also remember that i was the t-th vertex with which we finished. 327 00:17:14,310 --> 00:17:17,416 That is, we said i's finishing time to be t. 328 00:17:17,416 --> 00:17:21,420 Because depth first search is guaranteed to visit every node exactly once, 329 00:17:21,420 --> 00:17:24,170 and that therefore finish with every node exactly once. 330 00:17:24,170 --> 00:17:27,722 This global counter t, well when the first node is finished it'll be value 1, 331 00:17:27,722 --> 00:17:30,320 then when the next node gets finished I'll have value 2, 332 00:17:30,320 --> 00:17:31,927 then it'll have value 3 and so on. 333 00:17:31,927 --> 00:17:35,010 When the final node gets finished with it'll have value n. 334 00:17:35,010 --> 00:17:38,610 So the finishing times of the nodes are going to be exactly the integers 335 00:17:38,610 --> 00:17:40,640 from 1 to n. 336 00:17:40,640 --> 00:17:44,110 Let's make this much more concrete by going through a careful example. 337 00:17:45,640 --> 00:17:47,690 In fact, I think it'll be better for everybody if you, 338 00:17:47,690 --> 00:17:52,090 yourself, traced through part of this algorithm on a concrete example. 339 00:17:52,090 --> 00:17:53,911 So let me draw a nine node graph for you. 340 00:17:53,911 --> 00:17:58,401 So to be clear, let's assume that we've already executed step one 341 00:17:58,401 --> 00:18:02,455 of the algorithm, and we've already reversed the graph. 342 00:18:02,455 --> 00:18:06,675 So that is, this blue graph that I've drawn on the slide, this is the reversal. 343 00:18:06,675 --> 00:18:08,008 We've already reversed the arcs. 344 00:18:08,008 --> 00:18:11,175 Moreover the nodes are labeled in some arbitrary way from 1 to 9. 345 00:18:11,175 --> 00:18:15,205 Just assume these are how they show up in the node array for example and 346 00:18:15,205 --> 00:18:19,470 remember in the DFS loop routine you're supposed to process the nodes from 347 00:18:20,520 --> 00:18:22,740 top to bottom from n down to 1. 348 00:18:22,740 --> 00:18:23,710 So my question for 349 00:18:23,710 --> 00:18:27,654 you then is in the second step of the algorithm when we run DFS-Loop, and 350 00:18:27,654 --> 00:18:32,000 we process the nodes from the highest name 9 in order down to the lowest name 1. 351 00:18:32,000 --> 00:18:36,770 What are the finishing times that we're going to compute as we run DFS-Loop? 352 00:18:36,770 --> 00:18:40,970 Now, it is true that you can get different finishing times depending on the different 353 00:18:40,970 --> 00:18:45,580 choices that the DFS-Loop has to make about which outgoing arc to explore next. 354 00:18:45,580 --> 00:18:49,310 But I've given you four options for what the finishing times of the nodes 1,2,3, 355 00:18:49,310 --> 00:18:51,600 all the way up to 9, respectively, might be. 356 00:18:51,600 --> 00:18:56,600 And only one of these four could conceivably be an output of the finishing 357 00:18:56,600 --> 00:19:02,150 time of DFS loop on this graph, so which one is it? 358 00:19:02,150 --> 00:19:05,321 All right so the answer is the fourth option, 359 00:19:05,321 --> 00:19:09,827 that is the only one of these four sets of finishing times that you 360 00:19:09,827 --> 00:19:13,927 might see being computed by DFS-Loop on this blue graph. 361 00:19:13,927 --> 00:19:16,171 So let's go ahead and trace through DFS-Loop and 362 00:19:16,171 --> 00:19:19,010 see how we might get this set of finishing times. 363 00:19:19,010 --> 00:19:22,600 So remember in the main loop we start from the highest node 9 and 364 00:19:22,600 --> 00:19:25,830 then we descend down to the lowest node 1. 365 00:19:25,830 --> 00:19:28,995 So we start by invoking DFS from the node 9. 366 00:19:30,180 --> 00:19:33,150 So now from here there's only one outgoing arc, we have to go to so 367 00:19:33,150 --> 00:19:35,100 we mark 9 as explored. 368 00:19:35,100 --> 00:19:37,500 And then there's only one place we can go, we can go to 6. 369 00:19:37,500 --> 00:19:39,030 So we mark 6 as explored. 370 00:19:39,030 --> 00:19:41,090 Now there's two places we can go next, 371 00:19:41,090 --> 00:19:46,330 we can either go to 3 or we can go to 8 and in general DFS could do either one. 372 00:19:46,330 --> 00:19:49,480 Now to generate this fourth set of finishing times 373 00:19:49,480 --> 00:19:52,640 I'm going to need to assume that I go to 3 first okay? 374 00:19:52,640 --> 00:19:56,990 So again, what DFS does, what we're assuming it does, it starts at 9, and 375 00:19:56,990 --> 00:20:01,420 it has to go to 6, it marks those as explored, then it goes to 3. 376 00:20:01,420 --> 00:20:03,790 It does not go to 8 first it goes to 3 first. 377 00:20:03,790 --> 00:20:06,624 Now, from 3, there's only one outgoing arc which goes to 9, but 9, 378 00:20:06,624 --> 00:20:08,470 we've already marked as explored. 379 00:20:08,470 --> 00:20:11,438 So it's not going to re-explore 9, it's going to skip that arc. 380 00:20:11,438 --> 00:20:15,770 Since that's 3's only outgoing arc, then that for loop completes, and 381 00:20:15,770 --> 00:20:18,510 then 3 is the first node to finish. 382 00:20:18,510 --> 00:20:22,670 So when we finish with 3, we increment t, it started at 0, now it's 1, and 383 00:20:22,670 --> 00:20:27,700 we set the finishing time of 3 to be 1. 384 00:20:27,700 --> 00:20:30,104 Just like we said it was in the example. 385 00:20:30,104 --> 00:20:32,980 So, now we backtrack to 6. 386 00:20:32,980 --> 00:20:36,300 Now we have another outgoing arc from 6 to explore, so now we go to 8. 387 00:20:36,300 --> 00:20:39,770 From 8 we have to go to 2, from 2 we have to go to 5, from 5 we have to go 8. 388 00:20:39,770 --> 00:20:42,820 8 we've already seen, so then we're going to be done with 5, 389 00:20:42,820 --> 00:20:44,930 because that was its only outgoing arc. 390 00:20:44,930 --> 00:20:46,720 So then we increment t, now it's 2, and 391 00:20:46,720 --> 00:20:51,500 the finishing time of 5 is going to be 2 as promised. 392 00:20:53,100 --> 00:20:57,120 So now we've backtracked to 2, there's no more outgoing arcs from 2. 393 00:20:57,120 --> 00:21:02,440 So 2 is going to be the third one that we finish as promised. 394 00:21:02,440 --> 00:21:04,975 Then we finish with 8, so the finishing time for 395 00:21:04,975 --> 00:21:09,640 8 is going to be the fourth node to be done as promised. 396 00:21:09,640 --> 00:21:14,440 And now I back track back to 6, now at 6, 397 00:21:14,440 --> 00:21:17,740 that's the fifth node to be completed as promised. 398 00:21:17,740 --> 00:21:23,059 And finally we got all the way back to where we started at 9 and 399 00:21:23,059 --> 00:21:27,350 9 is the sixth node to be completed as promised. 400 00:21:28,480 --> 00:21:32,130 Now if we were computing those leaders all of these nodes would get the leader 9, 401 00:21:32,130 --> 00:21:34,380 but again the leaders are only relevant for the second pass. 402 00:21:34,380 --> 00:21:37,090 So we're just going to ignore the leaders as we're doing this tree so 403 00:21:37,090 --> 00:21:39,450 we're just going to keep track of finishing times. 404 00:21:39,450 --> 00:21:44,310 So now we're not done so all we did is we finished with the DFS that is invoked from 405 00:21:44,310 --> 00:21:50,080 the node 9 and we found 6 of the nodes total in that depth first search. 406 00:21:50,080 --> 00:21:53,501 So now we return to the outer for loop and we decrement i. 407 00:21:53,501 --> 00:21:56,130 So it started at 9, we're done with that, now we go down to 8. 408 00:21:56,130 --> 00:21:59,750 We say, have we already seen 8, yes 8's already explored so we skip it. 409 00:21:59,750 --> 00:22:03,020 We go, we decrement i down to 7, we say have we already seen node 7? 410 00:22:03,020 --> 00:22:04,845 No we have not okay? 411 00:22:04,845 --> 00:22:06,600 7 is not yet explored. 412 00:22:06,600 --> 00:22:11,770 So we invoke DFS now from node sever 7 has two outgoing arcs, 413 00:22:11,770 --> 00:22:13,950 it can either go to 4 or it can go to 9. 414 00:22:13,950 --> 00:22:16,820 Let's say it checks the outgoing arc to 9 first. 415 00:22:16,820 --> 00:22:18,260 Now 9 we already explored. 416 00:22:18,260 --> 00:22:20,940 Granted, that was an earlier part of the for loop, but we remember that. 417 00:22:20,940 --> 00:22:24,673 We're going to keep track of who got explored on previous iterations of the for 418 00:22:24,673 --> 00:22:27,656 loop so we don't bother to re-explore 9, so we skip that. 419 00:22:27,656 --> 00:22:30,698 So now from 7 we have to go to 4, from 4 we have to go to 1, 420 00:22:30,698 --> 00:22:32,350 from 1 we have to go back to 7. 421 00:22:32,350 --> 00:22:36,530 7's already been exploratory backtrack and now we're done with 1. 422 00:22:36,530 --> 00:22:38,300 So 1 is the next one we're completed with and 423 00:22:38,300 --> 00:22:43,240 the finishing count of 1 is going to be 7 as promised. 424 00:22:43,240 --> 00:22:46,320 We backtrack to 4, there's no more outgoing arcs from 4 to explore, 425 00:22:46,320 --> 00:22:48,629 so that's going to be the eighth one to finish. 426 00:22:49,850 --> 00:22:54,491 As promised, and the last one to finish is poor node 7. 427 00:22:54,491 --> 00:22:58,253 It is last. 428 00:22:58,253 --> 00:23:02,686 So that would be an example of how the DFS-Loop subroutine computes finishing 429 00:23:02,686 --> 00:23:04,750 times on a reversed graph. 430 00:23:04,750 --> 00:23:08,860 So now, let's work through the second pass on the forward version of the graph using 431 00:23:08,860 --> 00:23:10,300 the same example. 432 00:23:10,300 --> 00:23:14,040 Now remember, the point of the first pass is to compute a magical ordering, and 433 00:23:14,040 --> 00:23:16,720 the magical ordering is these finishing times. 434 00:23:16,720 --> 00:23:18,800 So now we're going to throw out the original node names, 435 00:23:18,800 --> 00:23:23,530 and we're going to replace the node names in blue by the finishing times in red. 436 00:23:23,530 --> 00:23:25,200 We're also going to work with the original graphs, 437 00:23:25,200 --> 00:23:28,170 which means we have to reverse the arcs back to where they were originally. 438 00:23:28,170 --> 00:23:31,560 So those are the two changes you're going to see when I redraw this graph. 439 00:23:31,560 --> 00:23:34,420 First of all, all the arcs were reverse orientation. 440 00:23:34,420 --> 00:23:38,120 Second of all, all of nodes will change names from their original ones 441 00:23:38,120 --> 00:23:40,710 to the finishing times that we just computed. 442 00:23:43,360 --> 00:23:45,869 So here's our new graph with the new node names and 443 00:23:45,869 --> 00:23:48,390 all of the arcs with their orientation reversed. 444 00:23:48,390 --> 00:23:50,568 And now we run DFS again on this graph. 445 00:23:50,568 --> 00:23:55,064 And again we're going to process the nodes in order from the highest label 9 down to 446 00:23:55,064 --> 00:23:56,235 the lowest label 1. 447 00:23:56,235 --> 00:23:59,426 Moreover, we don't need to compute finishing times in the second pass, 448 00:23:59,426 --> 00:24:01,180 we only need to do that in the first pass. 449 00:24:01,180 --> 00:24:05,837 In the second pass we have to keep track of the leaders, and remember the leader of 450 00:24:05,837 --> 00:24:10,510 a vertex Is the vertex from which DFS was called that first discovered that node. 451 00:24:10,510 --> 00:24:12,333 All right, so what's going to happen? 452 00:24:12,333 --> 00:24:16,803 Well, in the outer for loop, again, we start with i equal to nine, and 453 00:24:16,803 --> 00:24:18,753 we invoke DFS from the node 9. 454 00:24:22,492 --> 00:24:25,289 So, that's going to be the current leader because that's where 455 00:24:25,289 --> 00:24:27,450 the current DFS got initiated. 456 00:24:27,450 --> 00:24:28,790 Now, from 9, there's only one choice. 457 00:24:28,790 --> 00:24:29,590 We have to go to 7. 458 00:24:29,590 --> 00:24:31,950 From 7, there's only one choice, we have to go to 8. 459 00:24:31,950 --> 00:24:34,180 From 8, there's only one choice, we have to go back to 9. 460 00:24:34,180 --> 00:24:37,590 And then, 9's already been seen, so we backtrack. 461 00:24:37,590 --> 00:24:40,525 We go back to 8, we go back to 7, we go back to 9, and that's it. 462 00:24:40,525 --> 00:24:43,482 So, when we invoke DFS for node 9, 463 00:24:43,482 --> 00:24:49,012 the only things that we encounter are, the nodes 7, 8, and 9. 464 00:24:49,012 --> 00:24:53,550 And these are all going to be given the leader vertex 9. 465 00:24:53,550 --> 00:24:56,710 You will notice that this is indeed one 466 00:24:56,710 --> 00:24:59,820 of the strongly connected components of the graph. 467 00:24:59,820 --> 00:25:04,320 We just sort of found it with this indication of DFS from the node 9. 468 00:25:04,320 --> 00:25:07,170 So, now we go back to the outer for loop. 469 00:25:07,170 --> 00:25:09,626 And we say, okay, let's go to node 8, have we already seen 8? 470 00:25:09,626 --> 00:25:10,287 Yes. 471 00:25:10,287 --> 00:25:11,867 What about 7, have we already seen 7? 472 00:25:11,867 --> 00:25:12,865 Yes. 473 00:25:12,865 --> 00:25:13,505 What about 6? 474 00:25:13,505 --> 00:25:14,970 Have we have already seen 6? 475 00:25:14,970 --> 00:25:18,684 We have not, we have not yet discovered 6, so 476 00:25:18,684 --> 00:25:24,230 we invoke DFS from node 6, we reset the global source vortex s to 6. 477 00:25:24,230 --> 00:25:26,980 From 6, we can go to 9, we can go to 1. 478 00:25:26,980 --> 00:25:28,708 So, let's say we explore 9 first. 479 00:25:28,708 --> 00:25:32,080 Well, we already saw 9 in an earlier iteration in the for loop, so 480 00:25:32,080 --> 00:25:36,336 we don't explore it again, so, we don't discover 9 now, so we backtrack to 6. 481 00:25:36,336 --> 00:25:39,090 We go to 1, from 1, we have to go to 5, and 5 we have to go to 6, 482 00:25:39,090 --> 00:25:41,470 and then we start backtracking again. 483 00:25:41,470 --> 00:25:45,157 So, the only new nodes that we encounter when we invoke DFS 484 00:25:45,157 --> 00:25:48,330 from the node 6 are the vertices 6, 1, and 5. 485 00:25:48,330 --> 00:25:53,500 And all of these will have a leader vertex of 486 00:25:53,500 --> 00:25:58,690 6 because that's where we called DFS from when we first discovered these 3 nodes. 487 00:25:58,690 --> 00:26:05,140 And you'll notice, this is another FCC of this directed graph. 488 00:26:05,140 --> 00:26:07,900 So, we invoke DFS again, not from a new node, the new node 6. 489 00:26:07,900 --> 00:26:08,987 And what it discovered, 490 00:26:08,987 --> 00:26:11,591 the new nodes it discovered was exactly an FCC of the graph. 491 00:26:11,591 --> 00:26:14,178 Nothing more, nothing less. 492 00:26:14,178 --> 00:26:17,853 So, now we return to the outer for loop, we go to node 5, have we already seen 5? 493 00:26:17,853 --> 00:26:18,631 Yes. 494 00:26:18,631 --> 00:26:19,897 Have we already seen 4? 495 00:26:19,897 --> 00:26:22,050 No, we haven't seen 4 yet. 496 00:26:22,050 --> 00:26:24,044 So, now we invoke DFS from 4. 497 00:26:24,044 --> 00:26:26,421 Again, we could try to explore 5, but we've seen that before, 498 00:26:26,421 --> 00:26:28,080 we're not going to explore it again. 499 00:26:28,080 --> 00:26:30,710 So from 4 then, we have to go to 2, from 2 we have to go to 3, 500 00:26:30,710 --> 00:26:34,150 from 3 have to go back to 4, and then, after all the backtracking, we're done. 501 00:26:34,150 --> 00:26:37,916 So, the final call to DFS will be from the node 4. 502 00:26:40,157 --> 00:26:43,691 And, that DFS will discover precisely, newly discover, 503 00:26:43,691 --> 00:26:45,817 precisely the nodes 2, 3 and 4. 504 00:26:45,817 --> 00:26:50,662 They will all have the leader vertex 4 because that was where this DFS was 505 00:26:50,662 --> 00:26:51,688 called from. 506 00:26:51,688 --> 00:26:54,620 It's true we'll go back to the for loop and we'll check have we seen 3 yet? 507 00:26:54,620 --> 00:26:55,653 Yes. Have we seen 2 yet? 508 00:26:55,653 --> 00:26:56,176 Yes. 509 00:26:56,176 --> 00:26:57,013 Have we seen 1 yet? 510 00:26:57,013 --> 00:26:59,670 Yes, and then the whole thing completes. 511 00:26:59,670 --> 00:27:01,258 And, what we see is that, 512 00:27:01,258 --> 00:27:05,961 using the finishing times computed from that first depth first search pass, 513 00:27:05,961 --> 00:27:10,519 somehow the strongly connected components of this graph just showed up and 514 00:27:10,519 --> 00:27:14,454 presented themselves to us and one at a time on a silver platter. 515 00:27:14,454 --> 00:27:18,739 Every time we invoke DFS, the nodes we discovered newly were 516 00:27:18,739 --> 00:27:22,870 precisely one of the FCCs, nothing more, nothing less. 517 00:27:22,870 --> 00:27:25,030 And, that's really what's going on in this algorithm, 518 00:27:25,030 --> 00:27:26,760 turns out this was true in general. 519 00:27:26,760 --> 00:27:30,740 The first pass, DFS on the reverse graph, computes finishing times so 520 00:27:30,740 --> 00:27:35,570 that if you then process nodes according to decreased order in finishing times. 521 00:27:35,570 --> 00:27:37,260 In the second pass, 522 00:27:37,260 --> 00:27:42,430 each invocation to DFS will discover one new FCC and exactly one FCC. 523 00:27:42,430 --> 00:27:44,820 So, they'll just present themselves to you, 524 00:27:44,820 --> 00:27:48,601 one per DFS call in that second pass' for loop. 525 00:27:49,700 --> 00:27:52,180 This is, of course, merely an example. 526 00:27:52,180 --> 00:27:56,330 You should not just take a single example as proof that this algorithm always works. 527 00:27:56,330 --> 00:27:58,730 I will give you a general argument in the next video. 528 00:27:58,730 --> 00:28:01,800 But hopefully there is at least a plausibility arcing. 529 00:28:01,800 --> 00:28:05,160 No longer does this three-step algorithm seem totally insane. 530 00:28:05,160 --> 00:28:08,300 And maybe you could imagine, perhaps it works. 531 00:28:08,300 --> 00:28:10,760 At least there's some principles going on. 532 00:28:10,760 --> 00:28:13,890 Where you first compute the right ordering to process the nodes, and 533 00:28:13,890 --> 00:28:19,950 then the second pass peels off FCCs one at a time like layers from an onion. 534 00:28:19,950 --> 00:28:23,040 One thing that I hope is pretty clear is that this algorithm, correct or 535 00:28:23,040 --> 00:28:24,640 not, is blazingly fast. 536 00:28:26,420 --> 00:28:29,708 Pretty much all you do is two depth per searches. 537 00:28:29,708 --> 00:28:34,003 And, since depth per search as we've seen in the past, runs in time linear in 538 00:28:34,003 --> 00:28:37,721 the size of the graph, so does Kosaraju's two-pass algorithm. 539 00:28:37,721 --> 00:28:41,298 There are a couple subtleties and I encourage you to think about this and 540 00:28:41,298 --> 00:28:45,247 you'll be forced to think about this in the program and project for week four. 541 00:28:45,247 --> 00:28:47,342 So, for example, in the second pass, 542 00:28:47,342 --> 00:28:51,260 how do you process the nodes in decreasing order of finishing time? 543 00:28:51,260 --> 00:28:53,580 You don't want to sort the nodes by their finishing time, 544 00:28:53,580 --> 00:28:55,170 because that would take n log and time. 545 00:28:55,170 --> 00:28:58,230 So, you need to make sure that you remember in the first pass, 546 00:28:58,230 --> 00:29:00,930 that you sort of remember the nodes in a way that you can just do 547 00:29:00,930 --> 00:29:02,480 a linear scan through them in a second pass. 548 00:29:02,480 --> 00:29:05,740 So, there are some details, but, if your intuition is that this is really just 549 00:29:05,740 --> 00:29:08,899 double DFS properly implemented, that's pretty much exactly right. 550 00:29:10,020 --> 00:29:12,180 So, having spelled out the full implementation, 551 00:29:12,180 --> 00:29:14,340 argued that it's definitely a linear time algorithm and 552 00:29:14,340 --> 00:29:17,360 given at least a plausibility argument via an example that it 553 00:29:17,360 --> 00:29:21,020 might conceivably be correct, let's now turn to the general argument.