1 00:00:03,500 --> 00:00:06,689 Next we are going to talk about breadth-firsth search, 2 00:00:06,689 --> 00:00:11,444 Which is a completely different way to process all the vertices connected to a 3 00:00:11,444 --> 00:00:13,791 given vertex. It'll get the job done, 4 00:00:13,791 --> 00:00:18,485 But it has totally different properties that are useful in different ways for 5 00:00:18,485 --> 00:00:23,631 different applications. So, to understand a breadth-first search, 6 00:00:23,913 --> 00:00:29,706 we'll start right out with a demo. So, a breadth-first search is not a 7 00:00:29,706 --> 00:00:34,900 recursive algorithm. It uses a Q as a auxiliary data structure. 8 00:00:34,900 --> 00:00:40,010 And it's also is quite similar quite simple to explain. 9 00:00:40,010 --> 00:00:45,500 So what we're going to do is we're going to put the source vertex on a queue and 10 00:00:45,500 --> 00:00:49,233 then repeat the following until the queue is empty. 11 00:00:49,233 --> 00:00:55,601 Remove the next vertex from the queue in FIFO order then add to the queue all 12 00:00:55,601 --> 00:01:01,458 unmarked vertices that are adjacent to V and mark them and just keep doing that 13 00:01:01,458 --> 00:01:06,216 until the queue is empty. Let's see how that works on our example. 14 00:01:06,435 --> 00:01:11,560 This is a smaller example, just a six vertex graph with eight edges. 15 00:01:11,560 --> 00:01:16,665 So, add zero to the queue. So we just, take zero and put it on the 16 00:01:16,665 --> 00:01:18,707 queue, That's where we start. 17 00:01:18,925 --> 00:01:23,957 And now go into repeat until queue after remove a vertex, 18 00:01:23,957 --> 00:01:27,312 Add all the marked vertex adjacent to mark'em. 19 00:01:27,312 --> 00:01:32,649 So, we do qeue zero. And then, in order to process zero, we 20 00:01:32,649 --> 00:01:39,253 need to check all its adjacent vertices. So in this case, it's two, one and five. 21 00:01:39,253 --> 00:01:46,778 And again, the order depends on, how the, how the bag was constructed for, vertices 22 00:01:46,778 --> 00:01:52,566 adjacent to zero. So, we check two, And that is, not marked. 23 00:01:52,566 --> 00:01:59,112 So we have to add it to the queue. We check then we check one, That's not 24 00:01:59,112 --> 00:02:05,329 marked, so we add it to the queue. Then we check five, And that's not marked, 25 00:02:05,329 --> 00:02:11,328 so we add it to the queue. So, that's, we finished, processing zero, 26 00:02:11,328 --> 00:02:15,366 zero is done. So now, remove the next vertex from the 27 00:02:15,366 --> 00:02:17,145 queue. In this case it's two. 28 00:02:17,146 --> 00:02:22,633 We're going to take two off the queue, and process this by adding to the queue all 29 00:02:22,633 --> 00:02:29,190 the unmarked vertices that are adjacent. So the process two, we have to check zero, 30 00:02:29,190 --> 00:02:33,791 one, three, and four. We check zero, that's already marked, so 31 00:02:33,791 --> 00:02:39,100 we don't do anything. We check one, that's also already marked, 32 00:02:39,100 --> 00:02:43,120 so we don't do anythin. in fact, it's on the queue. 33 00:02:43,347 --> 00:02:49,718 We check three, and that one is unmarked so we mark it and add it to the queue. 34 00:02:49,946 --> 00:02:55,634 And then we check four, that one's unmarked, so we mark it and add it to the 35 00:02:55,634 --> 00:03:03,105 queue. So by the way I didn't mention but we're 36 00:03:03,105 --> 00:03:09,432 also keeping track of two, auxiliary data structures for this. 37 00:03:09,432 --> 00:03:13,663 One is the edge two array, which is the same as before. 38 00:03:13,886 --> 00:03:18,044 What edge did we use to get to this, so four. 39 00:03:18,044 --> 00:03:22,421 And we got to four from two. And two, we got to two from zero. 40 00:03:22,424 --> 00:03:27,769 So again, that's going to be a tree that gives us a path back to the source. 41 00:03:27,992 --> 00:03:33,931 And instead of marked, we also keep, we keep a more, detailed information, which 42 00:03:33,931 --> 00:03:39,500 is, the length of the path, cuz it's, we do it, 'cause it's easy to do it. 43 00:03:39,726 --> 00:03:44,562 Okay, so four, we checked four, and added it to the queue, 44 00:03:44,788 --> 00:03:50,077 And now we're done with two. So now we have one, five, three, and four 45 00:03:50,077 --> 00:03:54,006 all on the queue, And we're going to process them. 46 00:03:54,233 --> 00:04:00,580 And since we've marked everything all we're going to be doing now is checking 47 00:04:00,580 --> 00:04:05,415 vertices that are marked. So for one, we check zero and that's 48 00:04:05,415 --> 00:04:08,664 marked. Then we check two and that's marked, 49 00:04:08,891 --> 00:04:14,150 So then we're done with one. Next thing off the queue is five, And we 50 00:04:14,150 --> 00:04:19,025 check three and that's marked, and we check zero and that's marked, so we're 51 00:04:19,025 --> 00:04:23,440 done with five. And then three, we got to check five, and 52 00:04:23,440 --> 00:04:29,405 then four, and then two and they're all marked, and now we're done with three. 53 00:04:29,405 --> 00:04:34,737 And then finally the last one, Always the last one everybody else has 54 00:04:34,737 --> 00:04:38,755 marked, So connected, check three, Check two, and 55 00:04:38,755 --> 00:04:44,879 it's marked and we're done. So what this process the result of this 56 00:04:44,879 --> 00:04:51,841 computation again is a three rooted at the source and we can follow back through the 57 00:04:51,841 --> 00:04:56,202 three to get past from each node to the source. 58 00:04:56,454 --> 00:05:03,835 Not only that, we can get the distance the number of edges on the path from each node 59 00:05:03,835 --> 00:05:07,768 to the source. So that's a demo of breadth-first search. 60 00:05:07,956 --> 00:05:11,780 And next we'll take a look at properties, Of this algorithm. 61 00:05:11,780 --> 00:05:16,489 Alright. So now we've considered two different methods for processing all 62 00:05:16,489 --> 00:05:20,230 vertices in a graph. And actually, they're quite closely 63 00:05:20,230 --> 00:05:23,907 related, even though the computations are quite different. 64 00:05:23,907 --> 00:05:28,294 Essentially, breadth-first search uses recursion to corresponds to putting 65 00:05:28,294 --> 00:05:32,294 unvisited vertices on a stack. Breadth-first search explicitly, we put 66 00:05:32,294 --> 00:05:37,764 the invert, visited vertices on the queue. And actually, breadth-first search solves, 67 00:05:37,980 --> 00:05:42,809 another problem that, you know, often we want to solve, called the shortest path 68 00:05:42,809 --> 00:05:46,340 problem. Actually, the path that you get back from 69 00:05:46,340 --> 00:05:52,321 breadth-first search is the path from the source to the given vertex that uses the 70 00:05:52,321 --> 00:05:57,293 fewest number of edges. And we'll look at that, in just a minute. 71 00:05:57,510 --> 00:06:03,347 And the idea is that the breadth-first search examines the vertices in the graph 72 00:06:03,347 --> 00:06:08,680 and increasing distance from the source. So, let's take a look at that. 73 00:06:08,680 --> 00:06:12,653 So breadth-first, breadth-first search computes shortest path, 74 00:06:12,653 --> 00:06:17,817 That is it builds the data structure that we can answer the shortest path queries 75 00:06:17,817 --> 00:06:21,989 from the source with, From S to all other vertices in the graph 76 00:06:21,989 --> 00:06:27,353 and time proportial to E plus V, then [inaudible] vertices. 77 00:06:27,551 --> 00:06:33,246 And so let's look at why that is the case. So the first thing is, how does, how do we 78 00:06:33,246 --> 00:06:40,541 know it competes, computes shortest paths? Well, the intuition is and, you, you can 79 00:06:40,541 --> 00:06:45,720 fill in the details, the queue always contains, 80 00:06:45,972 --> 00:06:52,215 Some vertices of distance K from the source followed by some vertices of 81 00:06:52,215 --> 00:06:56,999 distance K plus one. So they're on a queue we're processing 82 00:06:56,999 --> 00:07:03,758 them in first in first out order. So say we're at a state when all of these 83 00:07:03,758 --> 00:07:09,728 vertices are, are on the queue. we're going to have process vertex zero, as soon 84 00:07:09,728 --> 00:07:14,730 as we get this one we'll delete vertex zero from the queue, and then when we're 85 00:07:14,730 --> 00:07:18,940 putting these adjacent ones on, we're adding the ones of distance two. 86 00:07:18,940 --> 00:07:23,515 But we're not going to process any of those until we're done with the ones of 87 00:07:23,515 --> 00:07:27,603 distance one and so forth. So it's not hard to show that always you 88 00:07:27,603 --> 00:07:32,849 have either one of the two distances from, from the source on the queue, and that 89 00:07:32,849 --> 00:07:37,730 means the first time we get to a vertex, that's the shortest path to that vertex. 90 00:07:37,730 --> 00:07:43,521 And again, the running time we only visit vertices once cuz we mark them. 91 00:07:43,521 --> 00:07:50,277 So it's time proportional to the number of vertices plus the number of edges in the 92 00:07:50,277 --> 00:07:55,181 graph. So that's breadth-first search properties 93 00:07:55,181 --> 00:08:03,841 and then here's the implantation, which is simply code for the basic method that we 94 00:08:03,841 --> 00:08:09,302 outlined in pseudocode. So our private instance variables are 95 00:08:09,302 --> 00:08:16,350 marked or in the demo, we used this to, But just for simplicity, let's use marked 96 00:08:16,350 --> 00:08:22,340 edge two, then, is how we get to the first vertex and then the source. 97 00:08:22,340 --> 00:08:28,859 And so, we have a constructor that builds those arrays same way as before and then 98 00:08:28,859 --> 00:08:31,996 calls BFS. So BFS builds a queue, 99 00:08:32,237 --> 00:08:37,636 That's what it's going to use. Then queues the source and marks the 100 00:08:37,636 --> 00:08:41,257 source. And then, this is just in code what we 101 00:08:41,257 --> 00:08:45,280 said in words before, while the queue is not empty, 102 00:08:45,536 --> 00:08:49,991 We pull off the next vertex from the queue, call it V. 103 00:08:49,991 --> 00:08:56,795 For everybody adjacent to V, We go ahead and check if it's marked, we 104 00:08:56,795 --> 00:09:03,807 ignore it and move to the next. If it's not marked then we put it on the 105 00:09:03,807 --> 00:09:11,359 queue mark it, and remember the edge. And again this is an example of the power 106 00:09:11,359 --> 00:09:16,933 of extraction and the utility of our design pattering, pattern. 107 00:09:17,203 --> 00:09:23,807 All we're doing in terms of the graph data type is being a client to go through all 108 00:09:23,807 --> 00:09:27,711 the adjacent vertices. But it allows us to implement this 109 00:09:27,711 --> 00:09:32,135 completely different algorithm in, in, in, really an accessible way. 110 00:09:32,135 --> 00:09:35,648 So that's the implementation of breadth-first search. 111 00:09:35,648 --> 00:09:40,983 Then the client for getting the pass back as be saying this as for, as for 112 00:09:40,983 --> 00:09:44,892 breadth-first search. So here's an old application of 113 00:09:44,892 --> 00:09:48,549 breadth-first search in, in computer networks. 114 00:09:48,549 --> 00:09:53,964 That's very important when you're communicating from one place to another, 115 00:09:53,964 --> 00:09:57,410 you want to get there in the fewest number of hops. 116 00:09:57,617 --> 00:10:02,331 This is the Arpanet. The, the predecessor of the, internet, as 117 00:10:02,331 --> 00:10:06,005 of July 1977. And things were slow and computers were, 118 00:10:06,005 --> 00:10:11,065 computers were small and slow. It's important to do these things in small 119 00:10:11,065 --> 00:10:15,086 number of hops. And with breadth-first search, you could 120 00:10:15,086 --> 00:10:20,562 take this graph and figure out, the shortest way to get from, one place to 121 00:10:20,562 --> 00:10:23,681 another. That's a typical application of 122 00:10:23,681 --> 00:10:28,193 breadth-first search. And here's another one, 123 00:10:28,421 --> 00:10:34,265 So called Kevin Bacon numbers. And nowadays actually you can type Bacon 124 00:10:34,492 --> 00:10:38,287 and an actor's name, And get the answer to this. 125 00:10:38,515 --> 00:10:45,573 So there's if, if you're not, not familiar with it you can become familiar with it by 126 00:10:45,573 --> 00:10:50,643 Kevin Bacon numbers. The idea is you have a graph, where, the 127 00:10:50,643 --> 00:10:57,962 vertices are actors, and the edge, and think of an edge connecting two actors, if 128 00:10:57,962 --> 00:11:04,719 they are in a movie together. and so what you wabt to find is given an actor how, 129 00:11:04,960 --> 00:11:12,360 what's the shortest way to get to Kevin Bacon, connected by, so, edges were actors 130 00:11:12,360 --> 00:11:17,548 and edges were movies, In connection of actors in the movie. 131 00:11:17,548 --> 00:11:24,386 So Buzz Mauro and Tatiana Ramirez were in Sweet Dreams together, and these two 132 00:11:24,386 --> 00:11:28,705 actors were in this movie together and so forth. 133 00:11:28,705 --> 00:11:33,024 And you get a way to Kevin Bacon from any actor. 134 00:11:33,024 --> 00:11:38,962 And this is another pop culture application, this is so called six 135 00:11:38,962 --> 00:11:44,450 degrees, which you can get to anyone in six steps in this way. 136 00:11:44,450 --> 00:11:50,020 So that's all implementation of. Breadth first search on the Kevin Bacon 137 00:11:50,020 --> 00:11:52,913 graph, Where we include one vertex for each 138 00:11:52,913 --> 00:11:55,335 performer, One vertex for each movie. 139 00:11:55,335 --> 00:12:00,918 Connect the movie to all performers that appear in the aovie And the shortest path 140 00:12:00,918 --> 00:12:06,300 from Kevin Bacon to every actor, if you follow through, back through that path, 141 00:12:06,501 --> 00:12:11,708 you get. To. You, you get the proof of the Kevin 142 00:12:11,708 --> 00:12:18,630 Bacon number for each actor and we have implementation of that on the book side. 143 00:12:18,912 --> 00:12:25,229 So that's another example. And actually there's maybe even older or 144 00:12:25,229 --> 00:12:32,300 similar age example that mathematicians are fond of and that is called the, 145 00:12:32,300 --> 00:12:37,903 So called Erdos number. So in this one it's mathematicians, the 146 00:12:37,903 --> 00:12:42,371 nodes are mathematicians, And there's an edge if the two 147 00:12:42,371 --> 00:12:49,489 mathematicians have co-authored a paper. And Paul Erdos was a, a very productive 148 00:12:49,489 --> 00:12:56,228 Hungarian mathematician who traveled the world co authoring papers with people all 149 00:12:56,228 --> 00:13:00,393 over the world. A, a very interesting and prolific 150 00:13:00,393 --> 00:13:06,300 character, who actually did quite a bit of research on properties of graphs. 151 00:13:06,517 --> 00:13:13,415 And maybe even more so than Kevin Bacon. The idea is he was so prolific that pretty 152 00:13:13,415 --> 00:13:18,280 much every mathematician has a pretty low, Erdos number. 153 00:13:18,280 --> 00:13:25,290 So that's our second example of a graph-processing algorithm breadth-first