1 00:00:01,732 --> 00:00:04,353 And let's begin with the idea of shortest paths. So, again I'll give you 2 00:00:04,353 --> 00:00:07,570 the movie graph. I'll give you Kevin Bacon as a starting point. What's the fewest 3 00:00:07,570 --> 00:00:13,702 number of hops, the fewest number of edges on a path that leads to, say, Jon Hamm? So 4 00:00:13,702 --> 00:00:20,618 some notation, I'm going to use DIST of V, to denote this shortest path distance. So 5 00:00:20,618 --> 00:00:23,948 with respect to a starting node S, the fewest number of hops or the fewest number 6 00:00:23,948 --> 00:00:28,066 of edges on a path that starts at S, and goes to V. And again you can define this 7 00:00:28,066 --> 00:00:31,249 in the same way for undirected graphs or directed graphs. In a directed graph, you 8 00:00:31,249 --> 00:00:34,309 always want to traverse arcs in the forward direction, in the correct 9 00:00:34,309 --> 00:00:39,563 direction. And to do this we just have to add a very small amount of extra code to 10 00:00:39,563 --> 00:00:43,197 the BFS code that I showed you earlier. It's just gonna be a very small constant 11 00:00:43,197 --> 00:00:46,853 overhead, and basically it just keeps track of what layer each node belongs to, 12 00:00:46,853 --> 00:00:51,004 and the layers are exactly tracking shortest path distances away from the 13 00:00:51,004 --> 00:00:55,696 starting point S. So what's the extra code. Well first in the initialization 14 00:00:55,696 --> 00:01:01,096 step, you set your preliminary estimate of the distance, the number of the shortest 15 00:01:01,096 --> 00:01:07,193 path distance from S to vertex V as well if V equals S, you know you can get from S 16 00:01:07,193 --> 00:01:12,200 to S on a path of length zero, the empty path. And if it's any other vertex all 17 00:01:12,200 --> 00:01:15,406 bets are off, you have no idea if there's a path to V at all. So let's just initially 18 00:01:15,406 --> 00:01:20,390 put plus infinity for all vertices other than the starting point. This is something 19 00:01:20,390 --> 00:01:26,128 we will of course revise once we actually discover a path to vertex V. And the only 20 00:01:26,128 --> 00:01:29,885 other extra code you have to add is, when you're considering, so when you 21 00:01:29,885 --> 00:01:33,744 take a vertex off of the front of the queue and then you iterate through its 22 00:01:33,744 --> 00:01:38,176 edges and you're considering one of those edges V, W, so your V would be the vertex 23 00:01:38,176 --> 00:01:42,203 that you just removed from the front of the queue. And as usual if the other end 24 00:01:42,203 --> 00:01:46,434 of the edge W has already been dealt with then, you know, you just throw it out. 25 00:01:46,434 --> 00:01:49,583 That would be redundant work to look at it again. But if this is the first time 26 00:01:49,583 --> 00:01:53,757 you've seen the vertex W. Then, in addition to what we did previously, in 27 00:01:53,757 --> 00:01:57,604 addition to marking it as explored and putting it in the queue at the back, we 28 00:01:57,604 --> 00:02:02,048 also compute its distance, and its distance is just going to be one more than 29 00:02:02,048 --> 00:02:07,725 the distance of the vertex V, responsible for W's addition to the queue, responsible 30 00:02:07,725 --> 00:02:12,498 for first discovering this vertex W. So, returning to our running example of 31 00:02:12,498 --> 00:02:15,806 breadth first search, let's see what happens. So, again, remember the way this 32 00:02:15,806 --> 00:02:20,165 worked is we start out with from the vertex S, and we set the distance, you 33 00:02:20,165 --> 00:02:23,664 know in our initialization equal to zero. We don't know what the distance is of 34 00:02:23,664 --> 00:02:27,714 anything else. So, then how did breadth first search work? So, we, in the initial 35 00:02:27,714 --> 00:02:32,382 step we put S in the queue. We go to the main while loop, and then the queue's not 36 00:02:32,382 --> 00:02:35,746 empty. We extract S from the queue. We look at its neighbors. Those neighbors are 37 00:02:35,746 --> 00:02:39,110 A and B. We handle them in some order. Let's again think of that we first handle 38 00:02:39,110 --> 00:02:44,173 the edge between S and A. So, then what do we do? We say we haven't seen A yet. So we 39 00:02:44,173 --> 00:02:49,876 mark A as explored. We put A in the queue at the front, and now we have this extra 40 00:02:49,876 --> 00:02:54,736 step. It's the first time we're seeing A, so we wanna compute its distance. And we 41 00:02:54,736 --> 00:02:59,899 compute its distance as one more than the vertex responsible for discovering A. And 42 00:02:59,899 --> 00:03:05,603 so in this case S was the vertex whose exploration unveiled the existence of the 43 00:03:05,603 --> 00:03:12,004 vertex A to us. S's distance is zero so we set A's distance to one. And that's tantamount 44 00:03:12,004 --> 00:03:15,998 to being a member of the ith layer. So what happens in the next iteration of the 45 00:03:15,998 --> 00:03:20,555 while loop. So now the queue contains Sorry, the next iteration of the for 46 00:03:20,555 --> 00:03:24,638 loop, excuse me. So after we've handled the edge S comma A, we're still dealing 47 00:03:24,638 --> 00:03:29,375 with S's edges, now we handle the edge S comma B. We put, it is the first time 48 00:03:29,375 --> 00:03:34,133 we've seen B. We put B at the end of the queue, we mark it as explored, and then we 49 00:03:34,133 --> 00:03:38,982 also execute this new step. We set B's distance to one more than the vertex 50 00:03:38,982 --> 00:03:43,493 responsible for discovering it. That would again be the vertex S. S led to B's 51 00:03:43,493 --> 00:03:47,971 discovery. And so we set B's distance to be one more than S's distance, also known 52 00:03:47,971 --> 00:03:57,151 as one. And that corresponds to being the other node in layer one. Now having 53 00:03:57,151 --> 00:04:02,269 handled all of S's adjacent arcs we go back to the while loop. We ask if the queue is 54 00:04:02,269 --> 00:04:06,499 empty. Certainly not. It takes two vertices, first A then B. We extract the 55 00:04:06,499 --> 00:04:10,943 first vertex cuz it's FIFO, that would be the vertex A. Now we look at A's incident 56 00:04:10,943 --> 00:04:16,084 edges. There's S comma A, which we ignore. There's A comma C. This is the first time 57 00:04:16,084 --> 00:04:20,967 we've seen C. So as before we mark C as explored. We add C to the end of the queue 58 00:04:20,967 --> 00:04:25,286 and now again we have this additional line. We set C's distance to be one more 59 00:04:25,286 --> 00:04:30,214 than the vertex responsible for its discovery. In this case it's A. That first 60 00:04:30,214 --> 00:04:35,051 discovered C. So we're gonna set C's distance to be one more than A's distance also known 61 00:04:35,051 --> 00:04:41,870 as two. So then having handled A we move on to the next vertex in the 62 00:04:41,870 --> 00:04:47,157 queue, which in this case is B. Again we can forget about the edge between S and V. 63 00:04:47,157 --> 00:04:50,734 We've already seen S, we can forget about the edge between B and C. We've already 64 00:04:50,734 --> 00:04:56,392 seen C but D is now discovered for the first time via B. It gets more as explored, 65 00:04:56,392 --> 00:04:59,756 it goes to the end of the queue and its distance is set equal to one more than B's 66 00:04:59,756 --> 00:05:06,087 distances which is two. So, then we deal with C. Again it has four arcs, four 67 00:05:06,087 --> 00:05:09,764 edges, three of them are irrelevant. The one to E is not irrelevant, cause this is 68 00:05:09,764 --> 00:05:13,824 the first time we've seen E. So, E's distance is computed as one more than C, 69 00:05:13,824 --> 00:05:19,135 cause C was the one who first found E, and so E gets a distance of three, and then 70 00:05:19,135 --> 00:05:22,273 the rest of the algorithm proceeds as before. And you will notice that the 71 00:05:22,273 --> 00:05:27,470 labelings, the shortest path labels, are exactly the layers as promised. I hope you 72 00:05:27,470 --> 00:05:31,791 find it very easy to believe at this point that, that claim is true in general. That 73 00:05:31,791 --> 00:05:35,695 the distance computed by breadth-first search for an arbitrary vertex V, that's 74 00:05:35,695 --> 00:05:41,680 reachable from S is, that's gonna be equal to i if and only if V is in the ith 75 00:05:41,680 --> 00:05:46,776 layer as we've been defining it previously. And what does it really mean 76 00:05:46,776 --> 00:05:51,343 to be in the ith layer? It means that the shortest path distance between V 77 00:05:51,343 --> 00:05:57,204 and S has i hops, i edges. So I don't wanna spend time giving a super rigorous 78 00:05:57,204 --> 00:06:01,637 proof of this claim but let me just give you the gist, the basic idea, 79 00:06:01,637 --> 00:06:04,910 and I encourage you to produce some formal proof at home if that is something that 80 00:06:04,910 --> 00:06:10,130 interests you. So one way to do it is you can do it by induction on the layer i. And so what you 81 00:06:10,130 --> 00:06:14,293 want to prove is that all of the nodes that belong to a given layer i do Indeed, breadth 82 00:06:14,293 --> 00:06:19,412 first search does indeed compute the distance of i for them. So what does it 83 00:06:19,412 --> 00:06:23,327 mean to be a node in layer i? Well, first of all, you can't have been seen in either 84 00:06:23,327 --> 00:06:26,577 of the, any of the previous layers; you weren't a member of layer zero through i 85 00:06:26,577 --> 00:06:30,526 minus one. And furthermore, you're a, a neighbor of somebody who's in layer i 86 00:06:30,526 --> 00:06:34,137 minus one. Right? You're seen for the first time once all of the layer i minus 87 00:06:34,137 --> 00:06:37,884 one nodes are processed. So the inductive hypothesis tells you that distances were 88 00:06:37,884 --> 00:06:41,079 correctly computed for everybody from the lower l, from the lower layers. So in 89 00:06:41,079 --> 00:06:45,477 particular, whoever this node V was from layer i minus one was responsible for 90 00:06:45,477 --> 00:06:50,922 discovering u, in layer i. It has a distance computed as i minus one. Yours is assigned 91 00:06:50,922 --> 00:06:55,389 to be one more than its, namely i. So that pushes through the inductive step 92 00:06:55,389 --> 00:06:59,844 everything in layer i indeed gets the correct label of a shortest path 93 00:06:59,844 --> 00:07:05,885 distance i away from S. So before we wrap up with this application, I do wanna 94 00:07:05,885 --> 00:07:10,620 emphasize, it is only breadth first search that gives us this guarantee of shortest 95 00:07:10,620 --> 00:07:14,753 paths. So, we have a wide family of graph search strategies, all of which find 96 00:07:14,753 --> 00:07:18,848 everything findable. Breadth-first search is one of those, but this is a special 97 00:07:18,848 --> 00:07:23,593 additional property that breadth-first search has: you get shortest path distances from 98 00:07:23,593 --> 00:07:28,055 it. So in particular depth-first search does not in general compute shortest path distances. 99 00:07:28,055 --> 00:07:32,713 This is really a special property of breadth-first search. By contrast in this next 100 00:07:32,713 --> 00:07:36,447 application, which is going to be computing the connected components of 101 00:07:36,447 --> 00:07:40,272 an undirected graph, this is not really fundamental to breadth first search. For example, 102 00:07:40,272 --> 00:07:43,782 you could use depth first search instead and that would work just as well.