1
00:00:01,732 --> 00:00:04,353
And let's begin with the idea of
shortest paths. So, again I'll give you

2
00:00:04,353 --> 00:00:07,570
the movie graph. I'll give you Kevin Bacon
as a starting point. What's the fewest

3
00:00:07,570 --> 00:00:13,702
number of hops, the fewest number of edges
on a path that leads to, say, Jon Hamm? So

4
00:00:13,702 --> 00:00:20,618
some notation, I'm going to use DIST of V,
to denote this shortest path distance. So

5
00:00:20,618 --> 00:00:23,948
with respect to a starting node S, the
fewest number of hops or the fewest number

6
00:00:23,948 --> 00:00:28,066
of edges on a path that starts at S, and
goes to V. And again you can define this

7
00:00:28,066 --> 00:00:31,249
in the same way for undirected graphs or
directed graphs. In a directed graph, you

8
00:00:31,249 --> 00:00:34,309
always want to traverse arcs in the
forward direction, in the correct

9
00:00:34,309 --> 00:00:39,563
direction. And to do this we just have to
add a very small amount of extra code to

10
00:00:39,563 --> 00:00:43,197
the BFS code that I showed you earlier.
It's just gonna be a very small constant

11
00:00:43,197 --> 00:00:46,853
overhead, and basically it just keeps
track of what layer each node belongs to,

12
00:00:46,853 --> 00:00:51,004
and the layers are exactly tracking
shortest path distances away from the

13
00:00:51,004 --> 00:00:55,696
starting point S. So what's the
extra code. Well first in the initialization

14
00:00:55,696 --> 00:01:01,096
step, you set your preliminary estimate of
the distance, the number of the shortest

15
00:01:01,096 --> 00:01:07,193
path distance from S to vertex V as well
if V equals S, you know you can get from S

16
00:01:07,193 --> 00:01:12,200
to S on a path of length zero, the empty
path. And if it's any other vertex all

17
00:01:12,200 --> 00:01:15,406
bets are off, you have no idea if there's a
path to V at all. So let's just initially

18
00:01:15,406 --> 00:01:20,390
put plus infinity for all vertices other
than the starting point. This is something

19
00:01:20,390 --> 00:01:26,128
we will of course revise once we actually
discover a path to vertex V. And the only

20
00:01:26,128 --> 00:01:29,885
other extra code you have to add
is, when you're considering, so when you

21
00:01:29,885 --> 00:01:33,744
take a vertex off of the front of the
queue and then you iterate through its

22
00:01:33,744 --> 00:01:38,176
edges and you're considering one of those
edges V, W, so your V would be the vertex

23
00:01:38,176 --> 00:01:42,203
that you just removed from the front of
the queue. And as usual if the other end

24
00:01:42,203 --> 00:01:46,434
of the edge W has already been dealt with
then, you know, you just throw it out.

25
00:01:46,434 --> 00:01:49,583
That would be redundant work to look at it
again. But if this is the first time

26
00:01:49,583 --> 00:01:53,757
you've seen the vertex W. Then, in
addition to what we did previously, in

27
00:01:53,757 --> 00:01:57,604
addition to marking it as explored and
putting it in the queue at the back, we

28
00:01:57,604 --> 00:02:02,048
also compute its distance, and its
distance is just going to be one more than

29
00:02:02,048 --> 00:02:07,725
the distance of the vertex V, responsible
for W's addition to the queue, responsible

30
00:02:07,725 --> 00:02:12,498
for first discovering this vertex W. So,
returning to our running example of

31
00:02:12,498 --> 00:02:15,806
breadth first search, let's see what
happens. So, again, remember the way this

32
00:02:15,806 --> 00:02:20,165
worked is we start out with from the
vertex S, and we set the distance, you

33
00:02:20,165 --> 00:02:23,664
know in our initialization equal to zero.
We don't know what the distance is of

34
00:02:23,664 --> 00:02:27,714
anything else. So, then how did breadth
first search work? So, we, in the initial

35
00:02:27,714 --> 00:02:32,382
step we put S in the queue. We go to the
main while loop, and then the queue's not

36
00:02:32,382 --> 00:02:35,746
empty. We extract S from the queue. We
look at its neighbors. Those neighbors are

37
00:02:35,746 --> 00:02:39,110
A and B. We handle them in some order.
Let's again think of that we first handle

38
00:02:39,110 --> 00:02:44,173
the edge between S and A. So, then what do
we do? We say we haven't seen A yet. So we

39
00:02:44,173 --> 00:02:49,876
mark A as explored. We put A in the queue at
the front, and now we have this extra

40
00:02:49,876 --> 00:02:54,736
step. It's the first time we're seeing A,
so we wanna compute its distance. And we

41
00:02:54,736 --> 00:02:59,899
compute its distance as one more than the
vertex responsible for discovering A. And

42
00:02:59,899 --> 00:03:05,603
so in this case S was the vertex whose
exploration unveiled the existence of the

43
00:03:05,603 --> 00:03:12,004
vertex A to us. S's distance is zero so we set
A's distance to one. And that's tantamount

44
00:03:12,004 --> 00:03:15,998
to being a member of the ith layer. So
what happens in the next iteration of the

45
00:03:15,998 --> 00:03:20,555
while loop. So now the queue contains
Sorry, the next iteration of the for

46
00:03:20,555 --> 00:03:24,638
loop, excuse me. So after we've handled
the edge S comma A, we're still dealing

47
00:03:24,638 --> 00:03:29,375
with S's edges, now we handle the edge S
comma B. We put, it is the first time

48
00:03:29,375 --> 00:03:34,133
we've seen B. We put B at the end of the
queue, we mark it as explored, and then we

49
00:03:34,133 --> 00:03:38,982
also execute this new step. We set B's
distance to one more than the vertex

50
00:03:38,982 --> 00:03:43,493
responsible for discovering it. That would
again be the vertex S. S led to B's

51
00:03:43,493 --> 00:03:47,971
discovery. And so we set B's distance to
be one more than S's distance, also known

52
00:03:47,971 --> 00:03:57,151
as one. And that corresponds to being the
other node in layer one. Now having

53
00:03:57,151 --> 00:04:02,269
handled all of S's adjacent arcs we go
back to the while loop. We ask if the queue is

54
00:04:02,269 --> 00:04:06,499
empty. Certainly not. It takes two
vertices, first A then B. We extract the

55
00:04:06,499 --> 00:04:10,943
first vertex cuz it's FIFO, that would be
the vertex A. Now we look at A's incident

56
00:04:10,943 --> 00:04:16,084
edges. There's S comma A, which we ignore.
There's A comma C. This is the first time

57
00:04:16,084 --> 00:04:20,967
we've seen C. So as before we mark C as
explored. We add C to the end of the queue

58
00:04:20,967 --> 00:04:25,286
and now again we have this additional
line. We set C's distance to be one more

59
00:04:25,286 --> 00:04:30,214
than the vertex responsible for its
discovery. In this case it's A. That first

60
00:04:30,214 --> 00:04:35,051
discovered C. So we're gonna set C's distance to
be one more than A's distance also known

61
00:04:35,051 --> 00:04:41,870
as two. So then having handled A
we move on to the next vertex in the

62
00:04:41,870 --> 00:04:47,157
queue, which in this case is B. Again we
can forget about the edge between S and V.

63
00:04:47,157 --> 00:04:50,734
We've already seen S, we can forget about
the edge between B and C. We've already

64
00:04:50,734 --> 00:04:56,392
seen C but D is now discovered for the
first time via B. It gets more as explored,

65
00:04:56,392 --> 00:04:59,756
it goes to the end of the queue and its
distance is set equal to one more than B's

66
00:04:59,756 --> 00:05:06,087
distances which is two. So, then we deal
with C. Again it has four arcs, four

67
00:05:06,087 --> 00:05:09,764
edges, three of them are irrelevant. The
one to E is not irrelevant, cause this is

68
00:05:09,764 --> 00:05:13,824
the first time we've seen E. So, E's
distance is computed as one more than C,

69
00:05:13,824 --> 00:05:19,135
cause C was the one who first found E, and
so E gets a distance of three, and then

70
00:05:19,135 --> 00:05:22,273
the rest of the algorithm proceeds as
before. And you will notice that the

71
00:05:22,273 --> 00:05:27,470
labelings, the shortest path labels, are
exactly the layers as promised. I hope you

72
00:05:27,470 --> 00:05:31,791
find it very easy to believe at this point
that, that claim is true in general. That

73
00:05:31,791 --> 00:05:35,695
the distance computed by breadth-first
search for an arbitrary vertex V, that's

74
00:05:35,695 --> 00:05:41,680
reachable from S is, that's gonna be equal
to i if and only if V is in the ith

75
00:05:41,680 --> 00:05:46,776
layer as we've been defining it
previously. And what does it really mean

76
00:05:46,776 --> 00:05:51,343
to be in the ith layer? It means
that the shortest path distance between V

77
00:05:51,343 --> 00:05:57,204
and S has i hops, i edges. So I don't
wanna spend time giving a super rigorous

78
00:05:57,204 --> 00:06:01,637
proof of this claim but let me just give
you the gist, the basic idea,

79
00:06:01,637 --> 00:06:04,910
and I encourage you to produce some formal
proof at home if that is something that

80
00:06:04,910 --> 00:06:10,130
interests you. So one way to do it is you can do it
by induction on the layer i. And so what you

81
00:06:10,130 --> 00:06:14,293
want to prove is that all of the nodes that belong to a given layer i do Indeed, breadth

82
00:06:14,293 --> 00:06:19,412
first search does indeed compute the
distance of i for them. So what does it

83
00:06:19,412 --> 00:06:23,327
mean to be a node in layer i? Well, first
of all, you can't have been seen in either

84
00:06:23,327 --> 00:06:26,577
of the, any of the previous layers; you
weren't a member of layer zero through i

85
00:06:26,577 --> 00:06:30,526
minus one. And furthermore, you're a, a
neighbor of somebody who's in layer i

86
00:06:30,526 --> 00:06:34,137
minus one. Right? You're seen for the
first time once all of the layer i minus

87
00:06:34,137 --> 00:06:37,884
one nodes are processed. So the inductive
hypothesis tells you that distances were

88
00:06:37,884 --> 00:06:41,079
correctly computed for everybody from the
lower l, from the lower layers. So in

89
00:06:41,079 --> 00:06:45,477
particular, whoever this node V was from
layer i minus one was responsible for

90
00:06:45,477 --> 00:06:50,922
discovering u, in layer i. It has a distance
computed as i minus one. Yours is assigned

91
00:06:50,922 --> 00:06:55,389
to be one more than its, namely i. So that
pushes through the inductive step

92
00:06:55,389 --> 00:06:59,844
everything in layer i indeed gets the
correct label of a shortest path

93
00:06:59,844 --> 00:07:05,885
distance i away from S. So before we wrap
up with this application, I do wanna

94
00:07:05,885 --> 00:07:10,620
emphasize, it is only breadth first search
that gives us this guarantee of shortest

95
00:07:10,620 --> 00:07:14,753
paths. So, we have a wide family of graph
search strategies, all of which find

96
00:07:14,753 --> 00:07:18,848
everything findable. Breadth-first search is
one of those, but this is a special

97
00:07:18,848 --> 00:07:23,593
additional property that breadth-first search
has: you get shortest path distances from

98
00:07:23,593 --> 00:07:28,055
it. So in particular depth-first search does not in
general compute shortest path distances.

99
00:07:28,055 --> 00:07:32,713
This is really a special property of
breadth-first search. By contrast in this next

100
00:07:32,713 --> 00:07:36,447
application, which is going to be
computing the connected components of

101
00:07:36,447 --> 00:07:40,272
an undirected graph, this is not really
fundamental to breadth first search. For example,

102
00:07:40,272 --> 00:07:43,782
you could use depth first search instead and
that would work just as well.