1
00:00:03,500 --> 00:00:06,689
Next we are going to talk about
breadth-firsth search,

2
00:00:06,689 --> 00:00:11,444
Which is a completely different way to
process all the vertices connected to a

3
00:00:11,444 --> 00:00:13,791
given vertex.
It'll get the job done,

4
00:00:13,791 --> 00:00:18,485
But it has totally different properties
that are useful in different ways for

5
00:00:18,485 --> 00:00:23,631
different applications.
So, to understand a breadth-first search,

6
00:00:23,913 --> 00:00:29,706
we'll start right out with a demo.
So, a breadth-first search is not a

7
00:00:29,706 --> 00:00:34,900
recursive algorithm.
It uses a Q as a auxiliary data structure.

8
00:00:34,900 --> 00:00:40,010
And it's also is quite similar quite
simple to explain.

9
00:00:40,010 --> 00:00:45,500
So what we're going to do is we're going
to put the source vertex on a queue and

10
00:00:45,500 --> 00:00:49,233
then repeat the following until the queue
is empty.

11
00:00:49,233 --> 00:00:55,601
Remove the next vertex from the queue in
FIFO order then add to the queue all

12
00:00:55,601 --> 00:01:01,458
unmarked vertices that are adjacent to V
and mark them and just keep doing that

13
00:01:01,458 --> 00:01:06,216
until the queue is empty.
Let's see how that works on our example.

14
00:01:06,435 --> 00:01:11,560
This is a smaller example, just a six
vertex graph with eight edges.

15
00:01:11,560 --> 00:01:16,665
So, add zero to the queue.
So we just, take zero and put it on the

16
00:01:16,665 --> 00:01:18,707
queue,
That's where we start.

17
00:01:18,925 --> 00:01:23,957
And now go into repeat until queue after
remove a vertex,

18
00:01:23,957 --> 00:01:27,312
Add all the marked vertex adjacent to
mark'em.

19
00:01:27,312 --> 00:01:32,649
So, we do qeue zero.
And then, in order to process zero, we

20
00:01:32,649 --> 00:01:39,253
need to check all its adjacent vertices.
So in this case, it's two, one and five.

21
00:01:39,253 --> 00:01:46,778
And again, the order depends on, how the,
how the bag was constructed for, vertices

22
00:01:46,778 --> 00:01:52,566
adjacent to zero.
So, we check two, And that is, not marked.

23
00:01:52,566 --> 00:01:59,112
So we have to add it to the queue.
We check then we check one, That's not

24
00:01:59,112 --> 00:02:05,329
marked, so we add it to the queue.
Then we check five, And that's not marked,

25
00:02:05,329 --> 00:02:11,328
so we add it to the queue.
So, that's, we finished, processing zero,

26
00:02:11,328 --> 00:02:15,366
zero is done.
So now, remove the next vertex from the

27
00:02:15,366 --> 00:02:17,145
queue.
In this case it's two.

28
00:02:17,146 --> 00:02:22,633
We're going to take two off the queue, and
process this by adding to the queue all

29
00:02:22,633 --> 00:02:29,190
the unmarked vertices that are adjacent.
So the process two, we have to check zero,

30
00:02:29,190 --> 00:02:33,791
one, three, and four.
We check zero, that's already marked, so

31
00:02:33,791 --> 00:02:39,100
we don't do anything.
We check one, that's also already marked,

32
00:02:39,100 --> 00:02:43,120
so we don't do anythin. in fact, it's on
the queue.

33
00:02:43,347 --> 00:02:49,718
We check three, and that one is unmarked
so we mark it and add it to the queue.

34
00:02:49,946 --> 00:02:55,634
And then we check four, that one's
unmarked, so we mark it and add it to the

35
00:02:55,634 --> 00:03:03,105
queue.
So by the way I didn't mention but we're

36
00:03:03,105 --> 00:03:09,432
also keeping track of two, auxiliary data
structures for this.

37
00:03:09,432 --> 00:03:13,663
One is the edge two array, which is the
same as before.

38
00:03:13,886 --> 00:03:18,044
What edge did we use to get to this, so
four.

39
00:03:18,044 --> 00:03:22,421
And we got to four from two.
And two, we got to two from zero.

40
00:03:22,424 --> 00:03:27,769
So again, that's going to be a tree that
gives us a path back to the source.

41
00:03:27,992 --> 00:03:33,931
And instead of marked, we also keep, we
keep a more, detailed information, which

42
00:03:33,931 --> 00:03:39,500
is, the length of the path, cuz it's, we
do it, 'cause it's easy to do it.

43
00:03:39,726 --> 00:03:44,562
Okay, so four, we checked four, and added
it to the queue,

44
00:03:44,788 --> 00:03:50,077
And now we're done with two.
So now we have one, five, three, and four

45
00:03:50,077 --> 00:03:54,006
all on the queue,
And we're going to process them.

46
00:03:54,233 --> 00:04:00,580
And since we've marked everything all
we're going to be doing now is checking

47
00:04:00,580 --> 00:04:05,415
vertices that are marked.
So for one, we check zero and that's

48
00:04:05,415 --> 00:04:08,664
marked.
Then we check two and that's marked,

49
00:04:08,891 --> 00:04:14,150
So then we're done with one.
Next thing off the queue is five, And we

50
00:04:14,150 --> 00:04:19,025
check three and that's marked, and we
check zero and that's marked, so we're

51
00:04:19,025 --> 00:04:23,440
done with five.
And then three, we got to check five, and

52
00:04:23,440 --> 00:04:29,405
then four, and then two and they're all
marked, and now we're done with three.

53
00:04:29,405 --> 00:04:34,737
And then finally the last one,
Always the last one everybody else has

54
00:04:34,737 --> 00:04:38,755
marked,
So connected, check three, Check two, and

55
00:04:38,755 --> 00:04:44,879
it's marked and we're done.
So what this process the result of this

56
00:04:44,879 --> 00:04:51,841
computation again is a three rooted at the
source and we can follow back through the

57
00:04:51,841 --> 00:04:56,202
three to get past from each node to the
source.

58
00:04:56,454 --> 00:05:03,835
Not only that, we can get the distance the
number of edges on the path from each node

59
00:05:03,835 --> 00:05:07,768
to the source.
So that's a demo of breadth-first search.

60
00:05:07,956 --> 00:05:11,780
And next we'll take a look at properties,
Of this algorithm.

61
00:05:11,780 --> 00:05:16,489
Alright. So now we've considered two
different methods for processing all

62
00:05:16,489 --> 00:05:20,230
vertices in a graph.
And actually, they're quite closely

63
00:05:20,230 --> 00:05:23,907
related, even though the computations are
quite different.

64
00:05:23,907 --> 00:05:28,294
Essentially, breadth-first search uses
recursion to corresponds to putting

65
00:05:28,294 --> 00:05:32,294
unvisited vertices on a stack.
Breadth-first search explicitly, we put

66
00:05:32,294 --> 00:05:37,764
the invert, visited vertices on the queue.
And actually, breadth-first search solves,

67
00:05:37,980 --> 00:05:42,809
another problem that, you know, often we
want to solve, called the shortest path

68
00:05:42,809 --> 00:05:46,340
problem.
Actually, the path that you get back from

69
00:05:46,340 --> 00:05:52,321
breadth-first search is the path from the
source to the given vertex that uses the

70
00:05:52,321 --> 00:05:57,293
fewest number of edges.
And we'll look at that, in just a minute.

71
00:05:57,510 --> 00:06:03,347
And the idea is that the breadth-first
search examines the vertices in the graph

72
00:06:03,347 --> 00:06:08,680
and increasing distance from the source.
So, let's take a look at that.

73
00:06:08,680 --> 00:06:12,653
So breadth-first, breadth-first search
computes shortest path,

74
00:06:12,653 --> 00:06:17,817
That is it builds the data structure that
we can answer the shortest path queries

75
00:06:17,817 --> 00:06:21,989
from the source with,
From S to all other vertices in the graph

76
00:06:21,989 --> 00:06:27,353
and time proportial to E plus V, then
[inaudible] vertices.

77
00:06:27,551 --> 00:06:33,246
And so let's look at why that is the case.
So the first thing is, how does, how do we

78
00:06:33,246 --> 00:06:40,541
know it competes, computes shortest paths?
Well, the intuition is and, you, you can

79
00:06:40,541 --> 00:06:45,720
fill in the details, the queue always
contains,

80
00:06:45,972 --> 00:06:52,215
Some vertices of distance K from the
source followed by some vertices of

81
00:06:52,215 --> 00:06:56,999
distance K plus one.
So they're on a queue we're processing

82
00:06:56,999 --> 00:07:03,758
them in first in first out order.
So say we're at a state when all of these

83
00:07:03,758 --> 00:07:09,728
vertices are, are on the queue. we're
going to have process vertex zero, as soon

84
00:07:09,728 --> 00:07:14,730
as we get this one we'll delete vertex
zero from the queue, and then when we're

85
00:07:14,730 --> 00:07:18,940
putting these adjacent ones on, we're
adding the ones of distance two.

86
00:07:18,940 --> 00:07:23,515
But we're not going to process any of
those until we're done with the ones of

87
00:07:23,515 --> 00:07:27,603
distance one and so forth.
So it's not hard to show that always you

88
00:07:27,603 --> 00:07:32,849
have either one of the two distances from,
from the source on the queue, and that

89
00:07:32,849 --> 00:07:37,730
means the first time we get to a vertex,
that's the shortest path to that vertex.

90
00:07:37,730 --> 00:07:43,521
And again, the running time we only visit
vertices once cuz we mark them.

91
00:07:43,521 --> 00:07:50,277
So it's time proportional to the number of
vertices plus the number of edges in the

92
00:07:50,277 --> 00:07:55,181
graph.
So that's breadth-first search properties

93
00:07:55,181 --> 00:08:03,841
and then here's the implantation, which is
simply code for the basic method that we

94
00:08:03,841 --> 00:08:09,302
outlined in pseudocode.
So our private instance variables are

95
00:08:09,302 --> 00:08:16,350
marked or in the demo, we used this to,
But just for simplicity, let's use marked

96
00:08:16,350 --> 00:08:22,340
edge two, then, is how we get to the first
vertex and then the source.

97
00:08:22,340 --> 00:08:28,859
And so, we have a constructor that builds
those arrays same way as before and then

98
00:08:28,859 --> 00:08:31,996
calls BFS.
So BFS builds a queue,

99
00:08:32,237 --> 00:08:37,636
That's what it's going to use.
Then queues the source and marks the

100
00:08:37,636 --> 00:08:41,257
source.
And then, this is just in code what we

101
00:08:41,257 --> 00:08:45,280
said in words before, while the queue is
not empty,

102
00:08:45,536 --> 00:08:49,991
We pull off the next vertex from the
queue, call it V.

103
00:08:49,991 --> 00:08:56,795
For everybody adjacent to V,
We go ahead and check if it's marked, we

104
00:08:56,795 --> 00:09:03,807
ignore it and move to the next.
If it's not marked then we put it on the

105
00:09:03,807 --> 00:09:11,359
queue mark it, and remember the edge.
And again this is an example of the power

106
00:09:11,359 --> 00:09:16,933
of extraction and the utility of our
design pattering, pattern.

107
00:09:17,203 --> 00:09:23,807
All we're doing in terms of the graph data
type is being a client to go through all

108
00:09:23,807 --> 00:09:27,711
the adjacent vertices.
But it allows us to implement this

109
00:09:27,711 --> 00:09:32,135
completely different algorithm in, in, in,
really an accessible way.

110
00:09:32,135 --> 00:09:35,648
So that's the implementation of
breadth-first search.

111
00:09:35,648 --> 00:09:40,983
Then the client for getting the pass back
as be saying this as for, as for

112
00:09:40,983 --> 00:09:44,892
breadth-first search.
So here's an old application of

113
00:09:44,892 --> 00:09:48,549
breadth-first search in, in computer
networks.

114
00:09:48,549 --> 00:09:53,964
That's very important when you're
communicating from one place to another,

115
00:09:53,964 --> 00:09:57,410
you want to get there in the fewest number
of hops.

116
00:09:57,617 --> 00:10:02,331
This is the Arpanet.
The, the predecessor of the, internet, as

117
00:10:02,331 --> 00:10:06,005
of July 1977.
And things were slow and computers were,

118
00:10:06,005 --> 00:10:11,065
computers were small and slow.
It's important to do these things in small

119
00:10:11,065 --> 00:10:15,086
number of hops.
And with breadth-first search, you could

120
00:10:15,086 --> 00:10:20,562
take this graph and figure out, the
shortest way to get from, one place to

121
00:10:20,562 --> 00:10:23,681
another.
That's a typical application of

122
00:10:23,681 --> 00:10:28,193
breadth-first search.
And here's another one,

123
00:10:28,421 --> 00:10:34,265
So called Kevin Bacon numbers.
And nowadays actually you can type Bacon

124
00:10:34,492 --> 00:10:38,287
and an actor's name,
And get the answer to this.

125
00:10:38,515 --> 00:10:45,573
So there's if, if you're not, not familiar
with it you can become familiar with it by

126
00:10:45,573 --> 00:10:50,643
Kevin Bacon numbers.
The idea is you have a graph, where, the

127
00:10:50,643 --> 00:10:57,962
vertices are actors, and the edge, and
think of an edge connecting two actors, if

128
00:10:57,962 --> 00:11:04,719
they are in a movie together. and so what
you wabt to find is given an actor how,

129
00:11:04,960 --> 00:11:12,360
what's the shortest way to get to Kevin
Bacon, connected by, so, edges were actors

130
00:11:12,360 --> 00:11:17,548
and edges were movies,
In connection of actors in the movie.

131
00:11:17,548 --> 00:11:24,386
So Buzz Mauro and Tatiana Ramirez were in
Sweet Dreams together, and these two

132
00:11:24,386 --> 00:11:28,705
actors were in this movie together and so
forth.

133
00:11:28,705 --> 00:11:33,024
And you get a way to Kevin Bacon from any
actor.

134
00:11:33,024 --> 00:11:38,962
And this is another pop culture
application, this is so called six

135
00:11:38,962 --> 00:11:44,450
degrees, which you can get to anyone in
six steps in this way.

136
00:11:44,450 --> 00:11:50,020
So that's all implementation of.
Breadth first search on the Kevin Bacon

137
00:11:50,020 --> 00:11:52,913
graph,
Where we include one vertex for each

138
00:11:52,913 --> 00:11:55,335
performer,
One vertex for each movie.

139
00:11:55,335 --> 00:12:00,918
Connect the movie to all performers that
appear in the aovie And the shortest path

140
00:12:00,918 --> 00:12:06,300
from Kevin Bacon to every actor, if you
follow through, back through that path,

141
00:12:06,501 --> 00:12:11,708
you get.
To. You, you get the proof of the Kevin

142
00:12:11,708 --> 00:12:18,630
Bacon number for each actor and we have
implementation of that on the book side.

143
00:12:18,912 --> 00:12:25,229
So that's another example.
And actually there's maybe even older or

144
00:12:25,229 --> 00:12:32,300
similar age example that mathematicians
are fond of and that is called the,

145
00:12:32,300 --> 00:12:37,903
So called Erdos number.
So in this one it's mathematicians, the

146
00:12:37,903 --> 00:12:42,371
nodes are mathematicians,
And there's an edge if the two

147
00:12:42,371 --> 00:12:49,489
mathematicians have co-authored a paper.
And Paul Erdos was a, a very productive

148
00:12:49,489 --> 00:12:56,228
Hungarian mathematician who traveled the
world co authoring papers with people all

149
00:12:56,228 --> 00:13:00,393
over the world.
A, a very interesting and prolific

150
00:13:00,393 --> 00:13:06,300
character, who actually did quite a bit of
research on properties of graphs.

151
00:13:06,517 --> 00:13:13,415
And maybe even more so than Kevin Bacon.
The idea is he was so prolific that pretty

152
00:13:13,415 --> 00:13:18,280
much every mathematician has a pretty low,
Erdos number.

153
00:13:18,280 --> 00:13:25,290
So that's our second example of a
graph-processing algorithm breadth-first