1
00:00:00,300 --> 00:00:04,240
So here's the Pregel model.

2
00:00:04,240 --> 00:00:08,040
It's an internal project which is used
within Google.

3
00:00:08,040 --> 00:00:16,220
It distributes vertices to processors or
nodes in this diagram, initialization you

4
00:00:16,220 --> 00:00:19,060
partition the vertices across the
processors

5
00:00:19,060 --> 00:00:22,662
and then vertices are given edges.

6
00:00:22,662 --> 00:00:25,680
So then they read edges and then edges are
given to the appropriate

7
00:00:25,680 --> 00:00:32,170
vertex, so vertex will get all the edges
which outgo from that vertex.

8
00:00:32,170 --> 00:00:34,995
And each processor will have a whole bunch
of vertices.

9
00:00:36,830 --> 00:00:37,530
And then

10
00:00:39,650 --> 00:00:45,175
vertices will exchange messages as per the
Pregel program, that is written by

11
00:00:45,175 --> 00:00:45,450
[UNKNOWN]

12
00:00:45,450 --> 00:00:46,740
application.

13
00:00:46,740 --> 00:00:51,560
The nodes will some messages will be
exchanged within a node itself when the

14
00:00:51,560 --> 00:00:58,260
destination vertex is within that node and
others will have to travel between nodes.

15
00:00:58,260 --> 00:01:02,420
When the destination vertex is actually on
another processor.

16
00:01:03,520 --> 00:01:10,645
This coordination of messages precedes in
what I call super steps,

17
00:01:10,645 --> 00:01:14,825
we will come to that in the later chart.

18
00:01:14,825 --> 00:01:20,530
So, after all nodes have sort of delivered
and received one round of messages.

19
00:01:20,530 --> 00:01:25,920
The master ensures that there's a
synchronization across all processors.

20
00:01:25,920 --> 00:01:29,180
And then the next step proceeds.

21
00:01:29,180 --> 00:01:33,030
So everybody's sends messages and then the
messages get delivered.

22
00:01:33,030 --> 00:01:34,908
Then again it does some computation.

23
00:01:34,908 --> 00:01:39,669
There's, everybody sends messages,
messages get delivered

24
00:01:40,780 --> 00:01:45,230
and so on and so forth until there are no
more messages getting delivered.

25
00:01:46,500 --> 00:01:50,100
We'll, we'll see examples of algorithms
implemented using

26
00:01:50,100 --> 00:01:54,030
this Pregel model in, in a, in a bit.

27
00:01:55,460 --> 00:01:57,940
There are different implementations,
Pregel itself is

28
00:01:57,940 --> 00:02:00,340
internal to Google as we mentioned just

29
00:02:00,340 --> 00:02:01,690
a minute ago.

30
00:02:01,690 --> 00:02:05,530
Giraph is an open source implementation of
the

31
00:02:05,530 --> 00:02:08,790
Pregel model built on top of Hadoop, but
there's,

32
00:02:08,790 --> 00:02:12,660
they're essentially only mappers and not
reducers and all

33
00:02:12,660 --> 00:02:15,650
the message parsing communication happens
through remote procedure calls.

34
00:02:17,060 --> 00:02:21,710
It, like Pregel itself, Giraph distributes
vertices across nodes.

35
00:02:23,420 --> 00:02:24,630
GPS is another,

36
00:02:26,035 --> 00:02:31,850
implementation of the, of the graph model,
based on the Pregel paradigm.

37
00:02:31,850 --> 00:02:34,120
It also distributes vertices at, across
processors,

38
00:02:34,120 --> 00:02:37,600
but it also takes care of appropriately

39
00:02:37,600 --> 00:02:41,740
repartitioning the vertices between
processors to make

40
00:02:41,740 --> 00:02:45,260
sure there is less communication, between
processors.

41
00:02:45,260 --> 00:02:51,320
So this is an important additional,
feature that GPS does, for you whereas

42
00:02:51,320 --> 00:02:54,685
in Giraffe you have to do anything like
that yourself.

43
00:02:54,685 --> 00:03:00,010
GraphLab is yet another, graph processing
model.

44
00:03:00,010 --> 00:03:05,070
very similar, from the programming
perspective, to Pregel, in terms of every

45
00:03:05,070 --> 00:03:11,180
vertex is a program, but, in here, edges
are distributed across nodes.

46
00:03:11,180 --> 00:03:14,850
So the information about a single vertex
is actually distributed across processors.

47
00:03:15,860 --> 00:03:16,340
this is quite

48
00:03:16,340 --> 00:03:17,450
an interesting model.

49
00:03:17,450 --> 00:03:20,730
Obviously the architecture is very
different from, from this one,

50
00:03:20,730 --> 00:03:26,010
because you're not distributing nodes, to
vertices to nodes, but edges.

51
00:03:27,170 --> 00:03:30,020
We won't go to the great details of the
implementation architectures,

52
00:03:30,020 --> 00:03:34,690
but will look at a few applications of the
Pregel model.

53
00:03:34,690 --> 00:03:40,420
to some interesting problems.
So, here is the Pregel programming model.

54
00:03:40,420 --> 00:03:41,620
Just like we had the map produce

55
00:03:41,620 --> 00:03:44,224
programming model where you had to write a
map function and

56
00:03:44,224 --> 00:03:48,890
a reduced function, here we have to write
code for every vertex.

57
00:03:50,260 --> 00:03:55,370
Each vertex executes code once the graph
is noded and the assumption is that

58
00:03:55,370 --> 00:03:57,980
the process of loading the graph makes

59
00:03:57,980 --> 00:04:01,220
sure that each vertex knows it's outgoing
edges.

60
00:04:01,220 --> 00:04:02,740
It may not have any idea of what it's

61
00:04:02,740 --> 00:04:05,870
incoming edge is, but it knows its own
outgoing edges.

62
00:04:06,960 --> 00:04:09,610
Vertices are distributed across
processors, so each

63
00:04:09,610 --> 00:04:13,260
processor executes the code for all its
vertices.

64
00:04:13,260 --> 00:04:18,690
So, it doesn't update for every vertex
that it, has.

65
00:04:19,730 --> 00:04:21,110
in what is called a superstep.

66
00:04:22,720 --> 00:04:28,310
After the superstep, is completed, a
second superstep starts.

67
00:04:28,310 --> 00:04:31,950
And so on until the program halts, as we
shall see in a minute.

68
00:04:33,410 --> 00:04:38,826
So each vertex first receives messages
from all its in-neighbors.

69
00:04:38,826 --> 00:04:40,380
So, who where is sending those messages

70
00:04:40,380 --> 00:04:43,420
on its incoming edges, that's some
computation

71
00:04:43,420 --> 00:04:46,250
using those, these messages, and then
decides

72
00:04:46,250 --> 00:04:48,860
to send some messages to its
out-neighbors, possibly.

73
00:04:50,810 --> 00:04:52,560
After that, it decides whether or not to
halt.

74
00:04:54,000 --> 00:04:58,460
Or it means, what a halt means is that a
vertex stays

75
00:04:58,460 --> 00:05:00,990
inactive once it's halted.

76
00:05:00,990 --> 00:05:04,430
Unless it receives a message from some
other vertex in the future.

77
00:05:05,640 --> 00:05:11,160
If this does not happen, and all vertices
are halted, then the computation halts.

78
00:05:11,160 --> 00:05:16,540
So the master, as you mentioned in the
previous, chart, is responsible for making

79
00:05:16,540 --> 00:05:23,550
sure that the messages are sent and
received from vertices,

80
00:05:24,870 --> 00:05:29,710
as well as check it's, where there are
not, all the vertices are in halt state.

81
00:05:29,710 --> 00:05:32,370
If they are, then it ends the computation.

82
00:05:32,370 --> 00:05:35,460
So let's do a couple of, more examples
using

83
00:05:35,460 --> 00:05:38,669
the Pregel programming model to solve a
few important problems.

84
00:05:40,050 --> 00:05:45,110
A simple one is called max-node.
We're essentially given

85
00:05:45,110 --> 00:05:50,330
values across all the vertices, we would
like to find out the maximum

86
00:05:50,330 --> 00:05:53,690
of these and make sure that every vertex
at the end

87
00:05:53,690 --> 00:05:57,280
has the maximum value across all the
vertices in the graph.

88
00:05:58,550 --> 00:06:00,730
So this is a simple program.

89
00:06:00,730 --> 00:06:05,960
You send your value along your outgoing
edges

90
00:06:05,960 --> 00:06:10,190
and then start receiving values from
incoming edges.

91
00:06:10,190 --> 00:06:13,430
For each incoming message, if you get a
value

92
00:06:13,430 --> 00:06:15,390
which is greater than your value then you
replace

93
00:06:15,390 --> 00:06:16,652
yourself with that value.

94
00:06:16,652 --> 00:06:23,330
And and continue and don't, don't halt but
if, if you never

95
00:06:23,330 --> 00:06:28,640
got, if you did not get any message which
was greater than your value then you halt.

96
00:06:28,640 --> 00:06:31,940
Let's see what happens if this is executed
for every node.

97
00:06:32,990 --> 00:06:35,762
In the first step what happens is that
these

98
00:06:35,762 --> 00:06:38,931
are bi directional edges, which are
mentioned over here.

99
00:06:38,931 --> 00:06:40,456
So this means there's an edge

100
00:06:40,456 --> 00:06:42,800
from three to six as well as from six to
three.

101
00:06:42,800 --> 00:06:51,190
In the first step three comes to six, six
doesn't do anything be because it's

102
00:06:51,190 --> 00:06:55,590
already the maximum and it's never going
to get a message greater than itself.

103
00:06:55,590 --> 00:06:57,550
But three becomes a six, right?

104
00:06:58,560 --> 00:07:05,970
Similarly two gets a message from one.
but it doesn't change because one

105
00:07:05,970 --> 00:07:09,890
is less than two.
But one gets six, so it changes.

106
00:07:09,890 --> 00:07:12,720
So, six and two did not change.

107
00:07:12,720 --> 00:07:16,430
And therefore they halt.
So they're shown as shaded.

108
00:07:36,470 --> 00:07:40,880
So it sends so one sends its value to six,
six does nothing with it

109
00:07:42,155 --> 00:07:47,560
six sends its value to two, which now
wakes up because it, it was

110
00:07:47,560 --> 00:07:51,230
sleeping, it was halted, but it got an
incoming message so it wakes up and

111
00:07:51,230 --> 00:07:55,970
this super, next super step and checks
that, yes it did get a new value.

112
00:07:55,970 --> 00:08:01,580
So it becomes six and then it sends it's
value out to it's outgoing edges

113
00:08:01,580 --> 00:08:03,130
which doesn't do anything over there.

114
00:08:03,130 --> 00:08:08,650
So at the end, all the vertices are still,
are halted, the computation

115
00:08:08,650 --> 00:08:13,600
ends, and as you can see, all the vertices
have the maximum value of six.