So here's the Pregel model.
It's an internal project which is used
within Google.
It distributes vertices to processors or
nodes in this diagram, initialization you
partition the vertices across the
processors
and then vertices are given edges.
So then they read edges and then edges are
given to the appropriate
vertex, so vertex will get all the edges
which outgo from that vertex.
And each processor will have a whole bunch
of vertices.
And then
vertices will exchange messages as per the
Pregel program, that is written by
[UNKNOWN]
application.
The nodes will some messages will be
exchanged within a node itself when the
destination vertex is within that node and
others will have to travel between nodes.
When the destination vertex is actually on
another processor.
This coordination of messages precedes in
what I call super steps,
we will come to that in the later chart.
So, after all nodes have sort of delivered
and received one round of messages.
The master ensures that there's a
synchronization across all processors.
And then the next step proceeds.
So everybody's sends messages and then the
messages get delivered.
Then again it does some computation.
There's, everybody sends messages,
messages get delivered
and so on and so forth until there are no
more messages getting delivered.
We'll, we'll see examples of algorithms
implemented using
this Pregel model in, in a, in a bit.
There are different implementations,
Pregel itself is
internal to Google as we mentioned just
a minute ago.
Giraph is an open source implementation of
the
Pregel model built on top of Hadoop, but
there's,
they're essentially only mappers and not
reducers and all
the message parsing communication happens
through remote procedure calls.
It, like Pregel itself, Giraph distributes
vertices across nodes.
GPS is another,
implementation of the, of the graph model,
based on the Pregel paradigm.
It also distributes vertices at, across
processors,
but it also takes care of appropriately
repartitioning the vertices between
processors to make
sure there is less communication, between
processors.
So this is an important additional,
feature that GPS does, for you whereas
in Giraffe you have to do anything like
that yourself.
GraphLab is yet another, graph processing
model.
very similar, from the programming
perspective, to Pregel, in terms of every
vertex is a program, but, in here, edges
are distributed across nodes.
So the information about a single vertex
is actually distributed across processors.
this is quite
an interesting model.
Obviously the architecture is very
different from, from this one,
because you're not distributing nodes, to
vertices to nodes, but edges.
We won't go to the great details of the
implementation architectures,
but will look at a few applications of the
Pregel model.
to some interesting problems.
So, here is the Pregel programming model.
Just like we had the map produce
programming model where you had to write a
map function and
a reduced function, here we have to write
code for every vertex.
Each vertex executes code once the graph
is noded and the assumption is that
the process of loading the graph makes
sure that each vertex knows it's outgoing
edges.
It may not have any idea of what it's
incoming edge is, but it knows its own
outgoing edges.
Vertices are distributed across
processors, so each
processor executes the code for all its
vertices.
So, it doesn't update for every vertex
that it, has.
in what is called a superstep.
After the superstep, is completed, a
second superstep starts.
And so on until the program halts, as we
shall see in a minute.
So each vertex first receives messages
from all its in-neighbors.
So, who where is sending those messages
on its incoming edges, that's some
computation
using those, these messages, and then
decides
to send some messages to its
out-neighbors, possibly.
After that, it decides whether or not to
halt.
Or it means, what a halt means is that a
vertex stays
inactive once it's halted.
Unless it receives a message from some
other vertex in the future.
If this does not happen, and all vertices
are halted, then the computation halts.
So the master, as you mentioned in the
previous, chart, is responsible for making
sure that the messages are sent and
received from vertices,
as well as check it's, where there are
not, all the vertices are in halt state.
If they are, then it ends the computation.
So let's do a couple of, more examples
using
the Pregel programming model to solve a
few important problems.
A simple one is called max-node.
We're essentially given
values across all the vertices, we would
like to find out the maximum
of these and make sure that every vertex
at the end
has the maximum value across all the
vertices in the graph.
So this is a simple program.
You send your value along your outgoing
edges
and then start receiving values from
incoming edges.
For each incoming message, if you get a
value
which is greater than your value then you
replace
yourself with that value.
And and continue and don't, don't halt but
if, if you never
got, if you did not get any message which
was greater than your value then you halt.
Let's see what happens if this is executed
for every node.
In the first step what happens is that
these
are bi directional edges, which are
mentioned over here.
So this means there's an edge
from three to six as well as from six to
three.
In the first step three comes to six, six
doesn't do anything be because it's
already the maximum and it's never going
to get a message greater than itself.
But three becomes a six, right?
Similarly two gets a message from one.
but it doesn't change because one
is less than two.
But one gets six, so it changes.
So, six and two did not change.
And therefore they halt.
So they're shown as shaded.
So it sends so one sends its value to six,
six does nothing with it
six sends its value to two, which now
wakes up because it, it was
sleeping, it was halted, but it got an
incoming message so it wakes up and
this super, next super step and checks
that, yes it did get a new value.
So it becomes six and then it sends it's
value out to it's outgoing edges
which doesn't do anything over there.
So at the end, all the vertices are still,
are halted, the computation
ends, and as you can see, all the vertices
have the maximum value of six.