Let's look at one final problem and solve
it in
pregel, which is the problem of finding
the shortest path
from a particular source vertex in the
graph, such as
the left-most vertex here, to every other
vertex in the graph.
We also assume that in general, the edges
can have weights, so we would like
to compute the shortest path, taking into
account
the distance traveled along every edge,
but for
the moment, let's assume that these
weights are just 1.
So, here's how we compute the shortest
path
to every vertex from the source in pregel.
Each vertex has an estimate of its
shortest path.
It starts off at infinity, and it waits
for messages from its neighboring nodes
via incoming edges.
And if the estimate that it gets from its
incoming
messages is less than its current estimate
replaces itself by the new estimate.
And then what, what it sends out is its
current estimate plus the edge weight of
every outgoing edge.
So, if we do that, then the messages work
out to be the correct ones, and
eventually we get the shortest path from
the
source to every vertex sitting inside the
vertex values.
So, at the end, every vertex ends up
having the shortest distance from the
source to itself.
I'll leave it to you to homework to figure
out now how one
can compute the shortest path from the
source to every vertex using this
information.
Let's walk through this example
step-by-step.
Initially, all the vertices have infinity.
The source vertex figures out that it's,
it is the source.
And its m becomes 0 and therefore it
assigns
itself a new value which is 0.
And sends out
0 plus e, which is just 1, so 0 plus 1,
out on
its outgoing edge.
Every other vertex does nothing, and
at the end all the vertices halt at the
end of the super step.
In the second super step, all the vertices
are halted, but
the second vertex over here gets a message
from its neighbor
and therefore, it wakes up and checks
whether that message is
less than infinity, it is, and it assigns
itself to be 1.
And sends out 1 plus 1, which is 2, on
this outgoing edge.
Similarly, in the third step,
initially, everybody is asleep.
This vertex gets something, a 2 on its
incoming edge.
Figures out it's less than infinity,
assigns itself to
2, and sends out 3 on its outgoing edge.
And finally the middle vertex would be
this vertex over
here assigns itself a 3, and then goes to
sleep.
It sends out before that, 3 plus 1, 4 on
both these edges, but that
doesn't cause any change to either of
these vertices the
second one and the last one and therefore
they don't send any further messages out.
and no further computation takes place
because there are
no messages flowing to wake nodes up,
everybody's asleep.
And the computation halts.
Shortest path computations like this are
quite useful in many
tasks including planning.
For example, think of a automated
self-driving car trying to get from one
end of a parking lot, O
to its designated parking space, G
navigating the the
parking lot in a manner so that it doesn't
bump into any existing parked cars.
One can think of this as a graph problem
with each cell
being a vertex and the edges going from
allowed parts from
one vertex to another, according to this
grid, avoiding the the parked cars.
And then to find the shortest path from O
to G one could just find the shortest path
from O to every vertex and which includes
G and then, one can just follow that path.
Of course,
this particular algorithm would find the
shortest distance from O to G.
And as I've already mentioned, it's a
little piece of homework for you to
figure out how to backtrack effectively,
and find the shortest path from O to G.
Or, for that matter, from the source to
any other vertex.
That's the end of our discussion of
pregel.
let's summarize and see what we've
learned.
pregel is very
appropriate if the problem can be mapped
to a graph.
But the caveat is that all the data and
especially the graph itself,
with all it's edges, need to fit in total
memory, across all the machines.
If you can do that however, you avoid
reloading data between iterations.
And so, we often find out that the pregel
implementation beats
map reduce especially on things like page
rank, shortest paths, et cetera.
In fact, for the case,
case of shortest paths, it's especially
ba-, good because in page rank at
least, every vertex was doing everything
iteration, but in, in shortest parts.
In the beginning phase only, the vertices
near the, the origin or the goal
are actually doing anything, and, and,
over
time the activity spreads throughout the
graph.
So a large part of the graph are not even
active.
So there's no point reloading the entire
graph.
as is done in map-reduce.
Or, as it is required in map-reduce, is
really adds to a lot of inefficiency.
But the caveat for the pregel, an
important part is that you
need to properly distribute vertices to
machines because they are there forever.
I mean, in map-reduce at least you can
rely on the fact that every
time you're reloading things randomly, so
hopefully
most of the time you'll get things right.
But in pregel you do it wrong once
and, and, and you have an in-balanced
computation and
you're stuck for the entire set of
iterations.
And so, you need to do some kind of graph
partitioning.
As mentioned earlier, GPS is an
implementation
of the pregel model which does graph
partitioning
automatically as a first step in the
iteration other, other cases, you do it
yourself.
Many important features of pregel of the
pregel model we haven't covered.
sometimes
[COUGH]
the messages that are being sent out from
vertices to others are actually identical.
so there's no point sending them again and
again across processors.
So messages which are actually identical
that means their,
their content is identical, they can be
just grouped together
[COUGH]
and only their, their actual, their
addresses can
be stored and sent to the other processor.
Or the, the destination vertices can be
can be kept, kept track of.
Second thing that we haven't covered is
aggregation, that I mentioned earlier
that how the aggregation allows vertices
to compute a global function together.
and that allows you to check for things
like convergence.
We haven't talked about that.
And very important feature is called, is
graph mutations.
Vertices can actually add new vertices,
new edges into the graph, and
this is actually very useful for doing
interesting things using the graph model.
We won't talk about those here,
but they're extreme interesting things
which become
very cumbersome and map-reduce get done
very
easily by using things like graph
mutation.