Let's look at one final problem and solve it in pregel, which is the problem of finding the shortest path from a particular source vertex in the graph, such as the left-most vertex here, to every other vertex in the graph. We also assume that in general, the edges can have weights, so we would like to compute the shortest path, taking into account the distance traveled along every edge, but for the moment, let's assume that these weights are just 1. So, here's how we compute the shortest path to every vertex from the source in pregel. Each vertex has an estimate of its shortest path. It starts off at infinity, and it waits for messages from its neighboring nodes via incoming edges. And if the estimate that it gets from its incoming messages is less than its current estimate replaces itself by the new estimate. And then what, what it sends out is its current estimate plus the edge weight of every outgoing edge. So, if we do that, then the messages work out to be the correct ones, and eventually we get the shortest path from the source to every vertex sitting inside the vertex values. So, at the end, every vertex ends up having the shortest distance from the source to itself. I'll leave it to you to homework to figure out now how one can compute the shortest path from the source to every vertex using this information. Let's walk through this example step-by-step. Initially, all the vertices have infinity. The source vertex figures out that it's, it is the source. And its m becomes 0 and therefore it assigns itself a new value which is 0. And sends out 0 plus e, which is just 1, so 0 plus 1, out on its outgoing edge. Every other vertex does nothing, and at the end all the vertices halt at the end of the super step. In the second super step, all the vertices are halted, but the second vertex over here gets a message from its neighbor and therefore, it wakes up and checks whether that message is less than infinity, it is, and it assigns itself to be 1. And sends out 1 plus 1, which is 2, on this outgoing edge. Similarly, in the third step, initially, everybody is asleep. This vertex gets something, a 2 on its incoming edge. Figures out it's less than infinity, assigns itself to 2, and sends out 3 on its outgoing edge. And finally the middle vertex would be this vertex over here assigns itself a 3, and then goes to sleep. It sends out before that, 3 plus 1, 4 on both these edges, but that doesn't cause any change to either of these vertices the second one and the last one and therefore they don't send any further messages out. and no further computation takes place because there are no messages flowing to wake nodes up, everybody's asleep. And the computation halts. Shortest path computations like this are quite useful in many tasks including planning. For example, think of a automated self-driving car trying to get from one end of a parking lot, O to its designated parking space, G navigating the the parking lot in a manner so that it doesn't bump into any existing parked cars. One can think of this as a graph problem with each cell being a vertex and the edges going from allowed parts from one vertex to another, according to this grid, avoiding the the parked cars. And then to find the shortest path from O to G one could just find the shortest path from O to every vertex and which includes G and then, one can just follow that path. Of course, this particular algorithm would find the shortest distance from O to G. And as I've already mentioned, it's a little piece of homework for you to figure out how to backtrack effectively, and find the shortest path from O to G. Or, for that matter, from the source to any other vertex. That's the end of our discussion of pregel. let's summarize and see what we've learned. pregel is very appropriate if the problem can be mapped to a graph. But the caveat is that all the data and especially the graph itself, with all it's edges, need to fit in total memory, across all the machines. If you can do that however, you avoid reloading data between iterations. And so, we often find out that the pregel implementation beats map reduce especially on things like page rank, shortest paths, et cetera. In fact, for the case, case of shortest paths, it's especially ba-, good because in page rank at least, every vertex was doing everything iteration, but in, in shortest parts. In the beginning phase only, the vertices near the, the origin or the goal are actually doing anything, and, and, over time the activity spreads throughout the graph. So a large part of the graph are not even active. So there's no point reloading the entire graph. as is done in map-reduce. Or, as it is required in map-reduce, is really adds to a lot of inefficiency. But the caveat for the pregel, an important part is that you need to properly distribute vertices to machines because they are there forever. I mean, in map-reduce at least you can rely on the fact that every time you're reloading things randomly, so hopefully most of the time you'll get things right. But in pregel you do it wrong once and, and, and you have an in-balanced computation and you're stuck for the entire set of iterations. And so, you need to do some kind of graph partitioning. As mentioned earlier, GPS is an implementation of the pregel model which does graph partitioning automatically as a first step in the iteration other, other cases, you do it yourself. Many important features of pregel of the pregel model we haven't covered. sometimes [COUGH] the messages that are being sent out from vertices to others are actually identical. so there's no point sending them again and again across processors. So messages which are actually identical that means their, their content is identical, they can be just grouped together [COUGH] and only their, their actual, their addresses can be stored and sent to the other processor. Or the, the destination vertices can be can be kept, kept track of. Second thing that we haven't covered is aggregation, that I mentioned earlier that how the aggregation allows vertices to compute a global function together. and that allows you to check for things like convergence. We haven't talked about that. And very important feature is called, is graph mutations. Vertices can actually add new vertices, new edges into the graph, and this is actually very useful for doing interesting things using the graph model. We won't talk about those here, but they're extreme interesting things which become very cumbersome and map-reduce get done very easily by using things like graph mutation.