We've heard about PageRank in the very early lectures, as a way of finding which web page is most important. PageRank is really a property of a graph and its structure. This is an example of one of the ways of computing PageRank, and it's a bit complicated, so I'll go through it slowly. The goal is to compute a page rank for every vertex in a graph: if there are N vertices in the graph, there will be N values p_i, one for every vertex. Further, every vertex has some number of outgoing edges, so l_j is the number of outgoing edges that vertex j has. Now, one way to compute PageRank is to start off with some initial value of p at every vertex; you could make everything one, or a half, but typically 1/N is taken as the starting point. Then you repeatedly do this kind of update: p_i at the next step becomes (1 - d)/N plus d times the sum, over all vertices j that point to i, of p_j divided by l_j, the number of links going out of j. Typically d is something like 0.85, so the first term becomes 0.15/N. The intuition is this: suppose there is a vertex j pointing to i that has a huge number of outgoing edges; then i is just one of millions of destinations for j, and j's page rank counts for little when it flows into i. However, if i is the only vertex that j points to, then l_j is one, and j's importance counts for much more as it flows into i. In other words, if I'm being linked to from a page, its importance flows to me in proportion to how important it thinks I am, which is measured by what fraction of its total outgoing edges points to me: each vertex divides its page rank equally along all its outgoing edges, and I get that fraction of it. If you iterate this again and again, ultimately these p's converge; they settle down into fixed values and stop changing after some time, and that value is the page rank of vertex i, and similarly for vertex j, and so on. It's also a global iteration, because all the vertices are exchanging messages. Now let's see how you would do this in Pregel. From the point of view of a vertex, I have some estimate of my page rank, initially some starting value. I send my value divided by l, the number of outgoing edges that I have, to every vertex I can see on an outgoing edge, and then I look for incoming messages coming from other vertices and carry out this iteration. In other words, I collect all the incoming messages arriving on my incoming edges, add them up, multiply the sum by d, add (1 - d)/N, and replace p with this value, as per the formula. After a fixed number of such supersteps, for example 30, which is often sufficient, we simply stop, and the page rank should have converged. Alternatively, we could conduct a convergence test to make sure that successive p values are hardly changing at all, but then we have to do a little more than is provided by the Pregel model we have covered so far: we have to check whether the maximum difference between successive values across all vertices is small. And that is the process of aggregation.
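To make the update concrete, here is a minimal sketch of that iteration in plain Python (not the Pregel version yet). The graph representation and the names pagerank, graph, and num_iters are my own for illustration, not taken from the lecture code:

    def pagerank(graph, d=0.85, num_iters=30):
        # graph maps each vertex j to the list of vertices j points to;
        # assumes every vertex has at least one outgoing edge.
        n = len(graph)
        p = {v: 1.0 / n for v in graph}            # start every vertex at 1/N
        for _ in range(num_iters):
            new_p = {}
            for i in graph:
                # sum of p_j / l_j over all vertices j that point to i
                incoming = sum(p[j] / len(graph[j]) for j in graph if i in graph[j])
                new_p[i] = (1 - d) / n + d * incoming
            p = new_p                               # values settle down after enough iterations
        return p

Starting at 1/N and iterating a fixed number of times, here 30, matches what we just described; a convergence test would replace the fixed iteration count.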
Pregel allows for aggregation; we're not going to cover that feature in much detail here, but I'll come back to it shortly. To see how this works, there's a very simple sequential implementation of the Pregel model at this location. I'll show you that code and how it computes the page rank, and I encourage you to study it and modify it to compute a few other things, to get to know Pregel a bit better. Here is the lightweight sequential simulation of Pregel from the link I just mentioned. As you can see, it's a toy single-machine version of Google's Pregel paradigm; clearly it's not for large-scale processing, it's just a simulation. Let's see how it works. It has a class called Vertex, which holds various things, including the value of the vertex as well as a list of outgoing vertices, that is, the IDs of the vertices it points to. It also keeps track of things like incoming messages and the number of supersteps performed. Then there is Pregel, the class which does all the work. It first does things like partitioning the vertices across different workers, and each worker performs supersteps. If you look at the superstep itself, it essentially goes over every partition and tells the worker for that partition to do its work; the worker runs a superstep, and the superstep, at the end of all this class hierarchy, calls vertex update for every vertex that it happens to own in that partition. There is no update function provided in the Vertex class; that's the function that you as a programmer have to write, by subclassing Vertex with a class of your own and providing that update function. I won't go into the detailed implementation of this; those of you who know a little object-oriented programming will find it fairly straightforward to figure out what's happening in this code. What we will do is look at how this basic Pregel module has been used to simulate the page rank program we just described in our charts. If you look at the page rank program using this module, you essentially import the module, then write a page rank vertex class which subclasses, or inherits from, Vertex, and implement the update function. So what does the update function do? Very similar to our chart: if the number of supersteps is less than 50, the vertex changes its own value to 0.15/N plus 0.85 times the sum of everything it gets from the incoming vertices, and then it sends its own page rank value out after dividing by l. Very simple. After 50 steps it stops; there's no convergence test in this example. The way you run this is essentially to create an instance of Pregel with the list of vertices and the number of workers you're trying to simulate, then say run, and then read off the values of all the vertices. The main program just creates vertices and creates edges; there's a function below that which creates a random graph of ten vertices, with each vertex having four random outgoing edges, and then you simply call the page rank Pregel function on those vertices, and we're done. Let's see how that actually works. If I run that program, it computes the page rank of a random graph.
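For reference, the update function in such a page rank vertex subclass looks roughly like the sketch below. I'm assuming attribute names along the lines of what the module keeps per vertex (superstep, incoming_messages, out_vertices, outgoing_messages) and a global num_vertices; check the actual file for the exact names:

    class PageRankVertex(Vertex):
        def update(self):
            # Iterate for 50 supersteps, then go quiet.
            if self.superstep < 50:
                # incoming_messages is assumed to be a list of (sender, value) pairs
                self.value = 0.15 / num_vertices + 0.85 * sum(
                    value for (sender, value) in self.incoming_messages)
                # Send my page rank, split equally, along every outgoing edge.
                share = self.value / len(self.out_vertices)
                self.outgoing_messages = [(v, share) for v in self.out_vertices]
            else:
                self.outgoing_messages = []

This is exactly the update rule from the previous slide: 0.15/N plus 0.85 times the sum of incoming values, with the vertex's own value divided equally among its outgoing edges.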
I have no idea whether this is a good page rank, so there is actually a test provided. What the test does is use a closed-form formula for the page rank, which is not an easy formula to compute because it involves the inverse of a matrix, but it's there on Wikipedia if you want to look it up. It computes the test page rank, then compares the page rank computed using Pregel to the test page rank and looks at the difference. So if you run this program after uncommenting the test part, you see that the difference between the two page rank vectors is very small. This one was the Pregel computation of page rank, and this the test computation using the closed-form formula, and we get the right answer. Returning now to the Pregel module itself, one interesting exercise that some of you might want to try is to see whether you can actually write a parallel version of this program, possibly building on the way mincemeat.py built a lightweight parallel implementation of MapReduce. That would be an interesting contribution to the open-source community, if any of you are interested in trying it out.
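If you want to reproduce such a check yourself, a small sketch of the closed-form computation using numpy could look like this; the function name and the assumption that vertices are numbered 0 to N-1 are mine, not the module's test code:

    import numpy as np

    def pagerank_closed_form(graph, d=0.85):
        # graph maps each vertex index j (0..n-1) to the list of vertices it points to.
        n = len(graph)
        # M[i, j] = 1/l_j if j points to i, else 0 (the link matrix)
        M = np.zeros((n, n))
        for j, outs in graph.items():
            for i in outs:
                M[i, j] = 1.0 / len(outs)
        # p = (1-d)/n * 1 + d * M p  rearranges to  (I - d M) p = (1-d)/n * 1
        return np.linalg.solve(np.eye(n) - d * M, (1 - d) / n * np.ones(n))

Comparing this vector against the Pregel result, for instance by taking the maximum absolute difference between the two, should give a very small number, which is essentially what the provided test reports.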