So let's see what happens when you try to iterate MapReduce-based matrix-vector multiplication. The first time, of course, you have to read the matrix A and the vector x in parallel across all the mappers, and after the first MapReduce phase you write the intermediate result y'. Then you read the y' values back, and at the end you write the y's, all in parallel, because every reducer computes a few rows of the final y. But for the next iteration you have to read the matrix A once again, together with the y you wrote in the previous iteration, write y', write y, and keep reading A and y over and over. So there are two places where you are reading things again and again. One is where you write y' and then immediately read y' back; that you can potentially avoid by pipelining, using techniques like HaLoop and similar implementations, which pipeline the writes of one phase's reducers into the reads of the next phase's mappers. But we still have the problem of having to read A again. Notice that MapReduce has no memory: the basic assumption is that the data is just too large to ever fit in memory, so you essentially have to read A again and again. But suppose A did fit in memory. MapReduce still has no memory, so it cannot remember A between iterations; the mappers don't know which pieces of the input they will get next time, since the reads arrive in effectively random order on every pass. So MapReduce iterations are inherently inefficient in this regard: if A is a really huge matrix, with lots and lots of entries, it simply has to be read again and again. This is definitely a problem, and there is a way around it, which essentially leads us to the graph-based, Pregel-style model.

Let's look at the same matrix-vector operation as a graph operation. What do I mean by that? Think of the matrix as defining edges in a graph. For example, the first column says that the first vertex has an edge leaving it and going to the second vertex, and similarly there is an edge going from the second vertex back to the first. So read each column as the edges that go out of a vertex, and read each row as the edges that come into that vertex. Here the minus one just stands for the vertex itself: the diagonal entries are all minus one, for reasons we won't go into in this course. For the first vertex, then, there is an edge coming in from vertex two and an edge going out to vertex two. Now take a look at vertex three: it has two edges going out, one to vertex four and one to vertex two, and it has one edge coming in from vertex four. You can verify that this particular matrix, which is called the adjacency matrix of the graph, represents all the edges in the graph: the diagonal stands for the vertices themselves, and the off-diagonal entries represent the edges.

Now what about the vector x itself? We assume each vertex holds a value, which is nothing but the element of x in the position corresponding to that vertex. So the vector (1, 2, 3, 4) is distributed across the vertices of the graph as just 1, 2, 3, 4, one element per vertex.
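To make the correspondence concrete, here is a small sketch in Python (my own illustration, not from the lecture), assuming the graph has exactly the edges described above: 1→2, 2→1, 3→2, 3→4, and 4→3. It builds the adjacency matrix, with -1 on the diagonal and a 1 in row i, column j for every edge j→i, and distributes the vector x = (1, 2, 3, 4) one value per vertex.

```python
import numpy as np

# Hypothetical 4-vertex example, using only the edges mentioned above:
# 1->2, 2->1, 3->2, 3->4, 4->3 (vertices numbered 1..4).
out_edges = {1: [2], 2: [1], 3: [2, 4], 4: [3]}

# Column j of A holds the edges leaving vertex j, row i the edges entering
# vertex i, and every diagonal entry is -1.
n = len(out_edges)
A = -np.eye(n)
for src, dests in out_edges.items():
    for dst in dests:
        A[dst - 1, src - 1] = 1

# The vector x = (1, 2, 3, 4) is distributed one element per vertex.
value = {v: float(v) for v in range(1, n + 1)}
x = np.array([value[v] for v in range(1, n + 1)])

print(A)        # the adjacency matrix with -1 on the diagonal
print(A @ x)    # the matrix-vector product we want to reproduce on the graph
```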
Now let's think of the matrix-vector multiplication as being done in parallel by each vertex, in the following way. Each vertex sends its value out along all of its outgoing edges; vertex three, for example, sends its value to vertex two and to vertex four, vertex one sends its value to vertex two, and so on. Then every vertex receives the values that were sent in this step from its neighbors on incoming edges, and finally it replaces its own value by the sum of everything it received minus the value it originally had. As you can easily see, this performs exactly the matrix-vector multiplication we had before.

Now, the interesting part is that this is very parallel: as long as you have many processors, and each processor holds a bunch of vertices, you can do a lot of this operation in parallel. The only caveat is that you have to exchange messages, so you try to place the vertices so that they don't have to exchange messages between different processors too often; that's a challenge, and we'll come to it in a short while. But the really important part is that it's easy to iterate. Once the graph structure has been loaded into all the processors, so that every vertex knows its outgoing edges, you don't need to load this matrix structure again. You can just keep iterating, repeating the matrix-vector product again and again, as is often required in many applications. Unlike MapReduce, where you have to read A on every iteration, here A is loaded once, since A is nothing more than each vertex and its connections to its neighbors.

The caveat, of course, is that if the number of edges is so huge that you cannot load the graph even using all the memory of all the processors you have, then you can't do this. This is actually the Pregel model, which we will come to formally in a minute, and its caveat is that you need enough memory to load the entire graph for this to work.
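Here is a minimal, single-process sketch of that vertex-centric iteration, again assuming the hypothetical edge list used above; the superstep function and the fixed three iterations are just for illustration, and a real Pregel-style system would partition the vertices across workers and exchange the messages over the network. The point to notice is that the graph structure is built once and only the per-vertex values change between supersteps; the check at the end simply repeats the matrix-vector product with the adjacency matrix.

```python
import numpy as np

# Same hypothetical 4-vertex graph as above (edges 1->2, 2->1, 3->2, 3->4, 4->3).
out_edges = {1: [2], 2: [1], 3: [2, 4], 4: [3]}
value = {v: float(v) for v in out_edges}          # x = (1, 2, 3, 4), one entry per vertex

def superstep(out_edges, value):
    """One vertex-centric step: send your value on every outgoing edge, then
    replace your value by (sum of incoming messages) - (your old value)."""
    inbox = {v: 0.0 for v in out_edges}
    for src, dests in out_edges.items():          # "send" phase
        for dst in dests:
            inbox[dst] += value[src]
    return {v: inbox[v] - value[v] for v in out_edges}   # "receive and update" phase

# Iterate: the graph structure (out_edges) is loaded once and stays in memory;
# only the per-vertex values change from superstep to superstep.
x = value
for _ in range(3):
    x = superstep(out_edges, x)

# Check against three repeated matrix-vector products y = A(A(Ax)).
A = -np.eye(4)
for src, dests in out_edges.items():
    for dst in dests:
        A[dst - 1, src - 1] = 1
y = np.array([1.0, 2.0, 3.0, 4.0])
for _ in range(3):
    y = A @ y
print([x[v] for v in sorted(x)], y)   # the two results should agree
```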