You'll recall that we began our discussion of parallel computing with the notion of parallel efficiency, so it's worth figuring out the parallel efficiency of MapReduce. Let's assume that the total data emitted by all the mappers put together is D. If we use combiners effectively, D may actually be significantly less than the total amount of data input to the MapReduce program. We also assume there are P processors in total, divided between mappers and reducers, and we'll assume there is no overlap between the two, so that we have P/2 mappers and P/2 reducers.

Clearly parallel efficiency depends on overheads such as communication between processors, and so a natural way to compute the minimum parallel efficiency is to assume that no real useful work is being done by the algorithm itself. Instead, we consider only the efficiency of the MapReduce platform, where all the work it really does is sort P/2 blocks, each of size D divided by P/2, that is 2D/P, and then communicate the results to the different reducers. The actual efficiency of any algorithm using MapReduce will be at least this much, provided that the algorithm requires this sorting process for its completion. That need not always be the case, but for the moment we assume that it is.

Now let's consider the communication costs. The 2D/P of data in each mapper's block has to be distributed across the P/2 reducers. Each reducer gets one (P/2)-th of each mapper's data, so each reducer gets about 4D/P² from each mapper, and this has to be collected from all P/2 mappers at each reducer. This requires P/2 separate steps of communication.
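To make this data accounting concrete, here is a minimal sketch under the assumptions above (the function name and variable names are illustrative, not from any MapReduce API), checking that each reducer ends up collecting 2D/P of data in total:

```python
def shuffle_sizes(D, P):
    """Data accounting for the shuffle phase with P/2 mappers and P/2 reducers.

    D is the total data emitted by all mappers; P is the total processor count.
    """
    per_mapper = D / (P / 2)           # each mapper holds 2D/P of emitted data
    per_pair = per_mapper / (P / 2)    # one mapper sends 4D/P^2 to one reducer
    per_reducer = per_pair * (P / 2)   # collected from all P/2 mappers: 2D/P
    return per_mapper, per_pair, per_reducer

per_mapper, per_pair, per_reducer = shuffle_sizes(D=1e9, P=100)
print(per_mapper, per_pair, per_reducer)  # 2e7, 4e5, 2e7
```

Note that per_reducer equals per_mapper: the shuffle merely redistributes the same total volume of data, which is why the communication cost per processor comes out to 2D/P.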
So you multiply 4D/P² by P/2 to get 2D/P as the amount of data communicated per reducer; multiplied by a per-unit communication constant, say σ, this is the communication cost. The serial cost is, of course, the cost of sorting P/2 blocks of size 2D/P each, which works out to a constant multiplier c times D log(D/P). The efficiency formula has the serial time in the numerator, and in the denominator P times the parallel time, which is the serial time divided by P plus the communication cost:

efficiency = c D log(D/P) / (c D log(D/P) + 2 σ D) = 1 / (1 + 2σ / (c log(D/P))),

that is, one over one plus twice the communication constant divided by the work constant times log(D/P). Notice that we have liberally dropped constants, such as the two that might occur inside the log, because it only adds a constant additive factor; we have also been fairly liberal about factors of two here and there, just to get the form of this equation right.

Remember what we said about an algorithm being scalable: if efficiency approaches one as D/P grows, the algorithm is said to be scalable. As D/P becomes very large, the term 2σ/(c log(D/P)) becomes extremely small, and efficiency does indeed become close to one. So the MapReduce platform is scalable in this sense. The caveat, of course, is that the algorithm we run on this platform must actually require this serial work; in other words, there must be a need for sorting of some kind in the algorithm. This is not always the case. For example, there are many tasks that can just as well be performed by only a map phase, requiring no reduce and no sorting. Such tasks may be called embarrassingly parallel, in which case simply using the MapReduce platform adds overhead that is avoidable.
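The scalability claim can be sketched numerically from the closed form above; here sigma and c are hypothetical constants standing in for the per-unit communication and sort costs:

```python
import math

def mapreduce_efficiency(D, P, sigma=1.0, c=1.0):
    """Efficiency = 1 / (1 + 2*sigma / (c * log(D/P))), as derived above."""
    return 1.0 / (1.0 + 2.0 * sigma / (c * math.log(D / P)))

# Holding P fixed and growing D/P, the overhead term 2*sigma/(c*log(D/P))
# shrinks and efficiency approaches one: the platform is scalable.
for dp in (1e2, 1e6, 1e12):
    print(f"D/P = {dp:g}: efficiency = {mapreduce_efficiency(D=dp * 64, P=64):.3f}")
```

With these illustrative constants, efficiency rises from roughly 0.70 at D/P = 100 toward 0.93 at D/P = 10¹², consistent with the limit of one as D/P grows.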