Finally, let's take a peek inside the real implementations of map-reduce, that is, not the lightweight Octo simulation, but platforms such as Hadoop. The input files, as we shall see in next week's lecture, are picked up from a distributed file system in chunks by worker nodes, which essentially execute the map phase. They read chunks of data, perform map operations on them, and write the results to the local disk. They sort the data before writing it, and they write it in chunks, different chunks for different reduce keys. It is very important to note that whereas the input files sit in a distributed file system, the local writes that each mapper makes when it is done with its work go to its own local disk. As a result, there is no inter-processor communication involved until these chunks are actually read by the different reducers. (A minimal sketch of this map-side partitioning appears at the end of this passage.)

When a mapper is finished with a particular reduce key, it informs a master node that it is actually done with that key. The master node keeps receiving such messages from many different mapper tasks, and assigns reduce keys to different reduce processors depending on which are available. Sometimes mappers that have finished their own work can be reassigned as reducers; in other cases there are mappers and reducers working in parallel. When a reducer is informed by the master node that a particular reduce key is ready, it is also told which mapper to pick it up from. The reducer then issues a remote read to that mapper process, which has a listener waiting for such remote read requests and services them. That is how the reducers actually grab data from the different mappers, as soon as they know where to get that data from.

Notice that the master, which knows how many mappers there are, can do more than tell a reducer when one mapper has finished working on a particular reduce key: it can figure out when all the work that needs to be done on a particular reduce key is actually over, and inform the reducer assigned to that key, so that it can begin performing the reduce operation on that key. In other words, a reduce worker can pick up data from multiple mappers, but it cannot really start working on a reduce key itself, that is, running the reduce function, until it knows that it has all the data required for that key; otherwise the results might be incorrect, because reduce functions do not have to be commutative and associative, as we have seen earlier.

Last but not least, the most important feature of MapReduce implementations that scale is their fault tolerance. This was the reason why traditional database technologies could not scale to thousands or hundreds of thousands of processors. The trouble is that processors are bound to fail in long tasks involving large numbers of machines; simple probability ensures that this is almost certain to happen. For example, if the chance that a processor will never fail in a particular time period is 99%, then with, say, 100 such machines, the chance that none of them fails is only around 37%, and with 1,000 machines the chance that at least one of them fails is almost 100%. You can work this out using math very similar to what we used in the locality-sensitive hashing example.
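To make that failure arithmetic concrete, here is a quick back-of-the-envelope check in Python, assuming, purely for illustration, that each machine independently survives the whole job with probability 0.99:

```python
# Probability sketch for the numbers quoted above: each machine is assumed
# to survive the job independently with probability 0.99.
p_survive = 0.99

for n in (100, 1000):
    p_none_fail = p_survive ** n      # all n machines survive the job
    p_some_fail = 1 - p_none_fail     # at least one machine fails
    print(f"{n:5d} machines: P(no failure) = {p_none_fail:.3f}, "
          f"P(at least one failure) = {p_some_fail:.5f}")

# Prints (roughly):
#   100 machines: P(no failure) = 0.366, P(at least one failure) = 0.63397
#  1000 machines: P(no failure) = 0.000, P(at least one failure) = 0.99996
```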
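And going back to the map side described at the start of this section, here is a minimal sketch, in Python, of the idea of a mapper sorting its output and writing one local partition file per reducer. This is only an illustration under simple assumptions, not Hadoop's actual code; the hash-based partitioning, the function name, and the file naming are all made up for the sketch.

```python
import os
from operator import itemgetter

def write_map_output(map_output, num_reducers, task_id, out_dir="map_local"):
    """Sort this mapper's (key, value) pairs and write them to local disk,
    one partition file per reducer, so each reducer can later fetch its share.
    A sketch only: real frameworks buffer, spill, compress, and use a stable
    partitioning function rather than Python's built-in hash()."""
    os.makedirs(out_dir, exist_ok=True)
    # Group records by the reducer that will handle their key.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in map_output:
        partitions[hash(key) % num_reducers].append((key, value))
    # Sort each partition by key and write it as a local file.
    for r, records in enumerate(partitions):
        records.sort(key=itemgetter(0))
        with open(os.path.join(out_dir, f"map{task_id}-part{r}.txt"), "w") as f:
            for key, value in records:
                f.write(f"{key}\t{value}\n")

# Example: word-count style output from one mapper task.
write_map_output([("the", 1), ("cat", 1), ("the", 1)], num_reducers=2, task_id=0)
```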
So a MapReduce computation that can take hours or even days on large volumes of big data needs to be aware of the fact, and in fact certain, that at least some of its machines will fail during the computation, while the computation itself should continue and complete successfully. Here is how map-reduce ensures this. The master node maintains regular communication with each of the worker nodes, whether they are mappers or reducers. It also keeps track of which key ranges have actually been successfully processed by each mapper and reducer. Now, if a mapper fails, the master simply reassigns the key ranges assigned to that mapper to some other processor, whenever one becomes available. Since the data had to be read from the input in any case, the newly reassigned mapper simply rereads the data and reprocesses it. Similarly, if a reducer fails, the key ranges for whatever tasks it has not yet finished are assigned to some newly appointed reducer. However, we do not have to worry about the tasks it had already completed before failing, because their results have already been written out to the output. This is how the master keeps track of how much of the MapReduce job is completed, and when a failure is detected, it merely assigns those key ranges to other workers for the appropriate map or reduce phase. In addition, the master also figures out whether some mappers are performing very slowly for some unforeseen reason. It treats those as failures and simply reassigns their work to somebody else; whichever copy finishes first, that result is taken as done, since it does not really matter if something is computed twice by mistake. (A minimal sketch of this master-side bookkeeping appears at the very end of this section.)

Now you might think that this master, which is a single node, is an obvious point of failure, and that is indeed the case: if the master fails, the MapReduce job does fail. However, MapReduce works on large numbers and probabilities. The chance of one amongst a thousand processor machines failing is almost 100 percent, as we argued a while back, even though the chance that any one particular machine fails might be less than a percent, or even 0.1%. By that argument, the chance that this particular master fails might be a tenth of a percent or even less, whereas the chance that some other processor fails, if there are large numbers of them, is almost one. This still does not remove the fact that the master is a single point of failure, but the probability of the MapReduce job failing because of it is still very, very small.

So let's see what we have learnt this week. We began with a review of parallel computing ideas: the concepts of speed-up, parallel efficiency, and scalable algorithms. We then went into the map-reduce programming paradigm as a higher-level abstraction compared to hand-coded parallel programming using shared memory and message passing. We did a few examples, discussed how the Octo simulator works, showed how map-reduce could be applied to various machine learning and search tasks that we have studied in the past weeks, and concluded with some discussion of the parallel efficiency and internals of map-reduce. Next week, we will turn our attention to the big data technology that is required to make MapReduce actually work: in particular, distributed file systems, distributed NoSQL databases and how they differ from the traditional relational variety, and finally, emerging trends as to where this technology is actually going in the future.
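Finally, as promised above, here is a minimal sketch of the kind of bookkeeping the master might do for fault tolerance: tracking which worker owns which task, recording completed work, and treating workers that go silent or run suspiciously slowly as failed. The class, its method names, and the heartbeat-timeout mechanism are illustrative assumptions, not the actual Hadoop or Google MapReduce implementation.

```python
import time

class Master:
    """Minimal sketch of a MapReduce master's fault-tolerance bookkeeping."""

    def __init__(self, tasks, timeout=60.0):
        self.owner = {task: None for task in tasks}  # task -> worker or None
        self.done = set()                            # tasks already completed
        self.last_heartbeat = {}                     # worker -> last time seen
        self.timeout = timeout

    def heartbeat(self, worker):
        """Workers ping the master regularly; prolonged silence means failure."""
        self.last_heartbeat[worker] = time.time()

    def assign(self, worker):
        """Hand an unowned, unfinished task (a map or reduce key range)
        to a worker that is asking for work."""
        for task, owner in self.owner.items():
            if owner is None and task not in self.done:
                self.owner[task] = worker
                return task
        return None

    def complete(self, task):
        """Record finished work; completed tasks are never redone."""
        self.done.add(task)

    def reap_failures(self):
        """Treat workers that have gone silent past the timeout as failed and
        free their unfinished tasks for reassignment. Doing a task twice by
        mistake is harmless: the first finished copy is the one that counts."""
        now = time.time()
        dead = {w for w, t in self.last_heartbeat.items()
                if now - t > self.timeout}
        for task, owner in self.owner.items():
            if owner in dead and task not in self.done:
                self.owner[task] = None
```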