Let's now turn to the map-reduce paradigm itself. As we just mentioned, it's a data parallel paradigm with message passing. It's also a pipelined procedure with two phases: the map phase and the reduce phase. Further, as we also mentioned, it's a higher-level abstraction, so programmers only need to specify what each mapper actually does and what each reducer actually does, and the map-reduce implementation takes care of all the message passing required between the two phases. In fact, one doesn't even need to worry about how many processors there are, or which one should be doing what kind of job; that's all left to the map-reduce implementation. All the programmer needs to do is define their task in terms of the map-reduce paradigm, which we'll come to in a moment, and point the system at the data on which they would like this task performed. The map-reduce implementation figures out which processors can be made available to the task, and does the rest without the programmer having to worry about the details. That's what makes map-reduce exceedingly powerful for large-scale data parallel processing: it bridges the gap between the high-level specification of an algorithm and its massively parallel implementation, in a manner that had never been successfully done before in the history of parallel computing. As we shall see, many complex web intelligence algorithms for machine learning and processing big data are easy to express using the map-reduce paradigm, making their large-scale parallel implementation much easier than if one had to program them using either shared memory or low-level message passing, as was the case before map-reduce.

The most common example used to explain map-reduce is that of counting words in documents, something we saw last week when we computed inverse document frequencies, for example. Suppose we are given many documents. Say there are only ten documents here, but there could be hundreds or thousands or even millions of documents, and we need to compute the frequency of each word across this whole set of documents. We'd like to do it in parallel, using many processors, so as to reduce the overall execution time. In our simple example we will take only ten documents, three mapper processors which execute map tasks, followed by two reducer processors that execute reduce tasks. Of course, once the mapper processors are done they can start doing the reduce work, so the total number of processors need not be five; it might be as few as three, or maybe four. The mappers are given data, in this case documents identified by document IDs, which the map-reduce system randomly distributes across the different mapper processors. Each mapper processor thus gets a bunch of documents. It doesn't really know which documents it's getting; it just gets a bunch of them in some random order.

Now, here's how the problem is defined in terms of map and reduce tasks. Remember, a mapper simply needs to perform a whole bunch of map tasks on all the documents that it gets. So it's getting a sequence of document IDs, each followed by the contents of the document. In other words, it's getting keys and values, as a sequence of key-value pairs. A map task works on one key-value pair, such as a document ID and its contents, and produces a bunch of key-value pairs, in this case words and the frequencies with which they occur in that particular document.
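To make this concrete, here is a minimal sketch of such a map task in Python. The function name map_task and the whitespace tokenization are my own illustrative choices, not part of any particular map-reduce framework:

```python
from collections import Counter

def map_task(doc_id, contents):
    """One map task: takes one (document ID, contents) key-value pair
    and emits one (word, count) pair per distinct word in that document."""
    for word, count in Counter(contents.split()).items():
        yield (word, count)

# For a document D1 containing the words w1, w2, and w4 once each,
# this would emit ("w1", 1), ("w2", 1), ("w4", 1).
```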
Remember that a map task acts on one key-value pair, so in this case it acts on one document. It is, however, free to emit many key-value pairs as its output, in this case the many words that occur in that document along with their frequencies. For example, the map task for document D1 would emit a W1 followed by a one, a W2 followed by a one, and a W4 followed by a one, and similarly for the other documents. In some cases, such as the sixth document, we would emit a W2 followed by a two, but in most other cases it would be a count of one.

The second part of our problem definition is the reduce phase, in which the list of key-value pairs corresponding to the same key, in this case a word, is reduced, or compressed, or combined, into one key-value pair for that particular key. So, for example, all the counts for a particular word are summed up in the reduce phase to create one key-value pair for each word. Now clearly, in order to perform the reduce function, all the key-value pairs for a particular word need to be together if they are to be summed up. Unfortunately, since each mapper acted on an independent set of documents, it only has access to the word-count pairs for the particular documents it happens to see. The best that a mapper can do is compute the reduce function using the pairs that it happens to have available. So, for example, this mapper only had the two documents D1 and D2 which contained a W1, so it computes a W1 comma two by performing the map and reduce functions on the set of documents available to it. Similarly, this mapper adds up the counts for W2 across all its documents to get a value of W2, four.

Of course that is not enough; all these partial summations themselves need to be summed up, and this is where the map-reduce implementation comes in. It sends all of the key-value pairs with the same key to the same reducer, so that they are all together and can finally be reduced to get the right answer. So, all the pairs having W1 manage to get to this reducer, where they can be added up to get the correct value for W1: we got a W1 from here, a W1 from here, and a W1 from here, and we added all of these up to get a value of seven. Similarly, the second reducer also manages to get all the key-value pairs for each of its words together, so that they can be added up. In this case the first reducer works on pairs where the keys are W1 and W2, and the second reducer works on those where the key is W3 or W4.

So now we're ready to define map-reduce formally. Mappers read data in the form of key-value pairs (K1, V1), and for each such (K1, V1) pair, a mapper may emit one or more pairs of the form (K2, V2). In our word-counting example, K1 was the document ID and V1 was the contents of the document, whereas K2 was a word and V2 the number of times that word occurred in a particular document. The reduce function, on the other hand, receives all the pairs for some value of K2 and combines them: the reduce of K2 and all the values that came out of the map phase for that particular value of K2 are combined using a function F_R. In our word-counting example, F_R is just a summation across all the occurrences of a particular word in all the documents that we process. Some of you might have noticed that in our example we actually did some reduce work within the mappers, but we'll come to that point in a short while.
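Continuing the sketch, the shuffle and reduce steps might look like this in Python. The names shuffle and reduce_task are again illustrative, and a real implementation would do this routing across machines rather than in a single in-memory dictionary:

```python
from collections import defaultdict

def shuffle(mapped_pairs):
    """The platform's job: group all (key, value) pairs by key, so that
    every pair for a given word lands at the same reducer."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_task(word, counts):
    """One reduce task: combine all values for one key using F_R,
    which for word counting is just summation."""
    return (word, sum(counts))
```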
For the moment, the strict definition of a map function is just that: from one key-value pair, produce one or more key-value pairs. And the reduce function is: take all the key-value pairs for a particular value of the key, and apply the reduce function to them. The map-reduce platform is responsible for routing pairs to reducers, so that all the pairs for the same key end up at the same reducer; map-reduce essentially sorts the key-value pairs that come out of the map phase, so that they land up in the right place. Finally, a very important point to note is that the map-reduce process reads data, possibly from a file system or a database, and writes fresh data. So it's a batch process that always produces more data, very different from a database, where one is querying data and just getting an answer.
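Putting the pieces together, here is a single-process sketch of the whole batch, reusing map_task, shuffle, and reduce_task from the sketches above. It illustrates only the dataflow of the two phases, not the parallelism that a real map-reduce implementation provides:

```python
def map_reduce(documents, map_fn, reduce_fn):
    # Map phase: apply the map function to every input key-value pair.
    mapped = []
    for doc_id, contents in documents.items():
        mapped.extend(map_fn(doc_id, contents))
    # Shuffle: route all pairs with the same key into the same group.
    grouped = shuffle(mapped)
    # Reduce phase: one reduce task per distinct key.
    return dict(reduce_fn(key, values) for key, values in grouped.items())

# Hypothetical three-document input, echoing the lecture's example.
docs = {"D1": "w1 w2 w4", "D2": "w1 w3", "D6": "w2 w2 w3"}
print(map_reduce(docs, map_task, reduce_task))
# {'w1': 2, 'w2': 3, 'w4': 1, 'w3': 2}
```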