[MUSIC] So, the MapReduce abstraction has two different steps, one is the map step and the other one is the reduce step, thus the name MapReduce. Well in the map step, we're going to assume that the operation we want to perform is data-parallel over elements like our documents. And the result of the map call is going to be to generate a key value pair. So, for example, in our word count example, we're going to take in a document. And then for every word in that document, we're just going to emit the pair (word, 1). So we'd cycle through our document. If we see the word, uw, we emit (uw, 1). We see the word, machine, we put (machine, 1). Then we happen to see uw again, so we put (uw, 1). And then we see learning, so we emit (learning, 1), and so on. So in this example the key are the different words in our vocabulary, uw, machine, learning, and so on, and the value is just the count of that word. And for this specific implementation, we're emitting just a count of 1 every time we see a word instance. But when we're talking about the MapReduce abstraction, the value can be any data type. Then for the reduce step here, this step is going to be one that aggregates over the values associated with every key. And this must be a commutative, associative operation. So commutative means that the order of the operations doesn't matter. So for example, a+ b = b + a. So addition is commutative, and it's also associative which means that the grouping of operations doesn't matter. So again, addition satisfies this, where a + b, if we do that first and then add c, that's the same as if we had done a + the operation of b + c. Okay, and for a reduce step, we need this operation to be one that's data-parallel over keys. So, let me emphasis what the object is data-parallel over. In the map step, it's over the elements like documents in our corpus. But here once we produce these key value pairs, we take all the keys, and the operation has to be able to be performed independently for each key. And so just to make this a little bit more explicit, in the map step we're saying that the word count in a given document doesn't depend on any other document. In the reduce step we're saying, the total count of a given word doesn't depend on the total count of any other word. So let's go through the reduce example for word counting. Here what we're going to do is we're just going to initialize our counts of a given word to be 0. So the reducer takes in the specific key as well as a list of values. So maybe I'll write this here, so this is the key, and this is a list of values. And what we're going to do is for every value in this list, we're going to increment our count by that value. So in this case, it's a count. It doesn't really always have to be a count. It could just be sums of values, and then we're going to emit this key value pair. So here's our key value pair, and in this example this was our key value pair that was omitted. So in particular if we go along with this example here, if we were to, Call reduce on the key uw. And associated with that key we had counts, let's say 1, 17, 0, 0, 12, 2, then what we would emit would be (uw, 32). So MapReduce has a really long history in functional programming, and somewhat more recently it was popularized by Google. And then really it gained a lot of broad adoption through an open-source implementation called Hadoop that was made by Yahoo. [MUSIC]