Finally, let's take a peek inside the real implementations of map-reduce, that is, not the lightweight Octo simulation, but platforms such as Hadoop. The input files, as we shall see in next week's lecture, are picked up from a distributed file system in chunks by worker nodes, which essentially execute the map phase. They read chunks of data, perform map operations on them, and write the results to the local disk. They sort the data before writing it, and they write it in chunks, different chunks for different reduce keys. It is very important to note that whereas the input files sit in a distributed file system, the local writes that each mapper makes when it is done with its work go to its own local disk. As a result, there is no inter-processor communication involved until these chunks are actually read by the different reducers. (A minimal sketch of this map-side partitioning appears at the end of this passage.)

When a mapper is finished with a particular reduce key, it informs a master node that it is actually done with that key. The master node keeps receiving such messages from many different mapper tasks, and assigns reduce keys to different reduce processors depending on which are available. Sometimes mappers that have finished their own work can be reassigned as reducers; in other cases there are mappers and reducers working in parallel. When a reducer is informed by the master node that a particular reduce key is ready, it is also told which mapper to pick it up from. The reducer then issues a remote read to that mapper process, which has a listener waiting for such remote read requests and services them. That is how the reducers actually grab data from the different mappers, as soon as they know where to get that data from.

Notice that the master, which knows how many mappers there are, can do more than tell a reducer when one mapper has finished working on a particular reduce key: it can figure out when all the work that needs to be done on a particular reduce key is actually over, and inform the reducer assigned to that key, so that it can begin performing the reduce operation on that key. In other words, a reduce worker can pick up data from multiple mappers, but it cannot really start working on a reduce key itself, that is, running the reduce function, until it knows that it has all the data required for that key; otherwise the results might be incorrect, because reduce functions do not have to be commutative and associative, as we have seen earlier.

Last but not least, the most important feature of MapReduce implementations that scale is their fault tolerance. This was the reason why traditional database technologies could not scale to thousands or hundreds of thousands of processors. The trouble is that processors are bound to fail in long tasks involving large numbers of machines; simple probability ensures that this is almost certain to happen. For example, if the chance that a processor will never fail in a particular time period is 99%, then with, say, 100 such machines, the chance that none of them fails is only around 37%, and with 1,000 machines the chance that at least one of them fails is almost 100%. You can work this out using math very similar to what we used in the locality-sensitive hashing example.
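To make that failure arithmetic concrete, here is a quick back-of-the-envelope check in Python, assuming, purely for illustration, that each machine independently survives the whole job with probability 0.99:

```python
# Probability sketch for the numbers quoted above: each machine is assumed
# to survive the job independently with probability 0.99.
p_survive = 0.99

for n in (100, 1000):
    p_none_fail = p_survive ** n      # all n machines survive the job
    p_some_fail = 1 - p_none_fail     # at least one machine fails
    print(f"{n:5d} machines: P(no failure) = {p_none_fail:.3f}, "
          f"P(at least one failure) = {p_some_fail:.5f}")

# Prints (roughly):
#   100 machines: P(no failure) = 0.366, P(at least one failure) = 0.63397
#  1000 machines: P(no failure) = 0.000, P(at least one failure) = 0.99996
```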
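And going back to the map side described at the start of this section, here is a minimal sketch, in Python, of the idea of a mapper sorting its output and writing one local partition file per reducer. This is only an illustration under simple assumptions, not Hadoop's actual code; the hash-based partitioning, the function name, and the file naming are all made up for the sketch.

```python
import os
from operator import itemgetter

def write_map_output(map_output, num_reducers, task_id, out_dir="map_local"):
    """Sort this mapper's (key, value) pairs and write them to local disk,
    one partition file per reducer, so each reducer can later fetch its share.
    A sketch only: real frameworks buffer, spill, compress, and use a stable
    partitioning function rather than Python's built-in hash()."""
    os.makedirs(out_dir, exist_ok=True)
    # Group records by the reducer that will handle their key.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in map_output:
        partitions[hash(key) % num_reducers].append((key, value))
    # Sort each partition by key and write it as a local file.
    for r, records in enumerate(partitions):
        records.sort(key=itemgetter(0))
        with open(os.path.join(out_dir, f"map{task_id}-part{r}.txt"), "w") as f:
            for key, value in records:
                f.write(f"{key}\t{value}\n")

# Example: word-count style output from one mapper task.
write_map_output([("the", 1), ("cat", 1), ("the", 1)], num_reducers=2, task_id=0)
```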
So a MapReduce computation that can take hours or even days on large volumes of big data needs to be aware of the fact, and in fact certain, that at least some of its machines will fail during the computation, while the computation itself should continue and complete successfully. Here is how map-reduce ensures this. The master node maintains regular communication with each of the worker nodes, whether they are mappers or reducers. It also keeps track of which key ranges have actually been successfully processed by each mapper and reducer. Now, if a mapper fails, the master simply reassigns the key ranges assigned to that mapper to some other processor, whenever one becomes available. Since the data had to be read from the input in any case, the newly reassigned mapper simply rereads the data and reprocesses it. Similarly, if a reducer fails, the key ranges for whatever tasks it has not yet finished are assigned to some newly appointed reducer. However, we do not have to worry about the tasks it had already completed before failing, because their results have already been written out to the output. This is how the master keeps track of how much of the MapReduce job is completed, and when a failure is detected, it merely assigns those key ranges to other workers for the appropriate map or reduce phase. In addition, the master also figures out whether some mappers are performing very slowly for some unforeseen reason. It treats those as failures and simply reassigns their work to somebody else; whichever copy finishes first, that result is taken as done, since it does not really matter if something is computed twice by mistake. (A minimal sketch of this master-side bookkeeping appears at the very end of this section.)

Now you might think that this master, which is a single node, is an obvious point of failure, and that is indeed the case: if the master fails, the MapReduce job does fail. However, MapReduce works on large numbers and probabilities. The chance of one amongst a thousand processor machines failing is almost 100 percent, as we argued a while back, even though the chance that any one particular machine fails might be less than a percent, or even 0.1%. By that argument, the chance that this particular master fails might be a tenth of a percent or even less, whereas the chance that some other processor fails, if there are large numbers of them, is almost one. This still does not remove the fact that the master is a single point of failure, but the probability of the MapReduce job failing because of it is still very, very small.

So let's see what we have learnt this week. We began with a review of parallel computing ideas: the concepts of speed-up, parallel efficiency, and scalable algorithms. We then went into the map-reduce programming paradigm as a higher-level abstraction compared to hand-coded parallel programming using shared memory and message passing. We did a few examples, discussed how the Octo simulator works, showed how map-reduce could be applied to various machine learning and search tasks that we have studied in the past weeks, and concluded with some discussion of the parallel efficiency and internals of map-reduce. Next week, we will turn our attention to the big data technology that is required to make MapReduce actually work: in particular, distributed file systems, distributed NoSQL databases and how they differ from the traditional relational variety, and finally, emerging trends as to where this technology is actually going in the future.
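Finally, as promised above, here is a minimal sketch of the kind of bookkeeping the master might do for fault tolerance: tracking which worker owns which task, recording completed work, and treating workers that go silent or run suspiciously slowly as failed. The class, its method names, and the heartbeat-timeout mechanism are illustrative assumptions, not the actual Hadoop or Google MapReduce implementation.

```python
import time

class Master:
    """Minimal sketch of a MapReduce master's fault-tolerance bookkeeping."""

    def __init__(self, tasks, timeout=60.0):
        self.owner = {task: None for task in tasks}  # task -> worker or None
        self.done = set()                            # tasks already completed
        self.last_heartbeat = {}                     # worker -> last time seen
        self.timeout = timeout

    def heartbeat(self, worker):
        """Workers ping the master regularly; prolonged silence means failure."""
        self.last_heartbeat[worker] = time.time()

    def assign(self, worker):
        """Hand an unowned, unfinished task (a map or reduce key range)
        to a worker that is asking for work."""
        for task, owner in self.owner.items():
            if owner is None and task not in self.done:
                self.owner[task] = worker
                return task
        return None

    def complete(self, task):
        """Record finished work; completed tasks are never redone."""
        self.done.add(task)

    def reap_failures(self):
        """Treat workers that have gone silent past the timeout as failed and
        free their unfinished tasks for reassignment. Doing a task twice by
        mistake is harmless: the first finished copy is the one that counts."""
        now = time.time()
        dead = {w for w, t in self.last_heartbeat.items()
                if now - t > self.timeout}
        for task, owner in self.owner.items():
            if owner in dead and task not in self.done:
                self.owner[task] = None
```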