As we all know, microprocessors are becoming cheaper and faster every day. Computing power doubles roughly every eighteen months for the same price, and this has been going on for more than forty years. So, in the early days of parallel computing, back in the 80s, by the time one had figured out how to use many processors to do the work of one processor faster, that one processor itself had become twice as fast. Parallel computing really had to play a catch-up game. In recent years, however, you may have noticed that all your PCs and servers come with multiple cores: four cores, eight cores, sixteen cores, and soon 32 or 64 cores on every chip. What that really means is that physical limits, such as the speed of light, are forcing parallel computing into everyday life, within every computer, which was never the case twenty or thirty years ago. On the web, parallel computing is used at a macro level, with hundreds of thousands or even millions of servers powering the large search engines and social media platforms, such as Google and Facebook. So, in two different ways, within chips on the one hand and across the web on the other, parallel computing has seen a new life in the past five years.

It is therefore important to understand some basic principles when one tries to do things faster using more machines. If one uses P machines, one would expect to do the same task, which took time T1 on a single processor, in less time; let's call it Tp. The speedup is the ratio T1 over Tp. This will obviously be greater than one and will increase as you increase the number of processors. So, for example, with twenty processors you might get a speedup of somewhere around fifteen or sixteen; with forty processors, somewhere around 29 or 30. But as the number of processors increases, for a variety of reasons, the speedup does not keep growing forever.

Another important concept is efficiency. The ideal speedup with P processors is P itself, so the efficiency, the speedup divided by P, measures how close we are to achieving that, and it is something less than one (the precise formulas are written out just below). With a small number of processors, like two, four or eight, you can get an efficiency close to one. But as the number of processors goes up, we will see that the efficiency actually comes down. The primary reason for this is communication, or coordination, between processors. Think of it like this: if you have two or three people to perform a task, it is probably easier, but if you put a hundred people on the task, it does not necessarily get done a hundred times faster, because the people have to talk to each other and coordinate their actions.

So, do we lose hope? Is there no way we can really exploit thousands or millions of processors? Well, there is, actually, and the trick is not to solve the same problem, but to solve bigger problems when you have more processors. There is no point indexing a one-megabyte file with a million machines; you only need a million machines if you have billions and billions of web pages to index. A scalable algorithm is one that allows you to increase the problem size without suffering any loss in efficiency. On the contrary, because you have a large number of processors and a large problem size, the coordination costs should actually go down as a fraction of the overall work being performed.
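To pin down the speedup and efficiency just described, here are the usual formulas; the symbols S_P and E_P are labels introduced here rather than notation from the lecture, and the twenty-processor numbers are the ones from the example above.

```latex
% Speedup and efficiency on P processors.
\[
  S_P \;=\; \frac{T_1}{T_P},
  \qquad
  E_P \;=\; \frac{S_P}{P} \;\le\; 1 .
\]
% For example, with P = 20 and a speedup of about 16:
\[
  E_{20} \;\approx\; \frac{16}{20} \;=\; 0.8 .
\]
```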
So your efficiency should actually increase as the ratio of the problem size to the number of processors, n over P, increases. A scalable algorithm is therefore one where, as the problem size goes up in proportion to the number of processors that you throw at the problem, you get better and better efficiency.

Parallel programming is considerably more difficult than normal serial programming, because one has to deal with coordinating the actions of many different processors. Historically, two fundamental paradigms have been used for parallel programming. One is shared memory. For example, one might partition the work using shared memory in this way: each processor does a particular piece of work, call it W sub P, and different processors do different work by locking the part of the data they are working on, doing their work, and then unlocking that piece of data. As long as every processor accesses a different part of the data from every other processor, this is reasonably fast. The problem arises, of course, if two processors try to access the same piece of data, in which case one of them may find it locked by the other. That is where the costs of coordination, or communication overhead, slow down the procedure and reduce the efficiency. (A small sketch of this locking pattern appears at the end of this section.)

The other paradigm that has been used historically is message passing, for example by partitioning the data. Each processor does the same computational work, but on a different part of the data: the data is divided into n over P slices, and each processor gets one slice but performs the same work on its slice. If that were all there was to it, the problem would be what is called embarrassingly parallel, and it would be fairly trivial. In most cases, however, the problem is a bit more difficult, because the work done on one slice of data needs to use the results from the work done on another slice. Think of adding up n numbers: you couldn't simply add up each small slice and expect to get the answer; you would have to exchange the results of these partial sums with at least one processor, or maybe many of them in a tree structure, so that you could finally compute the sum of all the numbers in parallel. Therefore, in the message-passing model, processors need to exchange data after performing some work on their slice, so that they can perform more work based on the results of the work done by other processors on other slices. (The second sketch at the end of this section illustrates this with the sum example.) This makes message passing a bit more difficult to program than shared memory, but it is also much more scalable, because things like shared memory itself are really difficult to implement when you have thousands or millions of processors.

Of course, you can have shared memory and also partition the data, just as you could have message passing and partition the work. The MapReduce programming paradigm is a higher-level abstraction than pure message passing or shared memory, but it is a message-passing, data-parallel paradigm.
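Here is a minimal sketch of the shared-memory locking pattern described above, assuming Python threads over one shared list: each worker locks only the slice it is working on, updates it, and releases the lock. The slice-per-thread layout, the one-lock-per-slice scheme, and the doubling "work" are illustrative assumptions, and CPython's global interpreter lock means this shows the pattern rather than a real speedup.

```python
import threading

# Shared data plus one lock per slice (a simplification of
# "lock the part of the data you are working on").
data = list(range(16))
P = 4                                  # number of worker threads
slice_size = len(data) // P
locks = [threading.Lock() for _ in range(P)]

def work(i):
    """Worker i locks its slice, does its piece of work W_i, then unlocks."""
    lo, hi = i * slice_size, (i + 1) * slice_size
    with locks[i]:                     # lock ... work ... unlock
        for j in range(lo, hi):
            data[j] = data[j] * 2      # the "work" here is just doubling

threads = [threading.Thread(target=work, args=(i,)) for i in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(data)   # each slice was updated by exactly one thread
```

If two threads were given the same slice, the second would simply block on the first one's lock, which is exactly the coordination overhead that reduces efficiency.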
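And here is a sketch of the data-partitioned, message-passing style, using the running example of adding up n numbers: each process sums its own slice of roughly n/P elements and then sends that partial sum as a message. For brevity this sketch assumes Python's multiprocessing pipes and combines the partial sums in a single parent process rather than in a tree among the workers.

```python
from multiprocessing import Process, Pipe

def partial_sum(conn, slice_of_data):
    """Each worker does the same work on its own slice of the data,
    then sends its partial result as a message to the parent."""
    conn.send(sum(slice_of_data))
    conn.close()

if __name__ == "__main__":
    data = list(range(1, 101))          # the full sum should be 5050
    P = 4
    n = len(data)
    chunk = n // P                      # roughly n/P elements per processor

    pipes, procs = [], []
    for i in range(P):
        parent_end, child_end = Pipe()
        lo = i * chunk
        hi = n if i == P - 1 else lo + chunk
        proc = Process(target=partial_sum, args=(child_end, data[lo:hi]))
        proc.start()
        pipes.append(parent_end)
        procs.append(proc)

    # The exchange step: collect the P partial sums and combine them.
    # (A tree-structured exchange among the workers themselves would
    # turn this final step into about log2(P) rounds of messages.)
    total = sum(conn.recv() for conn in pipes)
    for proc in procs:
        proc.join()

    print(total)                        # prints 5050
```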