As we all know, microprocessors are becoming cheaper and faster every day. Computing power doubles roughly every eighteen months for the same price, and this has been going on for more than forty years. So, in the early days of parallel computing, back in the 80s, by the time one had figured out how to use many processors to do the work of one processor faster, that one processor itself had become twice as fast. Parallel computing really had to play a catch-up game. In recent years, however, you may have noticed that all your PCs and servers come with multiple cores: four cores, eight cores, sixteen cores, and soon 32 or 64 cores on every chip. What that really means is that physical limits, such as the speed of light, are forcing parallel computing into everyday life, within every computer, which was never the case twenty or thirty years ago. On the web, parallel computing is used at a macro level, with hundreds of thousands or even millions of servers powering the large search engines and social media platforms, such as Google and Facebook. So, in two different ways, within chips on the one hand and across the web on the other, parallel computing has seen a new life in the past five years.

It is therefore important to understand some basic principles when one tries to do things faster using more machines. If one uses P machines, one would expect to do the same task, which took time T1 on a single processor, in less time; let's call it Tp. The speedup is the ratio T1 over Tp. This will obviously be greater than one and will increase as you increase the number of processors. So, for example, with twenty processors you might get a speedup of somewhere around fifteen or sixteen; with forty processors, somewhere around 29 or 30. But as the number of processors increases, for a variety of reasons, the speedup does not keep growing forever.

Another important concept is efficiency. The ideal speedup with P processors is P itself, so the efficiency, the speedup divided by P, measures how close we are to achieving that, and it is something less than one (the precise formulas are written out just below). With a small number of processors, like two, four or eight, you can get an efficiency close to one. But as the number of processors goes up, we will see that the efficiency actually comes down. The primary reason for this is communication, or coordination, between processors. Think of it like this: if you have two or three people to perform a task, it is probably easier, but if you put a hundred people on the task, it does not necessarily get done a hundred times faster, because the people have to talk to each other and coordinate their actions.

So, do we lose hope? Is there no way we can really exploit thousands or millions of processors? Well, there is, actually, and the trick is not to solve the same problem, but to solve bigger problems when you have more processors. There is no point indexing a one-megabyte file with a million machines; you only need a million machines if you have billions and billions of web pages to index. A scalable algorithm is one that allows you to increase the problem size without suffering any loss in efficiency. On the contrary, because you have a large number of processors and a large problem size, the coordination costs should actually go down as a fraction of the overall work being performed.
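To pin down the speedup and efficiency just described, here are the usual formulas; the symbols S_P and E_P are labels introduced here rather than notation from the lecture, and the twenty-processor numbers are the ones from the example above.

```latex
% Speedup and efficiency on P processors.
\[
  S_P \;=\; \frac{T_1}{T_P},
  \qquad
  E_P \;=\; \frac{S_P}{P} \;\le\; 1 .
\]
% For example, with P = 20 and a speedup of about 16:
\[
  E_{20} \;\approx\; \frac{16}{20} \;=\; 0.8 .
\]
```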
So your efficiency should actually increase as the ratio of the problem size to the number of processors, n over P, increases. A scalable algorithm is therefore one where, as the problem size goes up in proportion to the number of processors that you throw at the problem, you get better and better efficiency.

Parallel programming is considerably more difficult than normal serial programming, because one has to deal with coordinating the actions of many different processors. Historically, two fundamental paradigms have been used for parallel programming. One is shared memory. For example, one might partition the work using shared memory in this way: each processor does a particular piece of work, call it W sub P, and different processors do different work by locking the part of the data they are working on, doing their work, and then unlocking that piece of data. As long as every processor accesses a different part of the data from every other processor, this is reasonably fast. The problem arises, of course, if two processors try to access the same piece of data, in which case one of them may find it locked by the other. That is where the costs of coordination, or communication overhead, slow down the procedure and reduce the efficiency. (A small sketch of this locking pattern appears at the end of this section.)

The other paradigm that has been used historically is message passing, for example by partitioning the data. Each processor does the same computational work, but on a different part of the data: the data is divided into n over P slices, and each processor gets one slice but performs the same work on its slice. If that were all there was to it, the problem would be what is called embarrassingly parallel, and it would be fairly trivial. In most cases, however, the problem is a bit more difficult, because the work done on one slice of data needs to use the results from the work done on another slice. Think of adding up n numbers: you couldn't simply add up each small slice and expect to get the answer; you would have to exchange the results of these partial sums with at least one processor, or maybe many of them in a tree structure, so that you could finally compute the sum of all the numbers in parallel. Therefore, in the message-passing model, processors need to exchange data after performing some work on their slice, so that they can perform more work based on the results of the work done by other processors on other slices. (The second sketch at the end of this section illustrates this with the sum example.) This makes message passing a bit more difficult to program than shared memory, but it is also much more scalable, because things like shared memory itself are really difficult to implement when you have thousands or millions of processors.

Of course, you can have shared memory and also partition the data, just as you could have message passing and partition the work. The MapReduce programming paradigm is a higher-level abstraction than pure message passing or shared memory, but it is a message-passing, data-parallel paradigm.
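Here is a minimal sketch of the shared-memory locking pattern described above, assuming Python threads over one shared list: each worker locks only the slice it is working on, updates it, and releases the lock. The slice-per-thread layout, the one-lock-per-slice scheme, and the doubling "work" are illustrative assumptions, and CPython's global interpreter lock means this shows the pattern rather than a real speedup.

```python
import threading

# Shared data plus one lock per slice (a simplification of
# "lock the part of the data you are working on").
data = list(range(16))
P = 4                                  # number of worker threads
slice_size = len(data) // P
locks = [threading.Lock() for _ in range(P)]

def work(i):
    """Worker i locks its slice, does its piece of work W_i, then unlocks."""
    lo, hi = i * slice_size, (i + 1) * slice_size
    with locks[i]:                     # lock ... work ... unlock
        for j in range(lo, hi):
            data[j] = data[j] * 2      # the "work" here is just doubling

threads = [threading.Thread(target=work, args=(i,)) for i in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(data)   # each slice was updated by exactly one thread
```

If two threads were given the same slice, the second would simply block on the first one's lock, which is exactly the coordination overhead that reduces efficiency.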
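And here is a sketch of the data-partitioned, message-passing style, using the running example of adding up n numbers: each process sums its own slice of roughly n/P elements and then sends that partial sum as a message. For brevity this sketch assumes Python's multiprocessing pipes and combines the partial sums in a single parent process rather than in a tree among the workers.

```python
from multiprocessing import Process, Pipe

def partial_sum(conn, slice_of_data):
    """Each worker does the same work on its own slice of the data,
    then sends its partial result as a message to the parent."""
    conn.send(sum(slice_of_data))
    conn.close()

if __name__ == "__main__":
    data = list(range(1, 101))          # the full sum should be 5050
    P = 4
    n = len(data)
    chunk = n // P                      # roughly n/P elements per processor

    pipes, procs = [], []
    for i in range(P):
        parent_end, child_end = Pipe()
        lo = i * chunk
        hi = n if i == P - 1 else lo + chunk
        proc = Process(target=partial_sum, args=(child_end, data[lo:hi]))
        proc.start()
        pipes.append(parent_end)
        procs.append(proc)

    # The exchange step: collect the P partial sums and combine them.
    # (A tree-structured exchange among the workers themselves would
    # turn this final step into about log2(P) rounds of messages.)
    total = sum(conn.recv() for conn in pipes)
    for proc in procs:
        proc.join()

    print(total)                        # prints 5050
```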