You'll recall we began our discussion of parallel computing with the notion of parallel efficiency, so it is worth trying to figure out the parallel efficiency of MapReduce. Let's assume that the data emitted by all the mappers put together is D. If we use combiners efficiently, D may actually be significantly less than the total amount of data input to the MapReduce program. We'll also assume that there are P processors in all, comprising both mappers and reducers, and that there is no overlap between the two, so that we have P/2 mappers and P/2 reducers.

Clearly, parallel efficiency depends on overheads such as communication between processors, so a natural way to compute the minimum parallel efficiency is to assume that no real useful work is being done in terms of an algorithm. Instead, we consider only the efficiency of the MapReduce platform itself, where all the work it really does is to sort P/2 blocks, each of size D divided by P/2, and then communicate the results to the different reducers. The actual efficiency of any algorithm using MapReduce will be at least this much, provided that the algorithm requires this sorting process for its completion. That need not always be the case, but for the moment we assume that it is.

Now let's consider the communication costs. The 2D/P of data in each block, at each mapper, has to be communicated to the P/2 reducers. Each reducer gets one (P/2)-th of each mapper's block, which is about 4D/P² of data, and this has to be collected from P/2 mappers at each reducer. That requires P/2 separate steps of communication, so multiplying 4D/P² by P/2 gives 2D/P as the communication cost. The serial cost is, of course, the cost required to sort P/2 blocks of this size, which works out to a work constant times D/P, times log(D/P), times P, i.e. roughly D log(D/P). So we get efficiency as this sorting term in the numerator, and the same term in the denominator plus the communication costs.
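Putting the pieces just described together, a minimal sketch of the calculation might look as follows; the symbols c_w (the sort-work constant) and c_c (the per-unit communication constant) are labels introduced here for the "work constant" and "communication constant" spoken of in the lecture, not notation from it.

```latex
% Sketch of the calculation described above; c_w and c_c are assumed labels.
\begin{align*}
  T_{\text{sort}} &\approx c_w \cdot \tfrac{P}{2}\cdot \tfrac{2D}{P}\log\tfrac{2D}{P}
                   \;=\; c_w\, D \log\tfrac{2D}{P}
     && \text{(serial cost of sorting $P/2$ blocks of size $2D/P$)}\\
  T_{\text{comm}} &\approx c_c \cdot \tfrac{P}{2}\cdot\tfrac{4D}{P^{2}}
                   \;=\; c_c\,\tfrac{2D}{P}
     && \text{(each reducer collects $4D/P^{2}$ from each of $P/2$ mappers)}\\
  \varepsilon &= \frac{T_{\text{sort}}}{P\left(\tfrac{T_{\text{sort}}}{P} + T_{\text{comm}}\right)}
   \;=\; \frac{1}{1 + \dfrac{2\,c_c}{c_w \log(D/P)}}
     && \text{(dropping stray factors of two, as in the lecture)}
\end{align*}
```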
This, of course, is the formula for efficiency, where you have the serial time in the numerator, and in the denominator the parallel time multiplied by P; the parallel time in this case is the serial time divided by P, plus the communication cost. Simplifying, you get an expression that looks like one over one plus two times a communication constant divided by a work constant times log(D/P). Notice that we have liberally removed constants, such as the two that might occur inside the log, because it only contributes a constant additive factor; we have also been fairly liberal with factors of two here and there, just to get the form of this equation right.

Remember what we said about an algorithm being scalable: if efficiency approaches one as D/P grows, then the algorithm is said to be scalable. As D/P becomes very large, this overhead term becomes extremely small, and efficiency does indeed come close to one. So the MapReduce platform is scalable in this sense. The caveat, of course, is that the algorithm we run on this platform must actually require this serial work to be performed; in other words, there must be a need for sorting of some kind in the algorithm. This is not always the case. For example, there are many tasks which can just as well be performed by a map phase alone, requiring no reduce and no sorting. Such tasks may be called embarrassingly parallel, and for them simply using the MapReduce platform adds overhead which is avoidable.
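To make the scalability claim concrete, here is a small numerical sketch (an illustration added here, not part of the lecture): plugging assumed constants c_w = c_c = 1 into the simplified efficiency formula shows the efficiency creeping towards one as D/P grows.

```python
import math

# Evaluate the simplified efficiency  1 / (1 + 2*c_c / (c_w * log(D/P)))
# for growing D/P.  The constants c_w (sort work per item) and c_c
# (communication cost per item) are arbitrary assumed values.

def mapreduce_efficiency(D, P, c_w=1.0, c_c=1.0):
    """Simplified parallel efficiency of the MapReduce sort/shuffle phase."""
    return 1.0 / (1.0 + 2.0 * c_c / (c_w * math.log(D / P)))

P = 1000  # 500 mappers + 500 reducers
for D in (1e5, 1e7, 1e9, 1e12):
    print(f"D/P = {D / P:10.0e}   efficiency = {mapreduce_efficiency(D, P):.3f}")
```

With these assumed constants the efficiency rises from roughly 0.70 at D/P = 100 to above 0.90 once D/P reaches a billion, which is the sense in which the platform is scalable, so long as the sort is genuinely needed.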