So let's see what happens when you try to iterate MapReduce-based matrix-vector multiplication. The first time, of course, you have to read the matrix A and the vector x, in parallel, across all the mappers; then, after the first MapReduce phase, you write the intermediate result y'. Then you have to read those y' values, and at the end you write the y values, all in parallel, because every reducer computes a few rows of the final y. But now you once again have to read the matrix A, together with the y that you wrote in the previous iteration: write y', write y, and keep reading A and y again and again.

So there are two places where you are reading things repeatedly. One is where you write y' and then read y' back again, which you can potentially avoid by pipelining, using techniques like HaLoop and other implementations that pipeline the writes of one reducer into the reads of the next mapper in the next phase. But we still have the problem of having to read A again.

Notice that MapReduce has no memory. Its basic assumption is that the data is just too large, that it will never fit in memory, so you essentially have to read A again and again. But suppose it did fit in memory. MapReduce still has no memory, so it cannot really remember A between iterations; the mappers don't know what they are going to get next time, and the reads arrive in a random order every time. So MapReduce iterations are inherently inefficient in this regard: if A is a really huge matrix, with lots and lots of entries, it is just going to be read again and again. This is definitely a problem, and there is a way around it, which essentially leads us to Pregel.
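To make that cost concrete, here is a minimal sketch of the iterated computation y = A·x in a MapReduce style. The data layout and function names (map_phase, reduce_phase, iterate) are illustrative assumptions for this sketch, not the API of any real MapReduce framework.

```python
# Minimal sketch: iterated y = A*x in MapReduce style.
# Assumes A is stored as (i, j, a_ij) triples and x as a dict {j: x_j};
# names are illustrative, not a real framework's API.
from collections import defaultdict

def map_phase(A_entries, x):
    # Every mapper re-reads its share of A (and the current x) on every
    # iteration, emitting (row, partial product) pairs.
    for i, j, a_ij in A_entries:
        yield i, a_ij * x[j]

def reduce_phase(pairs):
    # Each reducer sums the partial products for the rows assigned to it.
    y = defaultdict(float)
    for i, partial in pairs:
        y[i] += partial
    return dict(y)

def iterate(A_entries, x, num_iterations):
    # The key point: A_entries is scanned from scratch in every iteration,
    # because MapReduce keeps no state between jobs.
    for _ in range(num_iterations):
        x = reduce_phase(map_phase(A_entries, x))
    return x
```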
Let's look at the same matrix-vector operation now as a graph operation. What do I mean by that? Think of the matrix as defining edges in a graph. For example, the first column tells you that the first vertex has an edge leaving it and going to the second vertex, and similarly there is an edge going from the second vertex to the first vertex. So read each column as the edges that go out of, say, the first vertex, and read each row as the edges that come into the first vertex, for example. The minus one here is just about the vertex itself: the diagonal entries are all minus one. And so for vertex one there is an edge coming in from two, and an edge going out of one.

Now let's take a look at vertex three. It has two edges going out: one to four, which is this entry, and one to two, which is this entry. And it has one edge coming in from four, which is this edge over here. You can verify that this particular matrix, which is called the adjacency matrix of the graph, represents all the edges in this graph. The diagonal entries represent just the vertices themselves, and we treat them as minus one, for reasons we won't go into in this course; the off-diagonal entries represent the edges in the graph.

Now, what about the vector x itself? Well, we assume each vertex holds a value, which is nothing but the element of the x vector in the position corresponding to that vertex. So the vector (1, 2, 3, 4) is distributed across the vertices of the graph as just 1, 2, 3, 4, one element per vertex.

Now let's think of the matrix-vector multiplication as being done in parallel by each vertex, in the following way. Each vertex sends its value out along all of its outgoing edges; so two sends its value to these two vertices, one sends its value to this one, and so on. Then all the vertices receive the values that were sent in this step, each one receiving values from all of its neighbors on incoming edges. Finally, each vertex replaces its value by the sum of everything it received minus the value it originally had, which, as you can easily see, performs exactly the matrix-vector multiplication we started with.

Now, the interesting part of this is, first, that it is very parallel, in the sense that as long as you have many processors, and each processor holds a bunch of vertices, you can really do a lot of this operation in parallel.
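Here is a minimal sketch of one such vertex-centric step, under the assumption stated above that the adjacency matrix has -1 on every diagonal entry and 1 for each edge j → i. The function and variable names (superstep, values, out_edges) are illustrative, not Pregel's actual API.

```python
# One vertex-centric "superstep" for the graph view of y = A*x.
# Assumes a_ii = -1 and a_ij = 1 whenever there is an edge j -> i;
# names are illustrative, not Pregel's actual API.
def superstep(values, out_edges):
    # values:    {vertex: current value x_v}
    # out_edges: {vertex: list of destination vertices}
    inbox = {v: 0.0 for v in values}
    # Each vertex sends its value along all of its outgoing edges.
    for v, x_v in values.items():
        for dest in out_edges.get(v, []):
            inbox[dest] += x_v
    # Each vertex replaces its value with the sum of what it received
    # minus its own value -- exactly (A*x)_v for this matrix.
    return {v: inbox[v] - values[v] for v in values}
```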
The only caveat is that you have to exchange messages, so you try to place the vertices so that they don't have to exchange messages across different processors too often, and that's a challenge; we'll come to it in a short while. But the really important part is that it is easy to iterate. Once the graph structure has been loaded into all the processors, so that every vertex knows its outgoing edges, you don't need to load the matrix structure again. You can just keep iterating, again and again, repeating the matrix-vector product, as is often required in many applications. Unlike MapReduce, where A has to be loaded continuously in every iteration, here A is loaded once, since A is nothing more than each vertex and its connections to its neighbors.

The caveat, of course, is that if the number of edges is so huge that you cannot load it even using all the memory of all the processors you have, then you can't do this. So the caveat for Pregel, and this is in fact the Pregel model, which we will come to formally in a minute, is that you need enough memory to load the entire graph in order for this to work.
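As a rough illustration of this contrast, here is a possible driver loop reusing the superstep function sketched above, on a small hypothetical graph (not the one from the lecture's slides): the graph structure is built once and stays resident in memory, and only the per-vertex values change across iterations.

```python
# Example usage of the superstep sketch above on a hypothetical graph.
out_edges = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
values = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}   # the vector x, one entry per vertex

for _ in range(3):                          # three matrix-vector products
    values = superstep(values, out_edges)   # graph structure is never re-read
print(values)
```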