1
00:00:00,000 --> 00:00:04,112
In this video, we'll apply the divide and
conquer algorithm design paradigm to the

2
00:00:04,112 --> 00:00:08,074
problem of multiplying matrices. This will
culminate in the study of Strassen's

3
00:00:08,074 --> 00:00:12,237
Matrix Multiplication Algorithm. And this
is a super cool algorithm for two reasons.

4
00:00:12,237 --> 00:00:15,948
First of all, Strassen's Algorithm is
completely non trivial. It is totally

5
00:00:15,948 --> 00:00:19,910
non-obvious, very clever. Not at all clear
how Strassen ever came up with it. The

6
00:00:19,910 --> 00:00:23,922
second cool feature, is, it's for such a
fundamental problem. So computers, as long

7
00:00:23,922 --> 00:00:27,834
as they've been in use, from the time they
were invented, up'til today, a lot of

8
00:00:27,834 --> 00:00:31,495
their cycles are spent multiplying
matrics, 'cause it just come up all the

9
00:00:31,495 --> 00:00:35,544
time in important applications. So let me
first just make sure we're all clear on

10
00:00:35,544 --> 00:00:40,442
what the, what the problem is of.
Multiplying two matrices. So, we're gonna

11
00:00:40,442 --> 00:00:44,971
be interested in three matrices, X, Y, and
Z. They're all gonna, I'm gonna assume

12
00:00:44,971 --> 00:00:50,894
they all have the same dimensions, N by N.
The ideas we'll talk about are also

13
00:00:50,894 --> 00:00:54,555
relevant for multiplying non square
matrices, but we're not gonna discuss it

14
00:00:54,555 --> 00:00:58,168
in this video. The entries in these
matrices, you know, you could think of it

15
00:00:58,168 --> 00:01:01,733
as whatever you want. Maybe they're
integers, maybe they're rationals, maybe

16
00:01:01,733 --> 00:01:05,586
they're from some [inaudible] field. It
depends on the application. But the point

17
00:01:05,586 --> 00:01:11,308
is, they're just entries that we can add
and multiply. So how is it that you take

18
00:01:11,308 --> 00:01:17,244
two N by N matrices, X and Y, and multiply
them producing a new N by N matrix, Z?

19
00:01:17,244 --> 00:01:23,179
Well, recall that the IJ entry of Z, that
means the entry in the Ith row and Jth

20
00:01:23,179 --> 00:01:30,143
column, is simply the dot product of the
Ith row of X with the Jth column of Y. So

21
00:01:30,143 --> 00:01:34,812
if IJ was this red square, this
[inaudible] over in the Z matrix, that

22
00:01:34,812 --> 00:01:40,362
would be derived from the corresponding
row of the X matrix, and the corresponding

23
00:01:40,362 --> 00:01:45,844
column of the Y matrix. And recall what I
mean by dot product. That just means you

24
00:01:45,844 --> 00:01:50,920
take the products of the individual
components, and then add up the results.

25
00:01:52,180 --> 00:01:58,080
So ultimately, the ZIJ entry boils down to
a sum over N things, where each of the

26
00:01:58,080 --> 00:02:03,833
constituent products is just the XIK
entry. The [inaudible] of the matrix X

27
00:02:03,833 --> 00:02:10,102
with the KJ entry, of the matrix Y, where
your K is ranging from one to N. So that's

28
00:02:10,102 --> 00:02:16,311
how ZIJ is defined for a given pair of
indices, I and J. One thing to note is

29
00:02:16,311 --> 00:02:20,679
where we often use N to denote the input
size, here we're using N to note the

30
00:02:20,679 --> 00:02:25,160
dimension of each of these matricies. The
input size is not N. The input size is

31
00:02:25,160 --> 00:02:29,585
quite a bit bigger than N. Specifically,
each of these are N by N matricies and

32
00:02:29,585 --> 00:02:34,575
contain N squared entries. So since
presumably we have to read the input which

33
00:02:34,575 --> 00:02:38,760
has size and square. Which happens to
produce the output that also has size and

34
00:02:38,760 --> 00:02:42,999
squared. The best we can really hope for
[inaudible] is multiplication hour with

35
00:02:42,999 --> 00:02:46,920
the running time n squared. So the
question is how close when we get to it.

36
00:02:47,480 --> 00:02:51,571
Before we talk about algorithms for matrix
multiplication, let me just make sure

37
00:02:51,571 --> 00:02:55,560
we're all crystal clear on exactly what
the problem is. So let's just actually

38
00:02:55,560 --> 00:02:59,498
spell out what would be the result of
multiplying two different, two bytes of

39
00:02:59,498 --> 00:03:03,962
matrices. So we can. [inaudible] 21
generic 2x2 matricies by just giving the

40
00:03:03,962 --> 00:03:08,888
first one entries A, B, C, and D for these
four entries could all be anything. And

41
00:03:08,888 --> 00:03:13,752
then we're multiplying by a second 2x2
matrix, let's call it entries E, F, G, and

42
00:03:13,752 --> 00:03:18,740
H. Now, what's the result of multiplying
these, where again, it's going to be a 2x2

43
00:03:18,740 --> 00:03:23,604
matrix for each entry, it's just the
corresponding dot product of the relevant

44
00:03:23,604 --> 00:03:28,655
row of the first matrix and column of the
second matrix. So to get the upper left

45
00:03:28,655 --> 00:03:32,974
entry. You take the doc product of the
upper row of the first matrix and the

46
00:03:32,974 --> 00:03:40,898
first column of the left column of the
second matrix. So, that results in. Ae

47
00:03:40,898 --> 00:03:50,452
plus BG. To get the upper right entry, we
take the dot product of the top row of the

48
00:03:50,452 --> 00:03:58,218
left matrix with the right column of the
second matrix. So that gives us AF+BH. And

49
00:03:58,218 --> 00:04:05,226
then filling in the other entries the same
way, we get CE+DG and DF+DH. You know, so

50
00:04:05,226 --> 00:04:10,063
that's multiplying two matrices, and we've
already discussed the definition in

51
00:04:10,063 --> 00:04:14,900
general. Now, suppose you had to write a
program to actually compute to the result

52
00:04:14,900 --> 00:04:19,617
of multiplying two N by N matrices. One
natural way to do that would just be to

53
00:04:19,617 --> 00:04:24,514
return to the definition and which defines
each of the N squared entries in the Z

54
00:04:24,514 --> 00:04:28,992
matrix as a suitable sum of products of
[inaudible] entries of the X and Y

55
00:04:28,992 --> 00:04:33,934
matrices. So on the next quiz, I'd like
you to. Figure out what exactly would be

56
00:04:33,934 --> 00:04:39,007
the running time of that algorithm as a
function of the matrix dimension N where

57
00:04:39,007 --> 00:04:43,703
as usual we count the addition or
multiplication of two individual entries

58
00:04:43,703 --> 00:04:51,196
as a constant time operation. So the
correct response to this quiz is the third

59
00:04:51,196 --> 00:04:56,495
answer, that the running time of the
straightforward [inaudible] algorithm runs

60
00:04:56,495 --> 00:05:01,862
in cubic time relative to the matrix
dimension n. To see this let's just recall

61
00:05:01,862 --> 00:05:06,889
what the definition of the matrix
multiplication was. The definition tells

62
00:05:06,889 --> 00:05:12,120
us each entry zij of the output matrix z
is defined as the sum from k=1 to n of.

63
00:05:12,120 --> 00:05:16,794
Xik times YKJ. That is the [inaudible]
product of the [inaudible] row of the X

64
00:05:16,794 --> 00:05:21,529
matric and the J column of the Y matrix.
Certainly assuming that we have the

65
00:05:21,529 --> 00:05:26,564
matrices represented in a way that we can
access a given entry in constant time. And

66
00:05:26,564 --> 00:05:31,306
under that assumption, remember each of
these, each of these products. Only takes

67
00:05:31,306 --> 00:05:36,334
constant time. And so then to compute ZIJ
we just have to add up these end products.

68
00:05:36,334 --> 00:05:41,361
So that's gonna be theta of N time to
compute a given ZIJ and then there's an N

69
00:05:41,361 --> 00:05:45,602
squared [inaudible] that we have to
compute. There's N choices for I, N

70
00:05:45,602 --> 00:05:50,993
choices for J, so that gives us N squared
times N or cubic running time overall for

71
00:05:50,993 --> 00:05:55,839
the natural algorithm, which is really
just a triple for-loop which computes

72
00:05:55,839 --> 00:06:00,745
each entry of the output ray separately
using the dot product. So the question as

73
00:06:00,745 --> 00:06:06,396
always for the keen algorithm designer is.
Can we do better? Can we beat, in cube

74
00:06:06,396 --> 00:06:12,828
time, by multiplying two matrices? And we
might be especially emboldened with the

75
00:06:12,828 --> 00:06:17,575
progress that we've already seen in terms
of multiplying two integers. We apply the

76
00:06:17,575 --> 00:06:22,436
divide and conquer algorithm, to problem
multiplying two end digit integers. And we

77
00:06:22,436 --> 00:06:26,937
had, both a naive recursive algorithm, and
a seemingly better. Algorithm due to

78
00:06:26,937 --> 00:06:30,031
[inaudible], which made only three
recursive calls. Now we haven't yet

79
00:06:30,031 --> 00:06:33,260
analyzed the running time of that
algorithm. But as we'll see later, that

80
00:06:33,260 --> 00:06:36,714
does indeed beat the quadratic running
time of the grade school algorithm. So

81
00:06:36,714 --> 00:06:40,078
it's very natural to ask, can we do
exactly the same thing here? There's the

82
00:06:40,078 --> 00:06:43,307
obvious [inaudible] algorithm, which
follow straight from the definition.

83
00:06:43,307 --> 00:06:46,805
Perhaps analogous to [inaudible], we could
have some clever divide and conquer

84
00:06:46,805 --> 00:06:51,110
method, which beats cubic times. So that's
what we're gonna explore next. [sound].

85
00:06:51,110 --> 00:06:54,704
Let's recall the divide and conquer
paradigm, what does it mean to use it.

86
00:06:54,704 --> 00:06:58,693
Well, we first have to identify smaller
problems. So if we want to multiply by two

87
00:06:58,693 --> 00:07:02,435
NxN matricies we have to identify
multiplications of smaller matricies that

88
00:07:02,435 --> 00:07:06,079
we can solve recursively. Once we've
figured out how we want to divide the

89
00:07:06,079 --> 00:07:10,117
given problem into smaller ones, then in
the conquer step we simply invoke our own

90
00:07:10,117 --> 00:07:14,105
algorithm recursively that's going to
recursively multiply the smaller matricies

91
00:07:14,105 --> 00:07:18,044
together. And then, in general, we'll have
to combine the results of the recursive

92
00:07:18,044 --> 00:07:22,082
calls to get the solution for the original
problem, in our case, to get the product

93
00:07:22,082 --> 00:07:26,905
of the original two matricies. From the
product of what ever sub matrices we

94
00:07:26,905 --> 00:07:33,365
identify. So how would be apply the divide
and conquer paradigm to matrices? So we're

95
00:07:33,365 --> 00:07:38,806
given two N by N matrices, and we have
to somehow identify smaller pairs of

96
00:07:38,806 --> 00:07:43,765
square matrices that we can multiply
recursively. So the idea, I think, is

97
00:07:43,765 --> 00:07:49,071
fairly natural. So we start with a big N by N matrix X. And so those N rows and

98
00:07:49,071 --> 00:07:53,004
N columns, we have to somehow divide
into smaller pieces. Now, the first thing

99
00:07:53,004 --> 00:07:57,088
you might think about is that you put it
in its left half and its right half and

100
00:07:57,088 --> 00:08:00,970
now it goes into what we've been doing
with the rays, but then we're going to

101
00:08:00,970 --> 00:08:05,105
break X into two matrices which are no
longer squared which are N over two in one

102
00:08:05,105 --> 00:08:09,290
dimension and have length N in the other
dimension and we want to recursively call

103
00:08:09,290 --> 00:08:13,022
a subroutine that multiplies square
matrices. So what seems like the clear

104
00:08:13,022 --> 00:08:17,677
thing to do is to divide X into quadrants.
Okay, so we have four pieces of X. Each is

105
00:08:17,677 --> 00:08:22,443
gonna be N over two by N over two,
corresponding to the different quarters of

106
00:08:22,443 --> 00:08:26,899
this matrix. So let's call these different
quadrants or blocks, in matrix

107
00:08:26,899 --> 00:08:31,727
terminology, A, B, C, and D. All of these
are N over two by N over two matrices. As

108
00:08:31,727 --> 00:08:36,678
usual, for simplicity, I'm assuming that N
is even, and as usual, it doesn't really

109
00:08:36,678 --> 00:08:45,296
matter. And we can do the same trick with
Y. So we'll divide Y into quadrants. And

110
00:08:45,296 --> 00:08:53,860
number two by N number two matrices which
we'll call E, F, G and H. So one thing

111
00:08:53,860 --> 00:08:58,368
that's cool about matrices, is when you
split them into blocks, and you multiply

112
00:08:58,368 --> 00:09:03,789
them, the blocks just behave as if they
were atomic elements. So what I mean by

113
00:09:03,789 --> 00:09:08,903
that is that the product of X and Y can be
expressed in terms of its quadrants and

114
00:09:08,903 --> 00:09:14,079
each of its four quadrants, each of its
four corners can be written as a suitable

115
00:09:14,079 --> 00:09:19,009
arithmetic expression of the quadrants of
X and Y. So here's exactly what those

116
00:09:19,009 --> 00:09:24,000
formulas are. They are exactly analogous
to when we just multiplied pair of two by

117
00:09:24,000 --> 00:09:29,231
two matrices. So I'm not going to formally
prove this fact. I'm sure many of you,

118
00:09:29,384 --> 00:09:33,318
have seen it before or are familiar with
it. And if you haven't, it's actually

119
00:09:33,318 --> 00:09:37,354
quite easy to prove. It's not obvious, you
can't see it off the top of your head,

120
00:09:37,354 --> 00:09:41,390
necessarily. But if you go back to the
definition, it's quite easy to verify. The

121
00:09:41,390 --> 00:09:45,272
D, when you multiple X and Y, you can
express as quadrants to blocks, in terms

122
00:09:45,272 --> 00:09:49,097
of the blocks of X and Y. So what we just
did is completely analogous to when

123
00:09:49,097 --> 00:09:52,869
talking about integer multiplication and
you wanted to multiply two integers,

124
00:09:52,869 --> 00:09:56,837
little x and little y, and we broke them
into pairs of N over two digits. And then

125
00:09:56,837 --> 00:10:00,854
we just took the expansion and we observed
how that expansion could be written in

126
00:10:00,854 --> 00:10:04,675
terms of products of N over two digit
numbers. Same thing going on here except

127
00:10:04,675 --> 00:10:08,883
with matrices. So now, we're in business,
as far as a recursive approach. We wanna

128
00:10:08,883 --> 00:10:13,382
multiply X and Y. They're N by N matrices.
We recognize we can express that product X

129
00:10:13,382 --> 00:10:17,506
times Y, in terms of the products of N
over two by N over two matrices. Things

130
00:10:17,506 --> 00:10:21,523
we're able to multiply recursively, plus
additions. And additions [inaudible]

131
00:10:21,523 --> 00:10:25,969
clearly easy to multiply two different
matrices with say, N squared entries each,

132
00:10:25,969 --> 00:10:30,039
it's gonna be linear in the number of
entries. So it's gonna be N squared

133
00:10:30,039 --> 00:10:34,697
[inaudible] add two matrices that are N by
N. So this immediately leads us to our

134
00:10:34,697 --> 00:10:39,807
first recursive algorithm. To describe it,
let me quickly rewrite that expression we

135
00:10:39,807 --> 00:10:45,977
just saw on the previous slide. And now,
our first recursive algorithm is simply to

136
00:10:45,977 --> 00:10:50,452
evaluate all of these expressions in the
obvious way. So specifically, in step one,

137
00:10:50,452 --> 00:10:54,651
we recursively compute all of the
necessary products, and observe that there

138
00:10:54,651 --> 00:10:58,905
are eight products that we have to
compute. Eight products of N over two by N

139
00:10:58,905 --> 00:11:03,325
over two matrices. There are four entries
in this expansion of X times Y. Each of

140
00:11:03,325 --> 00:11:07,524
the, each of the blocks is the sum of two
products, and none of the products

141
00:11:07,524 --> 00:11:11,612
re-occurred, they're all distinct. So,
naively, if you wanna evaluate this, we

142
00:11:11,612 --> 00:11:16,176
have to eight different products of N over
two by N over two matrices. Once those

143
00:11:16,176 --> 00:11:21,494
recursive calls complete, then all we do
is do the, necessary four additions. As we

144
00:11:21,494 --> 00:11:26,812
discussed, that takes time proportional to
the number of entries in a matrix. So this

145
00:11:26,812 --> 00:11:31,560
is gonna take quadratic time overall,
quadratic [inaudible] N, linear in the

146
00:11:31,560 --> 00:11:37,103
number of entries. Now, the question you
should be asking is. Is this a good

147
00:11:37,103 --> 00:11:40,969
algorithm? Was this good for anything,
this recursive approach, splitting X and Y

148
00:11:40,969 --> 00:11:44,932
into these blocks, expanding the product
in terms of these blocks, the recursively

149
00:11:44,932 --> 00:11:48,945
computing each of the blocks. And I want
to say it's totally not obvious, it is not

150
00:11:48,945 --> 00:11:52,615
clear what the running time of this
recursive algorithm is. I'm going to go

151
00:11:52,615 --> 00:11:56,382
ahead and give you a spoiler which is
going to follow from the master method

152
00:11:56,382 --> 00:12:00,150
that we'll talk about in the next lecture.
But it turns out with this kind of

153
00:12:00,150 --> 00:12:03,967
recursive algorithm where you do eight
recursive calls, each on a problem with

154
00:12:03,967 --> 00:12:07,441
dimensions half as much as what you
started with, and then do quadratic

155
00:12:07,441 --> 00:12:12,320
[inaudible] outside. The right time is
going to be. Cubic. So exactly the same as

156
00:12:12,320 --> 00:12:16,527
with the straightforward iterative
algorithm that follows from the

157
00:12:16,527 --> 00:12:21,236
definition. That was cubic, it turns out,
and that was clearly cubic. This one,

158
00:12:21,236 --> 00:12:25,882
although it's not obvious, is cubic as
well. So no better, no worse than the

159
00:12:25,882 --> 00:12:31,442
straightforward iterative algorithm. So in
case you're feeling disappointed that we

160
00:12:31,442 --> 00:12:35,009
went through all this work and this sort
of seemingly clever divide and conquer

161
00:12:35,009 --> 00:12:38,711
approach for matrix multiplication, and,
and came out at the end no better than the

162
00:12:38,711 --> 00:12:42,098
interactive algorithm. Well, there's
really no reason to despair, cause remember,

163
00:12:42,098 --> 00:12:45,755
back in integer multiplication, we had a
straightforward recursive algorithm where

164
00:12:45,755 --> 00:12:49,142
we had to do four recursive calls,
products of N over two digit numbers. But

165
00:12:49,142 --> 00:12:52,754
then, we had [inaudible] trick which said,
oh, if we only did more clever products

166
00:12:52,754 --> 00:12:55,869
and more clever additions and
subtractions, then we can get away with

167
00:12:55,869 --> 00:12:59,482
only three recursive calls. And we'll see
later that, that isn't even significant

168
00:12:59,482 --> 00:13:02,778
savings, in the time [inaudible]
multiplication. And we've done nothing

169
00:13:02,778 --> 00:13:06,937
analogously. [inaudible] douse's trick, it
was matrix multiplication problem. All we

170
00:13:06,937 --> 00:13:11,071
did is the naive expansion in terms of
sub-matrices, and the naive evaluation of

171
00:13:11,071 --> 00:13:15,976
the resulting expressions. So. $64,000
question is then, can we do something

172
00:13:15,976 --> 00:13:22,326
clever to reduce the number of recursive
calls from eight down to something lower

173
00:13:22,326 --> 00:13:29,532
and that is where Strassen's algorithm
comes in. So at a high level, Strassen's

174
00:13:29,532 --> 00:13:33,926
algorithm has two steps, just like the
first recursive algorithm that we

175
00:13:33,926 --> 00:13:38,870
discussed. It recursively computes some
products of smaller matrices and over two

176
00:13:38,870 --> 00:13:44,760
[inaudible] by two matrices. [sound] But
there's only going to be seven of them.

177
00:13:45,940 --> 00:13:50,101
But they will be much less
straightforward, they will be much more

178
00:13:50,101 --> 00:13:54,648
cleverly chosen than in the first
recursive algorithm. [sound]. And step

179
00:13:54,648 --> 00:13:59,578
two, then, is to actually produce the
product of X and Y, produce each of those

180
00:13:59,578 --> 00:14:04,892
four blocks that we saw, with suitable,
additions and subtractions of these seven

181
00:14:04,892 --> 00:14:09,567
products. And again, these are much less
straightforward than in the first

182
00:14:09,567 --> 00:14:14,938
recursive algorithm. [sound]. And so while
the additions and tractions involved will

183
00:14:14,938 --> 00:14:18,896
be a little bit more numerous, then they
were in the naive recursive algorithm.

184
00:14:18,896 --> 00:14:22,615
It's only gonna change the work in that
part of the algorithm by a constant

185
00:14:22,615 --> 00:14:26,529
factor. So we'll still spend only theta
even squared work adding and subtracting

186
00:14:26,529 --> 00:14:30,101
things. And we get a huge win in
decreasing the number of recursive calls

187
00:14:30,101 --> 00:14:34,211
from eight to seven. Now, just in case you
have the intuition that shaving off one of

188
00:14:34,211 --> 00:14:38,242
the recursive calls. Should only decrease
the running time of an algorithm by

189
00:14:38,242 --> 00:14:42,472
one-eighth, by 12.5%, in fact it has a
tremendously more amplified effect because

190
00:14:42,472 --> 00:14:46,756
we do one less recursive call over and
over and over again as we keep recursing

191
00:14:46,756 --> 00:14:50,505
in the algorithm. So it makes a
fundamental difference in the eventual

192
00:14:50,505 --> 00:14:54,575
running time of the algorithm, as we'll
explore in detail in the next set of

193
00:14:54,575 --> 00:14:58,940
lectures, when we discuss the master
method. So again. A bit of a spoiler

194
00:14:58,940 --> 00:15:03,892
alert. What you're gonna see in the next
set of lectures is indeed. [inaudible]

195
00:15:03,892 --> 00:15:08,724
Rhythm does beat [inaudible]. It's better
than [inaudible]. You'll have to watch the

196
00:15:08,724 --> 00:15:13,557
next set of lectures just so you know what
the running time is. When right now, that

197
00:15:13,557 --> 00:15:19,998
[inaudible] call is what changes the
[inaudible] cubic. Now, 1969 was

198
00:15:19,998 --> 00:15:25,912
obviously, quite a bit before my time,
but. By all accounts, from people I've

199
00:15:25,912 --> 00:15:29,964
talked to who were around then, and from,
you know, what the books say, Strassen's

200
00:15:29,964 --> 00:15:33,965
Algorithm totally blew people's minds at
the time. Everybody was assuming that

201
00:15:33,965 --> 00:15:37,760
there's no way you could do better than
the iterative algorithm, the cubic

202
00:15:37,760 --> 00:15:41,761
algorithm. It just seemed that matrix
multiplication intuitively fundamentally

203
00:15:41,761 --> 00:15:45,608
required all of the calculations that are
spelled out in the definition. So

204
00:15:45,608 --> 00:15:49,815
Strassen's Algorithm is an early glimpse
of the magic and of the power. Of clever

205
00:15:49,815 --> 00:15:54,112
algorithm design. That if you really have
a serious ingenuity, even for super

206
00:15:54,112 --> 00:15:58,409
fundamental problems, you can come up with
fundamental savings over the more

207
00:15:58,409 --> 00:16:02,960
straightforward solutions. So those are
the main points I wanted to talk about

208
00:16:02,960 --> 00:16:07,136
Strassen's Algorithm, how you can beat
cubic time by saving a recursive call with

209
00:16:07,136 --> 00:16:11,209
soon to be chosen clever products and
clever addition subtractions. Maybe a few

210
00:16:11,209 --> 00:16:15,230
of you are wondering, you know, what are
these cleverly chosen products? Can you

211
00:16:15,230 --> 00:16:19,251
really do this? And I don't blame you.
There's no reason to believe me, just cuz

212
00:16:19,251 --> 00:16:23,273
I sort of spelled out this [inaudible]
idea. It's not obvious this should work.

213
00:16:23,273 --> 00:16:27,088
You might actually want to see the
products. So, for those of you like that,

214
00:16:27,088 --> 00:16:33,067
this last slide is for you. So here is
Straus' algorithm in it's somewhat gory

215
00:16:33,067 --> 00:16:38,380
detail. So let me tell you what the seven
products are that we are going to form.

216
00:16:39,360 --> 00:16:44,335
I'm going to label them P1 through P7 and
they're all going to be defined in terms

217
00:16:44,335 --> 00:16:49,371
of the blocks of the inter-matrices X and
Y. So let me just remind you that we think

218
00:16:49,371 --> 00:16:56,410
of X in terms of it's blocks, A B C D. And
we think of Y in terms of its blocks E, F,

219
00:16:56,410 --> 00:17:04,873
G, H. And remember, A through H are all N
over two by N over two sub-matricies. So

220
00:17:04,873 --> 00:17:13,229
here are the seven products, P1 through
P7. P1 is A times quantity F minus H. P2

221
00:17:13,229 --> 00:17:25,051
is quantity A plus B times H. P3 is C plus
D times E. [sound]. P4 is D times G - E,

222
00:17:25,051 --> 00:17:40,193
P5 is quantity A+D + quantity A+H. P six
is quantity B minus D times quantity G

223
00:17:40,193 --> 00:17:49,964
plus H and finally P seven is quantity A
minus C E plus F. So I hope you'll agree

224
00:17:49,964 --> 00:17:54,225
that these are indeed only seven products,
and we could compute these with seven

225
00:17:54,225 --> 00:17:58,646
recursive calls. We've preprocessed with a
little bit of additions and subtractions.

226
00:17:58,646 --> 00:18:03,013
We have to compute F minus H, A plus B, C
plus D and so on. We compute all these new

227
00:18:03,013 --> 00:18:07,273
matrices from the blocks, and we can then
recursively, with seven recursive calls,

228
00:18:07,273 --> 00:18:11,641
do these seven products that operate on N
over two by N over two matrices. Now, the

229
00:18:11,641 --> 00:18:15,901
question is, why is this useful? Why on
Earth would we wanna know the [inaudible]

230
00:18:15,901 --> 00:18:20,758
of these seven products? So the amazing
other part of the algorithm is that from

231
00:18:20,758 --> 00:18:25,934
just these seven products, we can, using
only addition and subtraction, recover all

232
00:18:25,934 --> 00:18:34,382
four of the blocks of. A X times Y So x
times y. You'll recall we expanded because

233
00:18:34,382 --> 00:18:40,564
of the blocks. So we previously computed
this to be AE+BG in the upper left corner,

234
00:18:40,564 --> 00:18:46,120
and [inaudible] expressions for the upper
right, lower left, and lower right blocks.

235
00:18:46,120 --> 00:18:51,744
So this we already know. So the content of
the claim is that these four blocks also

236
00:18:51,744 --> 00:18:58,548
arise from the seven products in the
following way. So the claim here is that

237
00:18:58,548 --> 00:19:03,072
these two different expressions for X
times Y are exactly the same and they're

238
00:19:03,072 --> 00:19:07,518
the same block by block. So in other
words, what the claim is then this. Crazy

239
00:19:07,518 --> 00:19:13,512
expression. P five plus P four minus P two
plus P six. Where those were four of the

240
00:19:13,512 --> 00:19:18,524
products we listed above. That is
precisely A plus B G. Similarly we're,

241
00:19:18,524 --> 00:19:23,914
we're claiming that P1 plus P2 is exactly
AF plus BH. That's actually easy to see.

242
00:19:23,914 --> 00:19:29,504
P3 plus P4 is CE plus DG. That's also easy
to see, and then the other [inaudible] one

243
00:19:29,504 --> 00:19:34,694
is that P1 plus P5 minus P3 minus P7 is
exactly the same as CF plus DH, so all

244
00:19:34,694 --> 00:19:40,217
four of those hold. So let me just so you
believe me cuz I don't know why you would

245
00:19:40,217 --> 00:19:45,540
believe me unless I actually showed you
some of this derivation. Let's just look

246
00:19:45,540 --> 00:19:53,654
at the proof of one of the cases of the
upper left corner. So that is, let's just

247
00:19:53,654 --> 00:20:03,841
expand out this crazy expression.
P5+P4-P2+P6, what do we get? Well, from

248
00:20:03,841 --> 00:20:15,062
P5, we get AE+AH+D+DH, and we add P4, so
that's gonna give us, Plus DG minus DE,

249
00:20:15,062 --> 00:20:26,075
then we subtract P2, so that it gives us a
minus, minus DH and then we add in P6. So

250
00:20:26,075 --> 00:20:36,440
that gives us a PG plus BH minus DG minus
DH. Okay, so what happens next, well now

251
00:20:36,440 --> 00:20:46,397
we look for cancellations. So we cancel
the H's. We cancel the D.E.'s, we cancel

252
00:20:46,397 --> 00:20:56,234
the D.H.'s. We cancel the DGs. We cancel
the BHs. And holy cow what do we get, we

253
00:20:56,234 --> 00:21:03,375
get A, E. Plus B G. That is, we get
exactly what we were suppose to. In. The

254
00:21:03,375 --> 00:21:08,520
upper left block of X times Y. So we just
actually verified that this equation holds

255
00:21:08,520 --> 00:21:13,543
for the upper left block. It's quite easy
to see that it holds for the upper right

256
00:21:13,543 --> 00:21:18,199
and lower left blocks and a comparable
calculation verifies it for the lower

257
00:21:18,199 --> 00:21:22,760
right blocks of the two. So summarizing,
because this claim holds. >> Because,

258
00:21:22,760 --> 00:21:27,240
actually, we can recover the four blocks
of S times Y from the seven products.

259
00:21:27,240 --> 00:21:31,894
Strauss' algorithm works in the following
way: you compute the seven products, P1

260
00:21:31,894 --> 00:21:36,375
through P7, using seven recursive calls.
Then you just compute the four blocks

261
00:21:36,375 --> 00:21:40,389
using some extra additions and
subtractions as shown in the claim. So

262
00:21:40,389 --> 00:21:45,083
seven recursive calls on a number two by
number two matrices, plus. N squared work

263
00:21:45,083 --> 00:21:49,634
to do the necessary additions as we'll see
on the master method lecture that is

264
00:21:49,634 --> 00:21:57,273
actually sufficient for. Sub humid time.
Now I sympathize with you if you have the

265
00:21:57,273 --> 00:22:05,503
following question. Which is how on Earth
did Strauson come up with this? And

266
00:22:05,503 --> 00:22:09,871
indeed, this sort of illustrates, the
difference between checking somebody's

267
00:22:09,871 --> 00:22:14,462
proof, and coming up with a proof. Given
that I told you the magical seven products

268
00:22:14,462 --> 00:22:18,717
and how you, from them, can recover the
four desired blocks of x times y, it's

269
00:22:18,717 --> 00:22:23,085
really just mechanical to see that it
works. It's a totally different story of

270
00:22:23,085 --> 00:22:27,676
how you come up with p1 through p7 in the
first place. So how did Strassen come up

271
00:22:27,676 --> 00:22:30,420
with them? Honestly, your guess is as good
as mine.