In this video, we'll apply the divide and conquer algorithm design paradigm to the problem of multiplying matrices. This will culminate in the study of Strassen's matrix multiplication algorithm, and this is a super cool algorithm for two reasons. First of all, Strassen's algorithm is completely non-trivial. It is totally non-obvious, very clever, and not at all clear how Strassen ever came up with it. The second cool feature is that it's for such a fundamental problem. Computers, for as long as they've been in use, from the time they were invented up until today, spend a lot of their cycles multiplying matrices, because it just comes up all the time in important applications.

So let me first make sure we're all clear on what the problem of multiplying two matrices is. We're going to be interested in three matrices, X, Y, and Z, and I'm going to assume they all have the same dimensions, n by n. The ideas we'll talk about are also relevant for multiplying non-square matrices, but we're not going to discuss that in this video. The entries in these matrices could be whatever you want: maybe they're integers, maybe they're rationals, maybe they're from some finite field. It depends on the application. The point is, they're just entries that we can add and multiply.

So how is it that you take two n by n matrices, X and Y, and multiply them, producing a new n by n matrix Z? Well, recall that the (i, j) entry of Z, meaning the entry in the i-th row and j-th column, is simply the dot product of the i-th row of X with the j-th column of Y. So if (i, j) were this red square, this entry over in the Z matrix, it would be derived from the corresponding row of the X matrix and the corresponding column of the Y matrix. And recall what I mean by dot product: you take the products of the individual components and then add up the results. So ultimately, the entry Z_ij boils down to a sum of n terms, where each constituent product is the (i, k) entry of the matrix X times the (k, j) entry of the matrix Y, with k ranging from 1 to n. That is, Z_ij = sum from k = 1 to n of X_ik times Y_kj. So that's how Z_ij is defined for a given pair of indices i and j.

One thing to note: whereas we often use n to denote the input size, here we're using n to denote the dimension of each of these matrices. The input size is not n; the input size is quite a bit bigger than n. Specifically, each of these is an n by n matrix containing n squared entries. Since presumably we have to read the input, which has size n squared, and we have to produce the output, which also has size n squared, the best we can really hope for from a matrix multiplication algorithm is a running time of n squared. So the question is how close to that we can get.

Before we talk about algorithms for matrix multiplication, let me just make sure we're all crystal clear on exactly what the problem is, so let's actually spell out the result of multiplying two 2 by 2 matrices. We can describe two generic 2 by 2 matrices by giving the first one entries A, B, C, and D, where these four entries could be anything, and then multiplying by a second 2 by 2 matrix with entries E, F, G, and H. Now, what's the result of multiplying these? Again, it's going to be a 2 by 2 matrix, where each entry is just the corresponding dot product of the relevant row of the first matrix and column of the second matrix. So, to get the upper-left entry,
you take the dot product of the top row of the first matrix and the left column of the second matrix. That results in AE + BG. To get the upper-right entry, we take the dot product of the top row of the first matrix with the right column of the second matrix, which gives us AF + BH. Filling in the other entries the same way, we get CE + DG and CF + DH. So that's multiplying two 2 by 2 matrices, and we've already discussed the definition in general.

Now, suppose you had to write a program to actually compute the result of multiplying two n by n matrices. One natural way to do that would be to return to the definition, which defines each of the n squared entries of the Z matrix as a suitable sum of products of entries of the X and Y matrices. So on the next quiz, I'd like you to figure out exactly what the running time of that algorithm would be as a function of the matrix dimension n, where, as usual, we count the addition or multiplication of two individual entries as a constant-time operation.

So the correct response to this quiz is the third answer: the straightforward algorithm runs in cubic time relative to the matrix dimension n. To see this, let's just recall the definition of matrix multiplication. The definition tells us that each entry Z_ij of the output matrix Z is defined as the sum from k = 1 to n of X_ik times Y_kj; that is, the dot product of the i-th row of the X matrix and the j-th column of the Y matrix. We're certainly assuming that the matrices are represented in a way that lets us access a given entry in constant time, and under that assumption, remember, each of these products takes only constant time. Then, to compute Z_ij, we just have to add up these n products. So that's going to be theta of n time to compute a given Z_ij, and there are n squared entries that we have to compute: n choices for i, n choices for j. That gives us n squared times n, or cubic running time overall, for the natural algorithm, which is really just a triple for loop that computes each entry of the output matrix separately using the dot product.

So the question, as always for the keen algorithm designer, is: can we do better? Can we beat cubic time for multiplying two matrices? We might be especially emboldened by the progress we've already seen in multiplying two integers. We applied the divide and conquer paradigm to the problem of multiplying two n-digit integers, and we had both a naive recursive algorithm and a seemingly better algorithm, due to Gauss, which made only three recursive calls. Now, we haven't yet analyzed the running time of that algorithm, but as we'll see later, it does indeed beat the quadratic running time of the grade school algorithm. So it's very natural to ask: can we do exactly the same thing here? There's the obvious iterative algorithm, which follows straight from the definition. Perhaps, analogous to Gauss's trick, we could have some clever divide and conquer method which beats cubic time. That's what we're going to explore next.

Let's recall the divide and conquer paradigm: what does it mean to use it? Well, we first have to identify smaller subproblems. So, if we want to multiply two n by n matrices, we have to identify multiplications of smaller matrices that we can solve recursively.
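Before we return to divide and conquer, here is a minimal Python sketch of the straightforward cubic algorithm just described, the triple for loop that computes each output entry as a dot product. The function name and the list-of-lists representation are illustrative choices of mine, not anything specified in the lecture.

```python
def matmul_naive(X, Y):
    """Straightforward matrix multiplication from the definition.

    X and Y are n-by-n matrices given as lists of lists. Each entry Z[i][j]
    is the dot product of row i of X with column j of Y, so the triple loop
    below does Theta(n^3) work overall.
    """
    n = len(X)
    Z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                Z[i][j] += X[i][k] * Y[k][j]
    return Z

# Tiny check with the 2x2 example from above:
# [[A, B], [C, D]] * [[E, F], [G, H]] = [[AE+BG, AF+BH], [CE+DG, CF+DH]]
print(matmul_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```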
Once we've figured out how we want to divide the given problem into smaller ones, then in the conquer step we simply invoke our own algorithm recursively, which recursively multiplies the smaller matrices together. And then, in general, we'll have to combine the results of the recursive calls to get the solution for the original problem: in our case, to get the product of the original two matrices from the products of whatever sub-matrices we identify.

So how would we apply the divide and conquer paradigm to matrices? We're given two n by n matrices, and we have to somehow identify smaller pairs of square matrices that we can multiply recursively. The idea, I think, is fairly natural. We start with a big n by n matrix X, and those n rows and n columns have to be divided somehow into smaller pieces. Now, the first thing you might think of is to split X into its left half and its right half, analogous to what we've been doing with arrays, but then we'd break X into two matrices which are no longer square: they'd be n/2 in one dimension and n in the other, and we want to recursively call a subroutine that multiplies square matrices. So what seems like the clear thing to do is to divide X into quadrants. That gives us four pieces of X, each n/2 by n/2, corresponding to the four quarters of the matrix. Let's call these quadrants, or blocks in matrix terminology, A, B, C, and D; all of these are n/2 by n/2 matrices. As usual, for simplicity, I'm assuming that n is even, and as usual, it doesn't really matter. And we can do the same trick with Y: we'll divide Y into quadrants, n/2 by n/2 matrices which we'll call E, F, G, and H.

Now, one thing that's cool about matrices is that when you split them into blocks and you multiply them, the blocks just behave as if they were atomic elements. What I mean by that is that the product of X and Y can be expressed in terms of these quadrants: each of its four quadrants, each of its four corners, can be written as a suitable arithmetic expression of the quadrants of X and Y. Here's exactly what those formulas are, and they are exactly analogous to when we just multiplied a pair of 2 by 2 matrices: X times Y has AE + BG in the upper-left block, AF + BH in the upper-right, CE + DG in the lower-left, and CF + DH in the lower-right. I'm not going to formally prove this fact; I'm sure many of you have seen it before or are familiar with it, and if you haven't, it's actually quite easy to prove. It's not obvious, you can't necessarily see it off the top of your head, but if you go back to the definition, it's quite easy to verify that when you multiply X and Y, you can express each of its quadrants, or blocks, in terms of the blocks of X and Y.

So what we just did is completely analogous to integer multiplication: there, we wanted to multiply two integers, little x and little y, we broke each of them into a pair of n/2-digit numbers, and then we took the expansion and observed how it could be written in terms of products of n/2-digit numbers. The same thing is going on here, except with matrices. So now we're in business as far as a recursive approach goes. We want to multiply X and Y, which are n by n matrices, and we recognize that we can express the product X times Y in terms of products of n/2 by n/2 matrices, things we're able to multiply recursively, plus additions.
And additions are clearly easy: to add two matrices with n squared entries each takes time linear in the number of entries, so it's going to be theta of n squared to add two matrices that are n by n. This immediately leads us to our first recursive algorithm. To describe it, let me quickly rewrite the expression we just saw on the previous slide. Our first recursive algorithm is simply to evaluate all of these expressions in the obvious way. Specifically, in step one, we recursively compute all of the necessary products, and observe that there are eight products we have to compute, eight products of n/2 by n/2 matrices: there are four blocks in this expansion of X times Y, each block is the sum of two products, and none of the products reoccur, they're all distinct. So, naively, if you want to evaluate this, you have to compute eight different products of n/2 by n/2 matrices. Once those recursive calls complete, all we do is the four necessary additions. As we discussed, that takes time proportional to the number of entries in a matrix, so it's going to take quadratic time overall: quadratic in n, linear in the number of entries.

Now, the question you should be asking is: is this a good algorithm? Was this good for anything, this recursive approach of splitting X and Y into blocks, expanding the product in terms of those blocks, and recursively computing each of the products? I want to say it's totally not obvious; it is not clear what the running time of this recursive algorithm is. I'm going to go ahead and give you a spoiler, which will follow from the master method that we'll talk about in the next lecture. It turns out that with this kind of recursive algorithm, where you do eight recursive calls, each on a problem with dimension half as large as what you started with, and then do quadratic work outside the recursive calls, the running time is going to be cubic. So, exactly the same as with the straightforward iterative algorithm that follows from the definition. That one was clearly cubic; this one, although it's not obvious, is cubic as well. No better, no worse than the straightforward iterative algorithm.

In case you're feeling disappointed that we went through all this work with this seemingly clever divide and conquer approach for matrix multiplication and came out at the end no better than the iterative algorithm, there's really no reason to despair. Remember, back in integer multiplication, we had a straightforward recursive algorithm where we had to do four recursive calls, products of n/2-digit numbers. But then we had Gauss's trick, which said that if we do cleverer products and cleverer additions and subtractions, we can get away with only three recursive calls. And we'll see later that this yields significant savings in the running time for integer multiplication. So far, we've done nothing analogous to Gauss's trick for the matrix multiplication problem; all we did was the naive expansion in terms of sub-matrices, and the naive evaluation of the resulting expressions. So the $64,000 question is: can we do something clever to reduce the number of recursive calls from eight down to something lower? That is where Strassen's algorithm comes in. At a high level, Strassen's algorithm has two steps, just like the first recursive algorithm that we discussed.
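Before getting to Strassen's cleverer products, here is a minimal Python sketch of this first recursive algorithm, the one with eight recursive calls on half-size blocks. The helper names, the assumption that n is a power of two (mirroring the "n is even" simplification), and the list-of-lists representation are illustrative choices of mine, not something fixed by the lecture.

```python
def split_blocks(M):
    """Split an n-by-n matrix (n even) into quadrants A, B, C, D."""
    n = len(M)
    h = n // 2
    A = [row[:h] for row in M[:h]]
    B = [row[h:] for row in M[:h]]
    C = [row[:h] for row in M[h:]]
    D = [row[h:] for row in M[h:]]
    return A, B, C, D

def add(M, N):
    """Entry-wise addition: Theta(n^2) work for n-by-n matrices."""
    return [[M[i][j] + N[i][j] for j in range(len(M))] for i in range(len(M))]

def assemble(top_left, top_right, bottom_left, bottom_right):
    """Glue four n/2-by-n/2 blocks back into one n-by-n matrix."""
    top = [l + r for l, r in zip(top_left, top_right)]
    bottom = [l + r for l, r in zip(bottom_left, bottom_right)]
    return top + bottom

def matmul_recursive(X, Y):
    """First recursive algorithm: eight recursive calls on half-size blocks.

    Assumes the dimension n is a power of two so the blocks stay square.
    """
    n = len(X)
    if n == 1:  # base case: 1x1 matrices
        return [[X[0][0] * Y[0][0]]]
    A, B, C, D = split_blocks(X)
    E, F, G, H = split_blocks(Y)
    return assemble(
        add(matmul_recursive(A, E), matmul_recursive(B, G)),  # AE + BG
        add(matmul_recursive(A, F), matmul_recursive(B, H)),  # AF + BH
        add(matmul_recursive(C, E), matmul_recursive(D, G)),  # CE + DG
        add(matmul_recursive(C, F), matmul_recursive(D, H)),  # CF + DH
    )
```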
In step one, Strassen's algorithm recursively computes some products of smaller matrices, again n/2 by n/2 matrices, but there are only going to be seven of them, and they will be much less straightforward, much more cleverly chosen, than in the first recursive algorithm. Step two, then, is to actually produce the product of X and Y, to produce each of those four blocks that we saw, using suitable additions and subtractions of these seven products. Again, these are much less straightforward than in the first recursive algorithm. And while the additions and subtractions involved will be a little more numerous than they were in the naive recursive algorithm, that only changes the work in that part of the algorithm by a constant factor, so we still spend only theta of n squared work adding and subtracting things, and we get a huge win in decreasing the number of recursive calls from eight to seven.

Now, in case you have the intuition that shaving off one of the eight recursive calls should only decrease the running time of the algorithm by one-eighth, by 12.5 percent: in fact it has a tremendously more amplified effect, because we do one less recursive call over and over and over again as we keep recursing in the algorithm. So it makes a fundamental difference in the eventual running time, as we'll explore in detail in the next set of lectures, when we discuss the master method. So again, a bit of a spoiler alert: what you're going to see in the next set of lectures is that Strassen's algorithm does indeed beat cubic time; it's better than n cubed. You'll have to watch the next set of lectures to know exactly what the running time is, but right now, saving that one recursive call is exactly what changes the running time from cubic to subcubic.

Now, 1969 was obviously quite a bit before my time, but by all accounts, from people I've talked to who were around then and from what the books say, Strassen's algorithm totally blew people's minds at the time. Everybody had assumed that there was no way you could do better than the iterative algorithm, the cubic algorithm; it just seemed that matrix multiplication intuitively, fundamentally required all of the calculations spelled out in the definition. So Strassen's algorithm is an early glimpse of the magic and of the power of clever algorithm design: if you really have serious ingenuity, even for super fundamental problems, you can come up with fundamental savings over the more straightforward solutions.

So those are the main points I wanted to make about Strassen's algorithm: how you can beat cubic time by saving a recursive call, using suitably chosen clever products and clever additions and subtractions. Maybe a few of you are wondering, you know, what are these cleverly chosen products? Can you really do this? And I don't blame you; there's no reason to believe me just because I spelled out this high-level idea. It's not obvious this should work, and you might actually want to see the products. So, for those of you like that, this last slide is for you. Here is Strassen's algorithm in its somewhat gory detail. Let me tell you what the seven products are that we're going to form. I'm going to label them P1 through P7, and they're all going to be defined in terms of the blocks of the input matrices X and Y. Remember that we think of X in terms of its blocks A, B, C, D, and we think of Y in terms of its blocks E, F, G, H, and that A through H are all n/2 by n/2 sub-matrices.
So here are the seven products, P1 through P7. P1 is A times the quantity F minus H. P2 is the quantity A plus B, times H. P3 is the quantity C plus D, times E. P4 is D times the quantity G minus E. P5 is the quantity A plus D, times the quantity E plus H. P6 is the quantity B minus D, times the quantity G plus H. And finally, P7 is the quantity A minus C, times the quantity E plus F. I hope you'll agree that these are indeed only seven products, and we can compute them with seven recursive calls. We preprocess with a little bit of addition and subtraction, computing F minus H, A plus B, C plus D, and so on; we compute all these new matrices from the blocks, and then, with seven recursive calls, we do these seven products, each operating on n/2 by n/2 matrices.

Now, the question is, why is this useful? Why on earth would we want to know the values of these seven products? The amazing other part of the algorithm is that from just these seven products we can, using only additions and subtractions, recover all four of the blocks of X times Y. Recall that we already expanded X times Y in terms of the blocks: we previously computed it to be AE + BG in the upper-left corner, with analogous expressions for the upper-right, lower-left, and lower-right blocks. That much we already know. The content of the claim is that these four blocks also arise from the seven products in the following way: the upper-left block is P5 + P4 - P2 + P6, the upper-right block is P1 + P2, the lower-left block is P3 + P4, and the lower-right block is P1 + P5 - P3 - P7. The claim is that these two different expressions for X times Y are exactly the same, block by block. In other words, the claim is that this crazy expression, P5 plus P4 minus P2 plus P6, where those are four of the products listed above, is precisely AE plus BG. Similarly, we're claiming that P1 plus P2 is exactly AF plus BH; that's actually easy to see. P3 plus P4 is CE plus DG; that's also easy to see. And then the other tricky one is that P1 plus P5 minus P3 minus P7 is exactly the same as CF plus DH. So all four of those hold.

Now, so that you believe me, because I don't know why you would believe me unless I actually showed you some of this derivation, let's look at the proof of one of the cases, the upper-left corner. Let's just expand out this crazy expression, P5 + P4 - P2 + P6. What do we get? Well, from P5 we get AE + AH + DE + DH. We add P4, which gives us plus DG minus DE. Then we subtract P2, which gives us minus AH, minus BH. And then we add in P6, which gives us BG plus BH minus DG minus DH. So what happens next? Well, now we look for cancellations: we cancel the AH's, we cancel the DE's, we cancel the DH's, we cancel the DG's, we cancel the BH's. And holy cow, what do we get? We get AE plus BG, that is, exactly what we were supposed to get in the upper-left block of X times Y. So we've just verified that this equation holds for the upper-left block. It's quite easy to see that it holds for the upper-right and lower-left blocks, and a comparable calculation verifies it for the lower-right block.

So, summarizing: because this claim holds, because we can recover the four blocks of X times Y from the seven products, Strassen's algorithm works in the following way. You compute the seven products, P1 through P7, using seven recursive calls; then you just compute the four blocks using some extra additions and subtractions, as shown in the claim. So: seven recursive calls on n/2 by n/2 matrices, plus
theta of n squared work to do the necessary additions and subtractions. As we'll see in the master method lecture, that is actually sufficient for subcubic time. Now, I sympathize with you if you have the following question: how on earth did Strassen come up with this? Indeed, this illustrates the difference between checking somebody's proof and coming up with a proof. Given that I've told you the magical seven products, and how, from them, you can recover the four desired blocks of X times Y, it's really just mechanical to verify that it works. It's a totally different story how you come up with P1 through P7 in the first place. So how did Strassen come up with them? Honestly, your guess is as good as mine.
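To tie the pieces together, here is a minimal Python sketch of Strassen's algorithm as described above. It reuses the split_blocks, add, and assemble helpers from the earlier sketch, again assumes n is a power of two, and checks itself against the straightforward cubic algorithm; the names and representation are illustrative choices of mine.

```python
def sub(M, N):
    """Entry-wise subtraction, Theta(n^2) work like addition."""
    return [[M[i][j] - N[i][j] for j in range(len(M))] for i in range(len(M))]

def strassen(X, Y):
    """Strassen's algorithm: seven recursive calls on half-size blocks."""
    n = len(X)
    if n == 1:
        return [[X[0][0] * Y[0][0]]]
    A, B, C, D = split_blocks(X)
    E, F, G, H = split_blocks(Y)
    # The seven cleverly chosen products P1..P7 from the lecture.
    P1 = strassen(A, sub(F, H))          # A(F - H)
    P2 = strassen(add(A, B), H)          # (A + B)H
    P3 = strassen(add(C, D), E)          # (C + D)E
    P4 = strassen(D, sub(G, E))          # D(G - E)
    P5 = strassen(add(A, D), add(E, H))  # (A + D)(E + H)
    P6 = strassen(sub(B, D), add(G, H))  # (B - D)(G + H)
    P7 = strassen(sub(A, C), add(E, F))  # (A - C)(E + F)
    # Recover the four blocks of X*Y using only additions and subtractions.
    top_left = add(sub(add(P5, P4), P2), P6)      # = AE + BG
    top_right = add(P1, P2)                       # = AF + BH
    bottom_left = add(P3, P4)                     # = CE + DG
    bottom_right = sub(sub(add(P1, P5), P3), P7)  # = CF + DH
    return assemble(top_left, top_right, bottom_left, bottom_right)

# Sanity check against the straightforward cubic algorithm from earlier:
X = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
Y = [[17, 18, 19, 20], [21, 22, 23, 24], [25, 26, 27, 28], [29, 30, 31, 32]]
assert strassen(X, Y) == matmul_naive(X, Y)
```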