One wonders if the DFT can be computed faster: Does another
computational procedure -- an algorithm -- exist
that can compute the same quantity, but more efficiently? We
could seek methods that reduce the constant of proportionality,
but do not change the DFT's complexity
$O(N^2)$.
Here, we have
something more dramatic in mind: Can the computations be restructured
so that a smaller complexity results?
In 1965, IBM researcher Jim Cooley and Princeton faculty member
John Tukey developed what is now known as the Fast Fourier
Transform (FFT). It is an algorithm for computing the DFT that
has order
$O(N \log N)$
for certain input lengths. Now when the length
of data doubles, the spectral computational time will not quadruple as
with the DFT algorithm; instead, it approximately doubles. Later
research showed that no algorithm for computing the DFT could have a
smaller complexity than the FFT. Surprisingly, historical work has
shown that Gauss
in the early nineteenth century developed the same
algorithm, but did not publish it! After the FFT's rediscovery,
not only was the computation of a signal's spectrum greatly
sped up, but the fact that the FFT is an algorithm also
meant that computations had a flexibility not available to analog
implementations.
Before developing the FFT, let's try to appreciate the
algorithm's impact. Suppose a short-length transform takes
1 ms. We want to calculate a transform of a signal that is
10 times longer. Compare how much longer a straightforward
implementation of the DFT would take than an
FFT, both of which compute exactly the same quantity.
If a DFT required 1 ms to compute, a signal having ten
times the duration would require 100 ms to compute. Using the
FFT, a 1 ms computing time would increase by a factor of
about $10 \log_2 10 \approx 33$, a factor of 3 less than the DFT would have
needed.
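These factors can be checked with a quick calculation. The snippet below is my own illustration of the example's arithmetic, assuming (as the example does) that DFT time scales as $N^2$ and FFT time as $N \log_2 N$:

```python
import math

# Factors from the example: a tenfold-longer signal multiplies
# DFT time (which scales as N^2) by 10^2 = 100, while under the
# example's assumptions the FFT time grows by roughly 10*log2(10).
dft_factor = 10 ** 2             # 100
fft_factor = 10 * math.log2(10)  # about 33.2

print(dft_factor)
print(round(fft_factor))
```

The ratio 100/33 is the "factor of 3" savings quoted in the text.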
To derive the FFT, we assume that the signal's duration is a
power of two:
$N = 2^L$.
Consider what happens to the even-numbered and odd-numbered
elements of the sequence in the DFT calculation.
$$\begin{aligned}
S(k) &= s(0) + s(2)e^{\frac{-i2\pi 2k}{N}} + \dots + s(N-2)e^{\frac{-i2\pi(N-2)k}{N}} \\
&\quad + s(1)e^{\frac{-i2\pi k}{N}} + s(3)e^{\frac{-i2\pi 3k}{N}} + \dots + s(N-1)e^{\frac{-i2\pi(N-1)k}{N}} \\
&= \left[s(0) + s(2)e^{\frac{-i2\pi k}{N/2}} + \dots + s(N-2)e^{\frac{-i2\pi(N/2-1)k}{N/2}}\right] \\
&\quad + \left[s(1) + s(3)e^{\frac{-i2\pi k}{N/2}} + \dots + s(N-1)e^{\frac{-i2\pi(N/2-1)k}{N/2}}\right]e^{\frac{-i2\pi k}{N}}
\end{aligned} \quad (1)$$
Each term in square brackets has the form of an
$N/2$-length DFT. The first one is a DFT of the
even-numbered elements, and the second of the odd-numbered
elements. The first DFT is combined with the second multiplied
by the complex exponential
$e^{-i2\pi k/N}$
. The half-length transforms are each evaluated at
frequency indices
$k = 0, \dots, N-1$.
Normally, the frequency indices in a DFT calculation
range between zero and the transform length minus one. The
computational advantage of the FFT comes from
recognizing the periodic nature of the discrete Fourier
transform. The FFT simply reuses the computations made in the
half-length transforms and combines them through additions and
the multiplication by
$e^{-i2\pi k/N}$, which is not periodic over $N/2$.
Figure 1 illustrates this decomposition.
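Equation (1) can be verified numerically. The sketch below is my own illustration (not code from the text): it computes one decomposition stage -- two half-length DFTs combined with the complex exponential, reusing the half-length values through their periodicity -- and agrees with a direct DFT:

```python
import cmath

def dft(s):
    """Direct DFT: S(k) = sum over n of s(n) * e^{-i 2 pi n k / N}."""
    N = len(s)
    return [sum(s[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def dft_split(s):
    """One decomposition stage, following equation (1)."""
    N = len(s)
    even = dft(s[0::2])   # N/2-length DFT of the even-numbered elements
    odd = dft(s[1::2])    # N/2-length DFT of the odd-numbered elements
    # The half-length transforms are periodic in k with period N/2,
    # so their values are reused (indexed modulo N/2) for k = 0, ..., N-1.
    return [even[k % (N // 2)]
            + cmath.exp(-2j * cmath.pi * k / N) * odd[k % (N // 2)]
            for k in range(N)]
```

For any even-length input, `dft(s)` and `dft_split(s)` agree to rounding error.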
As it stands, we now compute two length-
$N/2$
transforms (complexity $2O(N^2/4)$
), multiply one of them by the complex exponential
(complexity
$O(N)$
), and add the results (complexity
$O(N)$
). At this point, the total complexity is still
dominated by the half-length DFT calculations, but the
proportionality coefficient has been reduced.
Now for the fun. Because
$N = 2^L$
, each of the half-length transforms can be reduced to
two quarter-length transforms, each of these to two
eighth-length ones, etc. This decomposition continues until we
are left with length-2 transforms. This transform is quite
simple, involving only additions. Thus, the first stage of the
FFT has
$N/2$
length-2 transforms (see the bottom part of Figure 1). Pairs of these transforms are
combined by adding one to the other multiplied by a complex
exponential. Each pair requires 4 additions and 2
multiplications, giving a total number of computations equaling
$6 \cdot \frac{N}{4} = \frac{3N}{2}$.
This number of computations does not change from stage to stage.
Because the number of stages, the number of times the length can
be divided by two, equals
$\log_2 N$
, the number of arithmetic operations equals
$\frac{3N}{2}\log_2 N$
, which makes the complexity of the FFT
$O(N\log_2 N)$.
Doing an example will make the
computational savings more obvious. Let's look at the details
of a length-8 DFT. As shown in Figure 2, we first decompose the DFT into two length-4
DFTs, with the outputs added and subtracted together in pairs.
Considering Figure 2 as the
frequency index goes from 0 through 7, we recycle values from
the length-4 DFTs into the final calculation because of the
periodicity of the DFT output. Examining how pairs of outputs
are collected together, we create the basic computational
element known as a butterfly (Figure 2).
By considering together the computations involving common output
frequencies from the two half-length DFTs, we see that the two
complex multiplies are related to each other, and we can reduce
our computational work even further. By further decomposing the
length-4 DFTs into two length-2 DFTs and combining their
outputs, we arrive at the diagram summarizing the length-8 fast
Fourier transform (
Figure 1).
Although most of the complex multiplies are quite simple
(multiplying by
$e^{-i\pi/2}$
means swapping real and imaginary parts and negating the new imaginary part), let's count those for
purposes of evaluating the complexity as full complex
multiplies. We have
$N/2 = 4$
complex multiplies and
$N = 8$
complex additions for each stage and
$\log_2 N = 3$
stages, making the number of basic computations
$\frac{3N}{2}\log_2 N$
as predicted.
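The length-8 tally can be checked directly (a small sanity check of my own, not from the text):

```python
import math

N = 8
multiplies_per_stage = N // 2      # 4 complex multiplies per stage
additions_per_stage = N            # 8 complex additions per stage
stages = int(math.log2(N))         # 3 stages

total = (multiplies_per_stage + additions_per_stage) * stages
print(total)  # 36, which equals (3*N/2) * log2(N)
```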
Note that the ordering of the input sequence in the two
parts of Figure 1 isn't quite
the same. Why not? How is the ordering determined?
The upper panel has not used the FFT algorithm to compute
the length-4 DFTs while the lower one has. The ordering is
determined by the algorithm.
Other "fast" algorithms were discovered,
all of which make use of how many common factors the transform
length $N$ has. In number theory,
the number of prime factors a given integer has measures how
composite it is. The numbers 16 and 81 are
highly composite (equaling
$2^4$ and $3^4$
respectively), the number 18 is less so
(
$2^1 \cdot 3^2$
), and 17 not at all (it's prime). In over thirty
years of Fourier transform algorithm development, the original
Cooley-Tukey algorithm is far and away the most frequently
used. It is so computationally efficient that power-of-two
transform lengths are frequently used regardless of the
actual length of the data.
Suppose the length of the signal were
500? How would you compute
the spectrum of this signal using the Cooley-Tukey
algorithm? What would the length
$N$ of the transform be?
The transform can have any length greater than
or equal to the actual duration of the signal. We simply
“pad” the signal with zero-valued samples until
a computationally advantageous signal length results. Recall
that the FFT is an algorithm to compute
the DFT.
Extending the length of the signal this way merely means we
are sampling the frequency axis more finely than required.
To use the Cooley-Tukey algorithm, the length of the
resulting zero-padded signal can be 512, 1024, etc. samples
long.
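A zero-padding helper along these lines might look as follows (an illustrative sketch; the names are my own):

```python
def next_power_of_two(n):
    """Smallest power of two greater than or equal to n."""
    p = 1
    while p < n:
        p *= 2
    return p

signal = [1.0] * 500                        # stand-in for a length-500 signal
N = next_power_of_two(len(signal))          # 512
padded = signal + [0.0] * (N - len(signal))
# The DFT of `padded` samples the frequency axis more finely than
# strictly required, but a power-of-two FFT can now be applied.
```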