One wonders if the DFT can be computed faster: Does another
      computational procedure -- an algorithm -- exist
      that can compute the same quantity, but more efficiently. We
      could seek methods that reduce the constant of proportionality,
      but do not change the DFT's complexity
      
      
	ON2
      
	  O
	  
	    
	    N
	    2
	  
	.
      
      Here, we have
      something more dramatic in mind: Can the computations be restructured
      so that a smaller complexity results?  
    
    
    
      In 1965, IBM researcher Jim Cooley and Princeton faculty member
      John Tukey developed what is now known as the Fast Fourier
      Transform (FFT). It is an algorithm for computing that DFT that
      has order
      
      
	ONlogN
      
	  O
	  
	    
	    N
	    N
	  
	
      
      for certain length inputs. Now when the length
      of data doubles, the spectral computational time will not quadruple as
      with the DFT algorithm; instead, it approximately doubles. Later
      research showed that no algorithm for computing the DFT could have a
      smaller complexity than the FFT. Surprisingly, historical work has
      shown that Gauss 
      in the early nineteenth century developed the same 
      algorithm, but did not publish it! After the FFT's rediscovery,
      not only was the computation of a signal's spectrum greatly
      speeded, but also the added feature of algorithm
      meant that computations had flexibility not available to analog
      implementations.  
    
    
    
    
	
	  Before developing the FFT, let's try to appreciate the
	  algorithm's impact.  Suppose a short-length transform takes
	  1 ms. We want to calculate a transform of a signal that is
	  10 times longer. Compare how much longer a straightforward
	  implementation of the DFT would take in comparison to an
	  FFT, both of which compute exactly the same quantity.
       If a DFT required 1ms to compute, and signal having ten
	  times the duration would require 100ms to compute. Using the
	  FFT, a 1ms computing time would increase by a factor of
	  about
	  
	    10log210=33
	  
	      
                10
                
		 2
		 10
	        
              
	      33
	    , a factor of 3 less than the DFT would have
	  needed.
 
    
    
      To derive the FFT, we assume that the signal's duration is a
      power of two:
      
	N=2L
      
	  
	  N
	  
	    
	    2
	    L
	  
	.  
      Consider what happens to the even-numbered and odd-numbered
      elements of the sequence in the DFT calculation. 
    
     
    
    
	Sk=s0+s2e(−i)2π2kN+…+sN−2e(−i)2π(N−2)kN+s1e(−i)2πkN+s3e(−i)2π×(2+1)kN+…+sN−1e(−i)2π(N−(2−1))kN=[s0+s2e(−i)2πkN2+…+sN−2e(−i)2π(N2−1)kN2]+[s1+s3e(−i)2πkN2+…+sN−1e(−i)2π(N2−1)kN2]e−(i2πk)N
      
	  
	  
	    S
	    k
	  
	  
	    
	    
	      s
	      0
	    
	    
	      
	      
		s
		2
	      
	      
		
		
		  
		  
		    
		    
                  
		  
		    
		    
		      
		      2
		      
		      2
		      k
		    
		    N
		  
                
	      
	    
	    …
	    
	      
	      
		s
		
		  
		  N
		  2
		
	      
	      
		
		
		  
		  
		    
		    
		  
		  
		    
		    
		      
		      2
		      
		      
			
			N
			2
		      
		      k
		    
		    N
		  
		
	      
	    
	    
	      
	      
		s
		1
	      
	      
		
		
		  
		  
		    
		    
		  
		  
		    
		    
		      
		      2
		      
		      k
		    
		    N
		  
		
	      
	    
	    
	      
	      
		s
		3
	      
	      
		
		
		  
		  
		    
		    
		  
		  
		    
		    
		      
		      2
		      
		      
			
			2
			1
		      
		      k
		    
		    N
		  
		
	      
	    
	    …
	    
	      
	      
		s
		
		  
		  N
		  1
		
	      
	      
		
		
		  
		  
		    
		    
		  
		  
		    
		    
		      
		      2
		      
		      
			
			N
			
			  
			  2
			  1
			
		      
		      k
		    
		    N
		  
		
	      
	    
	   
	  
	    
	    
	      
	      [
	      
		s
		0
	      
	    
	    
	      
	      
		s
		2
	      
	      
		
		
		  
		  
		    
		    
		  
		  
		    
		    
		      
		      2
		      
		      k
		    
		    
		      
		      N
		      2
		    
		  
		
	      
	    
	    …
	    
	      
	      
		s
		
		  
		  N
		  2
		
	      
	      
		
		
		  
		  
		    
		    
		  
		  
		    
		    
		      
		      2
		      
		      
			
			
			  
			  N
			  2
			
			1
		      
		      k
		    
		    
		      
		      N
		      2
		    
		  
		
	      
	      ]
	    
	    
	      
	      [
	      
		s
		1
	      
	    
	    
	      
	      
		s
		3
	      
	      
		
		  
		  
		    
                      2
		      
		      k
                    
		    
		      
		      N
		      2
		    
		  
		
	      
	    
	    …
	    
	      
	      
		s
		
		  
		  N
		  1
		
	      
	      
		
		
		  
		  
		    
		    
		  
		  
		    
		    
		      
		      2
		      
		      
			
			
			  
			  N
			  2
			
			1
		      
		      k
		    
		    
		      
		      N
		      2
		    
		  
		
	      
	      ]
	      
		
		
		  
		  
		    
		     
		      
		      
		      2
		      
			k
		    
		  
		  N
		
	      
	    
	  
	
    
(1)Each term in square brackets has the form of a 
      
      
	N2
      
	  
	  N
	  2
	-length DFT.  The first one is a DFT of the
      even-numbered elements, and the second of the odd-numbered
      elements.  The first DFT is combined with the second multiplied
      by the complex exponential
      
      
	e−i2πkN
      
	  
	  
	    
	    
	      
	       
		
		
		2
		
		k
	      
	    N
	    
	  
	.  The half-length transforms are each evaluated at
      frequency indices 
      k=0k0,
      ……,
      N−1N1.
      Normally, the number of frequency indices in a DFT calculation
      range between zero and the transform length minus one.  The
      computational advantage of the FFT comes from
      recognizing the periodic nature of the discrete Fourier
      transform.  The FFT simply reuses the computations made in the
      half-length transforms and combines them through additions and
      the multiplication by
      
      
	e−i2πkN
      
	  
	  
	    
	    
	      
	       
		
		
		2
		
		k
	      
	    N
	  
	  
	, which is not periodic over 
      
      
	N2
      
	  
	  N
	  2
	.
      Figure 1 illustrates this decomposition.
      As it stands, we now compute two length-
	N2
      
	  
	  N
	  2
	
      transforms (complexity 
      
      
	2ON24
      
	  
	  2
	  
	    O
	    
	      
	      
		
		N
		2
	      
	      4
	    
	  
	), multiply one of them by the complex exponential
      (complexity
      
      
	ON
      
	  O
	  N
	), and add the results (complexity 
      
      
	ON
      
	  O
	  N
	).  At this point, the total complexity is still
    dominated by the half-length DFT calculations, but the
    proportionality coefficient has been reduced.  
    
    Now for the fun.  Because 
      
	N=2L
      
	  
	  N
	  
	    
	    2
	    L
	  
	, each of the half-length transforms can be reduced to
      two quarter-length transforms, each of these to two
      eighth-length ones, etc.  This decomposition continues until we
      are left with length-2 transforms.  This transform is quite
      simple, involving only additions.  Thus, the first stage of the
      FFT has
      
	N2
      
	  
	  N
	  2
	 
      
      length-2 transforms (see the bottom part of Figure 1).  Pairs of these transforms are
      combined by adding one to the other multiplied by a complex
      exponential.  Each pair requires 4 additions and 2
      multiplications, giving a total number of computations equaling
      
      
	6·N4=3N2
      
	  
	  
	    
	    6
	    ·
	    
	      
	      N
	      4
	    
	  
	  
	    3N
	    2
	  
	.  
      This number of computations does not change from stage to stage.
      Because the number of stages, the number of times the length can
      be divided by two, equals
      
      
	log2N
      
	  2
	  N
	, the number of arithmetic operations equals
      
        3N2log2N
      
          
            3N
            2
          
	  2N
        , which makes the complexity of the FFT
      
	ONlog2N
      
	  O
	  
	    
	    N
	    
	      2
	      N
	    
	  
	.
    
    
    
    
    Doing an example will make
      computational savings more obvious.  Let's look at the details
      of a length-8 DFT.  As shown on Figure 2, we first decompose the DFT into two length-4
      DFTs, with the outputs added and subtracted together in pairs.
      Considering Figure 2 as the
      frequency index goes from 0 through 7, we recycle values from
      the length-4 DFTs into the final calculation because of the
      periodicity of the DFT output. Examining how pairs of outputs
      are collected together, we create the basic computational
      element known as a butterfly (Figure 2).
    
      By considering together the computations involving common output
      frequencies from the two half-length DFTs, we see that the two
      complex multiplies are related to each other, and we can reduce
      our computational work even further. By further decomposing the
      length-4 DFTs into two length-2 DFTs and combining their
      outputs, we arrive at the diagram summarizing the length-8 fast
      Fourier transform (
Figure 1).
      Although most of the complex multiplies are quite simple
      (multiplying by
      
      
	e−(iπ2)
      
	  
	  
	    
	    
	      
	      
	      2
	    
	  
	
      
      means swapping real and imaginary parts and changing their signs), let's count those for
      purposes of evaluating the complexity as full complex
      multiplies.  We have
      
      
	N2=4
      
	  
	  
	    
	    N
	    2
	  
	  4
	
      
      complex multiplies and
      
      
	N=8
      N8
      
      complex additions for each stage and
      
      
	log2N=3
      
	  
	  
	    
	    2
	    N
	  
	  3
	
      
      stages, making the number of basic computations 
      
      
	3N2log2N
      
	  
	  
	    
	    
	      
	      3
	      N
	    
	    2
	  
	  
	    
	    2
	    N
	  
	 
      as predicted. 
    
    
 
	
	  Note that the ordering of the input sequence in the two
	  parts of Figure 1 aren't quite
	  the same.  Why not?  How is the ordering determined?
	
       
	  The upper panel has not used the FFT algorithm to compute
	  the length-4 DFTs while the lower one has. The ordering is
	  determined by the algorithm.
	
 
    
     Other "fast" algorithms were discovered,
      all of which make use of how many common factors the transform
      length NN has.  In number theory,
      the number of prime factors a given integer has measures how
      composite it is.  The numbers 16 and 81 are
      highly composite (equaling
      
      
	24
      
	  
	  2
	  4
	 
      
      and 
      
	34
      
	  
	  3
	  4
	 
      
      respectively), the number 18 is less so
      (
	21·32
      
	  
	  
	    
	    2
	    1
	  
	  ·
	  
	    
	    3
	    2
	  
	), and 17 not at all (it's prime).  In over thirty
      years of Fourier transform algorithm development, the original
      Cooley-Tukey algorithm is far and away the most frequently
      used. It is so computationally efficient that power-of-two
      transform lengths are frequently used regardless of what the
      actual length of the data.
    
     
	
	  Suppose the length of the signal were
          500500?  How would you compute
          the spectrum of this signal using the Cooley-Tukey
          algorithm?  What would the length
          NN of the transform be?
	
       
	  The transform can have any greater than
          or equal to the actual duration of the signal.  We simply
          “pad” the signal with zero-valued samples until
          a computationally advantageous signal length results. Recall
          that the FFT is an algorithm to compute
          the DFT.
          Extending the length of the signal this way merely means we
          are sampling the frequency axis more finely than required.
          To use the Cooley-Tukey algorithm, the length of the
          resulting zero-padded signal can be 512, 1024, etc. samples
          long.
	
 
   
        
"Electrical Engineering Digital Processing Systems in Braille."