Cooley-Tukey FFT

Seun Lanlege — Mad scientist

Cooley-Tukey FFT

In a previous article i briefly touched on FFTs. I mentioned that they are a faster way to perform polynomial interpolation. But FFTs are used for more than just polynomials. Specifically they are used for computing DFTs. A DFT is a system of linear equations that are related whose basis lies in Cn\mathbb{C}^n. DFTs have other applications in signal processing.

In order to motivate the need for the FFT, Let’s examine a 12-degree polynomial in whose basis lies in the real.

P(x0)=a0+a1x0+a2x02+a3x03+a4x04+a5x05+a6x06+a7x07+a8x08+a9x09+a10x010+a11x011P(x1)=a0+a1x1+a2x12+a3x13+a4x14+a5x15+a6x16+a7x17+a8x18+a9x19+a10x110+a11x111P(x2)=a0+a1x2+a2x22+a3x23+a4x24+a5x25+a6x26+a7x27+a8x28+a9x29+a10x210+a11x211P(x3)=a0+a1x3+a2x32+a3x33+a4x34+a5x35+a6x36+a7x37+a8x38+a9x39+a10x310+a11x311P(x4)=a0+a1x4+a2x42+a3x43+a4x44+a5x45+a6x46+a7x47+a8x48+a9x49+a10x410+a11x411P(x5)=a0+a1x5+a2x52+a3x53+a4x54+a5x55+a6x56+a7x57+a8x58+a9x59+a10x510+a11x511P(x6)=a0+a1x6+a2x62+a3x63+a4x64+a5x65+a6x66+a7x67+a8x68+a9x69+a10x610+a11x611P(x7)=a0+a1x7+a2x72+a3x73+a4x74+a5x75+a6x76+a7x77+a8x78+a9x79+a10x710+a11x711P(x8)=a0+a1x8+a2x82+a3x83+a4x84+a5x85+a6x86+a7x87+a8x88+a9x89+a10x810+a11x811P(x9)=a0+a1x9+a2x92+a3x93+a4x94+a5x95+a6x96+a7x97+a8x98+a9x99+a10x910+a11x911P(x10)=a0+a1x10+a2x102+a3x103+a4x104+a5x105+a6x106+a7x107+a8x108+a9x109+a10x1010+a11x1011P(x11)=a0+a1x11+a2x112+a3x113+a4x114+a5x115+a6x116+a7x117+a8x118+a9x119+a10x1110+a11x1111\begin{aligned} P(x_0) &= a_0 + a_1x_0 + a_2x_0^2 + a_3x_0^3 + a_4x_0^4 + a_5x_0^5 + a_6x_0^6 + a_7x_0^7 + a_8x_0^8 + a_9x_0^9 + a_{10}x_0^{10} + a_{11}x_0^{11} \\ P(x_1) &= a_0 + a_1x_1 + a_2x_1^2 + a_3x_1^3 + a_4x_1^4 + a_5x_1^5 + a_6x_1^6 + a_7x_1^7 + a_8x_1^8 + a_9x_1^9 + a_{10}x_1^{10} + a_{11}x_1^{11} \\ P(x_2) &= a_0 + a_1x_2 + a_2x_2^2 + a_3x_2^3 + a_4x_2^4 + a_5x_2^5 + a_6x_2^6 + a_7x_2^7 + a_8x_2^8 + a_9x_2^9 + a_{10}x_2^{10} + a_{11}x_2^{11} \\ P(x_3) &= a_0 + a_1x_3 + a_2x_3^2 + a_3x_3^3 + a_4x_3^4 + a_5x_3^5 + a_6x_3^6 + a_7x_3^7 + a_8x_3^8 + a_9x_3^9 + a_{10}x_3^{10} + a_{11}x_3^{11} \\ P(x_4) &= a_0 + a_1x_4 + a_2x_4^2 + a_3x_4^3 + a_4x_4^4 + a_5x_4^5 + a_6x_4^6 + a_7x_4^7 + a_8x_4^8 + a_9x_4^9 + a_{10}x_4^{10} + a_{11}x_4^{11} \\ P(x_5) &= a_0 + a_1x_5 + a_2x_5^2 + a_3x_5^3 + a_4x_5^4 + a_5x_5^5 + a_6x_5^6 + a_7x_5^7 + a_8x_5^8 + a_9x_5^9 + a_{10}x_5^{10} + a_{11}x_5^{11} \\ P(x_6) &= a_0 + a_1x_6 + a_2x_6^2 + a_3x_6^3 + a_4x_6^4 + a_5x_6^5 + a_6x_6^6 + a_7x_6^7 + a_8x_6^8 + a_9x_6^9 + a_{10}x_6^{10} + a_{11}x_6^{11} \\ P(x_7) &= a_0 + a_1x_7 + a_2x_7^2 + a_3x_7^3 + a_4x_7^4 + a_5x_7^5 + a_6x_7^6 + a_7x_7^7 + a_8x_7^8 + a_9x_7^9 + a_{10}x_7^{10} + a_{11}x_7^{11} \\ P(x_8) &= a_0 + a_1x_8 + a_2x_8^2 + a_3x_8^3 + a_4x_8^4 + a_5x_8^5 + a_6x_8^6 + a_7x_8^7 + a_8x_8^8 + a_9x_8^9 + a_{10}x_8^{10} + a_{11}x_8^{11} \\ P(x_9) &= a_0 + a_1x_9 + a_2x_9^2 + a_3x_9^3 + a_4x_9^4 + a_5x_9^5 + a_6x_9^6 + a_7x_9^7 + a_8x_9^8 + a_9x_9^9 + a_{10}x_9^{10} + a_{11}x_9^{11} \\ P(x_{10}) &= a_0 + a_1x_{10} + a_2x_{10}^2 + a_3x_{10}^3 + a_4x_{10}^4 + a_5x_{10}^5 + a_6x_{10}^6 + a_7x_{10}^7 + a_8x_{10}^8 + a_9x_{10}^9 + a_{10}x_{10}^{10} + a_{11}x_{10}^{11} \\ P(x_{11}) &= a_0 + a_1x_{11} + a_2x_{11}^2 + a_3x_{11}^3 + a_4x_{11}^4 + a_5x_{11}^5 + a_6x_{11}^6 + a_7x_{11}^7 + a_8x_{11}^8 + a_9x_{11}^9 + a_{10}x_{11}^{10} + a_{11}x_{11}^{11} \\ \end{aligned}

It’s clear from writing out this expression that the complexity of calculating each P(xi)P(x_i) for x{0,,n1}x \in \{0, \dots, n-1 \} is O(N2)O(N^2). Another way to look at this system of equations is through the lens of matrix multiplication:

[1x0x02x03x04x05x06x07x08x09x010x0111x1x12x13x14x15x16x17x18x19x110x1111x2x22x23x24x25x26x27x28x29x210x2111x3x32x33x34x35x36x37x38x39x310x3111x4x42x43x44x45x46x47x48x49x410x4111x5x52x53x54x55x56x57x58x59x510x5111x6x62x63x64x65x66x67x68x69x610x6111x7x72x73x74x75x76x77x78x79x710x7111x8x82x83x84x85x86x87x88x89x810x8111x9x92x93x94x95x96x97x98x99x910x9111x10x102x103x104x105x106x107x108x109x1010x10111x11x112x113x114x115x116x117x118x119x1110x1111][a0a1a2a3a4a5a6a7a8a9a10a11]=[P(x0)P(x1)P(x2)P(x3)P(x4)P(x5)P(x6)P(x7)P(x8)P(x9)P(x10)P(x11)]\begin{bmatrix} 1 & x_0 & x_0^2 & x_0^3 & x_0^4 & x_0^5 & x_0^6 & x_0^7 & x_0^8 & x_0^9 & x_0^{10} & x_0^{11} \\ 1 & x_1 & x_1^2 & x_1^3 & x_1^4 & x_1^5 & x_1^6 & x_1^7 & x_1^8 & x_1^9 & x_1^{10} & x_1^{11} \\ 1 & x_2 & x_2^2 & x_2^3 & x_2^4 & x_2^5 & x_2^6 & x_2^7 & x_2^8 & x_2^9 & x_2^{10} & x_2^{11} \\ 1 & x_3 & x_3^2 & x_3^3 & x_3^4 & x_3^5 & x_3^6 & x_3^7 & x_3^8 & x_3^9 & x_3^{10} & x_3^{11} \\ 1 & x_4 & x_4^2 & x_4^3 & x_4^4 & x_4^5 & x_4^6 & x_4^7 & x_4^8 & x_4^9 & x_4^{10} & x_4^{11} \\ 1 & x_5 & x_5^2 & x_5^3 & x_5^4 & x_5^5 & x_5^6 & x_5^7 & x_5^8 & x_5^9 & x_5^{10} & x_5^{11} \\ 1 & x_6 & x_6^2 & x_6^3 & x_6^4 & x_6^5 & x_6^6 & x_6^7 & x_6^8 & x_6^9 & x_6^{10} & x_6^{11} \\ 1 & x_7 & x_7^2 & x_7^3 & x_7^4 & x_7^5 & x_7^6 & x_7^7 & x_7^8 & x_7^9 & x_7^{10} & x_7^{11} \\ 1 & x_8 & x_8^2 & x_8^3 & x_8^4 & x_8^5 & x_8^6 & x_8^7 & x_8^8 & x_8^9 & x_8^{10} & x_8^{11} \\ 1 & x_9 & x_9^2 & x_9^3 & x_9^4 & x_9^5 & x_9^6 & x_9^7 & x_9^8 & x_9^9 & x_9^{10} & x_9^{11} \\ 1 & x_{10} & x_{10}^2 & x_{10}^3 & x_{10}^4 & x_{10}^5 & x_{10}^6 & x_{10}^7 & x_{10}^8 & x_{10}^9 & x_{10}^{10} & x_{10}^{11} \\ 1 & x_{11} & x_{11}^2 & x_{11}^3 & x_{11}^4 & x_{11}^5 & x_{11}^6 & x_{11}^7 & x_{11}^8 & x_{11}^9 & x_{11}^{10} & x_{11}^{11} \\ \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \\ a_7 \\ a_8 \\ a_9 \\ a_{10} \\ a_{11} \\ \end{bmatrix} = \begin{bmatrix} P(x_0) \\ P(x_1) \\ P(x_2) \\ P(x_3) \\ P(x_4) \\ P(x_5) \\ P(x_6) \\ P(x_7) \\ P(x_8) \\ P(x_9) \\ P(x_{10}) \\ P(x_{11}) \\ \end{bmatrix}

This matrix is also known as the vandermonde matrix. The vandermonde matrix is invertible, since it is a square matrix with distinct elements. This equation can therefore be re-arranged allowing us to derive the coefficients aia_is of the polynomial given the evaluations P(xi)P(x_i).

a=V1Pa = V^{-1}P

This is an alternative algebraic form of polynomial interpolation, other forms include Lagrange polynomials, Newton’s polynomials etc. Unfortunately, performing either operations, whether polynomial evaluation or interpolation leads to a complexity of O(N2)O(N^2).

Discrete Fourier Transform

The Fast Fourier Transform provides a way to transform the complexity of computing a DFT of size NN for which the naive approach takes O(N2)O(N^2) complexity into something quasilinear. In order for this to happen, we need to perform a change of basis, i.e rewrite the coordinates of our polynomial to be elements of a cyclic group. Such that they are indexed by successive powers of the generator of this cyclic group. Thus we get the DFT

X(j)=k=0N1A(k)ωjk,j=0,1,N1.\begin{equation} X(j) = \sum_{k = 0}^{N-1} A(k)\cdot \omega^{jk}, \quad j = 0,1,\dots N-1. \tag{1} \end{equation}

where ω=e2πi/N\omega = e^{2\pi i/N}, such that ωN=1\omega^N = 1. Let’s look at this in matrix form for N=12N = 12.

[ω0ω0ω0ω0ω0ω0ω0ω0ω0ω0ω0ω0ω0ω1ω2ω3ω4ω5ω6ω7ω8ω9ω10ω11ω0ω2ω4ω6ω8ω10ω12ω14ω16ω18ω20ω22ω0ω3ω6ω9ω12ω15ω18ω21ω24ω27ω30ω33ω0ω4ω8ω12ω16ω20ω24ω28ω32ω36ω40ω44ω0ω5ω10ω15ω20ω25ω30ω35ω40ω45ω50ω55ω0ω6ω12ω18ω24ω30ω36ω42ω48ω54ω60ω66ω0ω7ω14ω21ω28ω35ω42ω49ω56ω63ω70ω77ω0ω8ω16ω24ω32ω40ω48ω56ω64ω72ω80ω88ω0ω9ω18ω27ω36ω45ω54ω63ω72ω81ω90ω99ω0ω10ω20ω30ω40ω50ω60ω70ω80ω90ω100ω110ω0ω11ω22ω33ω44ω55ω66ω77ω88ω99ω110ω121][a0a1a2a3a4a5a6a7a8a9a10a11]=[P(x0)P(x1)P(x2)P(x3)P(x4)P(x5)P(x6)P(x7)P(x8)P(x9)P(x10)P(x11)]\begin{bmatrix}\omega^0 & \omega^0 & \omega^0 & \omega^0 & \omega^0 & \omega^0 & \omega^0 & \omega^0 & \omega^0 & \omega^0 & \omega^0 & \omega^0 \\\omega^0 & \omega^1 & \omega^2 & \omega^3 & \omega^4 & \omega^5 & \omega^6 & \omega^7 & \omega^8 & \omega^9 & \omega^{10} & \omega^{11} \\\omega^0 & \omega^2 & \omega^4 & \omega^6 & \omega^{8} & \omega^{10} & \omega^{12} & \omega^{14} & \omega^{16} & \omega^{18} & \omega^{20} & \omega^{22} \\\omega^{0} & \omega^{3} & \omega^{6} & \omega^{9} & \omega^{12} & \omega^{15} & \omega^{18} & \omega^{21} & \omega^{24} & \omega^{27} & \omega^{30} & \omega^{33} \\\omega^{0} & \omega^{4} & \omega^{8} & \omega^{12} & \omega^{16} & \omega^{20} & \omega^{24} & \omega^{28} & \omega^{32} & \omega^{36} & \omega^{40} & \omega^{44} \\\omega^{0} & \omega^{5} & \omega^{10} & \omega^{15} & \omega^{20} & \omega^{25} & \omega^{30} & \omega^{35} & \omega^{40} & \omega^{45} & \omega^{50} & \omega^{55} \\\omega^{0} & \omega^{6} & \omega^{12} & \omega^{18} & \omega^{24} & \omega^{30} & \omega^{36} & \omega^{42} & \omega^{48} & \omega^{54} & \omega^{60} & \omega^{66} \\\omega^{0} & \omega^{7} & \omega^{14} & \omega^{21} & \omega^{28} & \omega^{35} & \omega^{42} & \omega^{49} & \omega^{56} & \omega^{63} & \omega^{70} & \omega^{77} \\\omega^{0} & \omega^{8} & \omega^{16} & \omega^{24} & \omega^{32} & \omega^{40} & \omega^{48} & \omega^{56} & \omega^{64} & \omega^{72} & \omega^{80} & \omega^{88} \\\omega^{0} & \omega^{9} & \omega^{18} & \omega^{27} & \omega^{36} & \omega^{45} & \omega^{54} & \omega^{63} & \omega^{72} & \omega^{81} & \omega^{90} & \omega^{99} \\\omega^{0} & \omega^{10} & \omega^{20} & \omega^{30} & \omega^{40} & \omega^{50} & \omega^{60} & \omega^{70} & \omega^{80} & \omega^{90} & \omega^{100} & \omega^{110} \\\omega^{0} & \omega^{11} & \omega^{22} & \omega^{33} & \omega^{44} & \omega^{55} & \omega^{66} & \omega^{77} & \omega^{88} & \omega^{99} & \omega^{110} & \omega^{121} \\\end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \\ a_7 \\ a_8 \\ a_9 \\ a_{10} \\ a_{11} \\ \end{bmatrix} = \begin{bmatrix} P(x_0) \\ P(x_1) \\ P(x_2) \\ P(x_3) \\ P(x_4) \\ P(x_5) \\ P(x_6) \\ P(x_7) \\ P(x_8) \\ P(x_9) \\ P(x_{10}) \\ P(x_{11}) \\ \end{bmatrix}

simplifying since ωk=ωkmodN\omega^k = \omega^{k \mod N}

[1111111111111ω1ω2ω3ω4ω5ω6ω7ω8ω9ω10ω111ω2ω4ω6ω8ω101ω2ω4ω6ω8ω101ω3ω6ω91ω3ω6ω91ω3ω6ω91ω4ω81ω4ω81ω4ω81ω4ω81ω5ω10ω3ω8ω1ω6ω11ω4ω9ω2ω71ω61ω61ω61ω61ω61ω61ω7ω2ω9ω4ω11ω6ω1ω8ω3ω10ω51ω8ω41ω8ω41ω8ω41ω8ω41ω9ω6ω31ω9ω6ω31ω9ω6ω31ω10ω8ω6ω4ω21ω10ω8ω6ω4ω21ω11ω10ω9ω8ω7ω6ω5ω4ω3ω2ω1][a0a1a2a3a4a5a6a7a8a9a10a11]=[P(x0)P(x1)P(x2)P(x3)P(x4)P(x5)P(x6)P(x7)P(x8)P(x9)P(x10)P(x11)]\begin{bmatrix}1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\1 & \omega^1 & \omega^2 & \omega^3 & \omega^4 & \omega^5 & \omega^6 & \omega^7 & \omega^8 & \omega^9 & \omega^{10} & \omega^{11} \\1 & \omega^2 & \omega^4 & \omega^6 & \omega^8 & \omega^{10} & 1 & \omega^2 & \omega^4 & \omega^6 & \omega^8 & \omega^{10} \\1 & \omega^3 & \omega^6 & \omega^9 & 1 & \omega^3 & \omega^6 & \omega^9 & 1 & \omega^3 & \omega^6 & \omega^9 \\1 & \omega^4 & \omega^8 & 1 & \omega^4 & \omega^8 & 1 & \omega^4 & \omega^8 & 1 & \omega^4 & \omega^8 \\1 & \omega^5 & \omega^{10} & \omega^3 & \omega^8 & \omega^1 & \omega^6 & \omega^{11} & \omega^4 & \omega^9 & \omega^2 & \omega^7 \\1 & \omega^6 & 1 & \omega^6 & 1 & \omega^6 & 1 & \omega^6 & 1 & \omega^6 & 1 & \omega^6 \\1 & \omega^7 & \omega^2 & \omega^9 & \omega^4 & \omega^{11} & \omega^6 & \omega^1 & \omega^8 & \omega^3 & \omega^{10} & \omega^5 \\1 & \omega^8 & \omega^4 & 1 & \omega^8 & \omega^4 & 1 & \omega^8 & \omega^4 & 1 & \omega^8 & \omega^4 \\1 & \omega^9 & \omega^6 & \omega^3 & 1 & \omega^9 & \omega^6 & \omega^3 & 1 & \omega^9 & \omega^6 & \omega^3 \\1 & \omega^{10} & \omega^8 & \omega^6 & \omega^4 & \omega^2 & 1 & \omega^{10} & \omega^8 & \omega^6 & \omega^4 & \omega^2 \\1 & \omega^{11} & \omega^{10} & \omega^9 & \omega^8 & \omega^7 & \omega^6 & \omega^5 & \omega^4 & \omega^3 & \omega^2 & \omega^1 \\\end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \\ a_7 \\ a_8 \\ a_9 \\ a_{10} \\ a_{11} \\ \end{bmatrix} = \begin{bmatrix} P(x_0) \\ P(x_1) \\ P(x_2) \\ P(x_3) \\ P(x_4) \\ P(x_5) \\ P(x_6) \\ P(x_7) \\ P(x_8) \\ P(x_9) \\ P(x_{10}) \\ P(x_{11}) \\ \end{bmatrix}

You’ll notice that a lot of terms seems to be repeated seemingly at random, this periodicity is what the Fast Fourier Transform takes advantage of.

Cooley-Tukey

There are a different FFT algorithms depending on the structure of the cyclic group that can be applied, but we’ll focus our attention on the most popular one, the Cooley-Tukey FFT[1]^{[1]}. In their 1968 paper, they propose an algorithm for machines to compute the DFT much faster than ever before. (Although some people would point out that gauss did it first). Their paper exploits the structure of the cyclic group by changing the indexing of the DFT which gives rise to the faster algorithm. Suppose that the size of our DFT is composite such t N=r1r2N = r_1 \cdot r_2.

Then let

j=j1r1+j0,j0=0,1,r11,j1=0,1,r21k=k1r1+k0,k0=0,1,r21,k1=0,1,r11\begin{split} j = j_1r_1 + j_0, \quad j_0 = 0,1,\dots r_1-1, \quad j_1 = 0,1,\dots r_2-1 \\ k = k_1r_1 + k_0, \quad k_0 = 0,1,\dots r_2-1, \quad k_1 = 0,1,\dots r_1-1 \end{split}

Substituting the terms of jj & kk in (1)(1), we get

X(j1r1+j0)=k1r2+k0N1A(k1r2+k0)ω(j1r1+j0)(k1r2+k0)j1r1+j0=0,1,N1.\begin{split} X(j_1r_1 + j_0) = \sum^{N-1}_{k_1r_2+k_0}A(k_1r_2+k_0) \cdot \omega^{(j_1r_1+j_0)(k_1r_2+k_0)} \quad j_1r_1 + j_0 = 0,1,\dots N-1. \end{split}

Since r1r_1 and r2r_2 are constant terms, we can rewrite XX & AA in terms of only it’s variables.

X(j1,j0)=k0=0r21k1=0r11A(k1,k0)ωj1r1k1r2+j1r1k0+j0k1r2+j0k0=k0=0r21k1=0r11A(k1,k0)ωj1r1k1r2ωj1r1k0+j0k1r2+j0k0=k0=0r21k1=0r11A(k1,k0)(ωr1r2)j1k1ωj1r1k0+j0k0+j0k1r2=k0=0r21k1=0r11A(k1,k0)(ωN)j1k1ω(j1r1+j0)k0+j0k1r2=k0=0r21k1=0r11A(k1,k0)(1)j1k1ωjk0+j0k1r2=k0=0r21k1=0r11A(k1,k0)ωjk0+j0k1r2\begin{split} X(j_1,j_0) &= \sum^{r_2-1}_{k_0 = 0} \sum^{r_1-1}_{k_1 = 0} A(k_1, k_0) \cdot \omega^{j_1r_1k_1r_2 + j_1r_1k_0 + j_0k_1r_2+j_0k_0} \\ &= \sum^{r_2-1}_{k_0 = 0} \sum^{r_1-1}_{k_1 = 0} A(k_1, k_0) \cdot \omega^{j_1r_1k_1r_2}\cdot \omega^{j_1r_1k_0 + j_0k_1r_2+j_0k_0} \\ &= \sum^{r_2-1}_{k_0 = 0} \sum^{r_1-1}_{k_1 = 0} A(k_1, k_0) \cdot (\omega^{r_1r_2})^{j_1k_1}\cdot \omega^{j_1r_1k_0 +j_0k_0+ j_0k_1r_2} \\ &= \sum^{r_2-1}_{k_0 = 0} \sum^{r_1-1}_{k_1 = 0} A(k_1, k_0) \cdot (\omega^{N})^{j_1k_1}\cdot \omega^{(j_1r_1 +j_0)k_0+ j_0k_1r_2} \\ &= \sum^{r_2-1}_{k_0 = 0} \sum^{r_1-1}_{k_1 = 0} A(k_1, k_0) \cdot (1)^{j_1k_1}\cdot \omega^{jk_0+ j_0k_1r_2} \\ &= \sum^{r_2-1}_{k_0 = 0} \sum^{r_1-1}_{k_1 = 0} A(k_1, k_0) \cdot \omega^{jk_0+ j_0k_1r_2} \end{split}

Finally we obtain the equation

X(j1,j0)=k0=0r21k1=0r11A(k1,k0)ωjk0ωj0k1r2\begin{equation} X(j_1, j_0) = \sum^{r_2-1}_{k_0 = 0} \sum^{r_1-1}_{k_1 = 0} A(k_1, k_0) \cdot \omega^{jk_0} \cdot \omega^{j_0k_1r_2} \tag{2} \end{equation}

We can observe that the inner sum over k1k_1 only depends on j0j_0 and k0k_0, we can therefore write:

A1(j0,k0)=k1=0r11A(k1,k0)(ωr2)j0k1\begin{equation} A_1(j_0, k_0) = \sum^{r_1-1}_{k_1 = 0} A(k_1, k_0) \cdot (\omega^{r_2})^{j_0k_1}\tag{3} \end{equation}

So that the DFT in (1)(1) can then be written as:

X(j1,j0)=k0=0r21A1(j0,k0)ω(j1r1+j0)k0=k0=0r21A1(j0,k0)ωj0k0(ωr1)j1k0(4)\begin{split} X(j_1, j_0) &= \sum^{r_2-1}_{k_0 = 0} A_1(j_0, k_0) \cdot \omega^{(j_1r_1+j_0)k_0} \\ &= \sum^{r_2-1}_{k_0 = 0} A_1(j_0, k_0) \cdot \omega^{j_0k_0} \cdot (\omega^{r_1})^{j_1k_0} \tag{4} \\ \end{split}

Let’s work out an example DFT of size N=12N = 12, r1=4r_1 = 4 & r2=3r_2 = 3 using the Cooley-Tukey FFT:

A1(0,0)=A(0,0)ω003+A(1,0)ω013+A(2,0)ω023+A(3,0)ω033A1(0,1)=A(0,1)ω003+A(1,1)ω013+A(2,1)ω023+A(3,1)ω033A1(0,2)=A(0,2)ω003+A(1,2)ω013+A(2,2)ω023+A(3,2)ω033A1(1,0)=A(0,0)ω103+A(1,0)ω113+A(2,0)ω123+A(3,0)ω133A1(1,1)=A(0,1)ω103+A(1,1)ω113+A(2,1)ω123+A(3,1)ω133A1(1,2)=A(0,2)ω103+A(1,2)ω113+A(2,2)ω123+A(3,2)ω133A1(2,0)=A(0,0)ω203+A(1,0)ω213+A(2,0)ω223+A(3,0)ω233A1(2,1)=A(0,1)ω203+A(1,1)ω213+A(2,1)ω223+A(3,1)ω233A1(2,2)=A(0,2)ω203+A(1,2)ω213+A(2,2)ω223+A(3,2)ω233A1(3,0)=A(0,0)ω303+A(1,0)ω313+A(2,0)ω323+A(3,0)ω333A1(3,1)=A(0,1)ω303+A(1,1)ω313+A(2,1)ω323+A(3,1)ω333A1(3,2)=A(0,2)ω303+A(1,2)ω313+A(2,2)ω323+A(3,2)ω333\begin{aligned}A_1(0, 0) &= A(0, 0)\omega^{0\cdot0\cdot3} + A(1, 0)\omega^{0\cdot1\cdot3} + A(2, 0)\omega^{0\cdot2\cdot3} + A(3, 0)\omega^{0\cdot3\cdot3} \\A_1(0, 1) &= A(0, 1)\omega^{0\cdot0\cdot3} + A(1, 1)\omega^{0\cdot1\cdot3} + A(2, 1)\omega^{0\cdot2\cdot3} + A(3, 1)\omega^{0\cdot3\cdot3} \\A_1(0, 2) &= A(0, 2)\omega^{0\cdot0\cdot3} + A(1, 2)\omega^{0\cdot1\cdot3} + A(2, 2)\omega^{0\cdot2\cdot3} + A(3, 2)\omega^{0\cdot3\cdot3} \\ A_1(1, 0) &= A(0, 0)\omega^{1\cdot0\cdot3} + A(1, 0)\omega^{1\cdot1\cdot3} + A(2, 0)\omega^{1\cdot2\cdot3} + A(3, 0)\omega^{1\cdot3\cdot3} \\A_1(1, 1) &= A(0, 1)\omega^{1\cdot0\cdot3} + A(1, 1)\omega^{1\cdot1\cdot3} + A(2, 1)\omega^{1\cdot2\cdot3} + A(3, 1)\omega^{1\cdot3\cdot3} \\A_1(1, 2) &= A(0, 2)\omega^{1\cdot0\cdot3} + A(1, 2)\omega^{1\cdot1\cdot3} + A(2, 2)\omega^{1\cdot2\cdot3} + A(3, 2)\omega^{1\cdot3\cdot3} \\ A_1(2, 0) &= A(0, 0)\omega^{2\cdot0\cdot3} + A(1, 0)\omega^{2\cdot1\cdot3} + A(2, 0)\omega^{2\cdot2\cdot3} + A(3, 0)\omega^{2\cdot3\cdot3} \\A_1(2, 1) &= A(0, 1)\omega^{2\cdot0\cdot3} + A(1, 1)\omega^{2\cdot1\cdot3} + A(2, 1)\omega^{2\cdot2\cdot3} + A(3, 1)\omega^{2\cdot3\cdot3} \\A_1(2, 2) &= A(0, 2)\omega^{2\cdot0\cdot3} + A(1, 2)\omega^{2\cdot1\cdot3} + A(2, 2)\omega^{2\cdot2\cdot3} + A(3, 2)\omega^{2\cdot3\cdot3} \\A_1(3, 0) &= A(0, 0)\omega^{3\cdot0\cdot3} + A(1, 0)\omega^{3\cdot1\cdot3} + A(2, 0)\omega^{3\cdot2\cdot3} + A(3, 0)\omega^{3\cdot3\cdot3} \\ A_1(3, 1) &= A(0, 1)\omega^{3\cdot0\cdot3} + A(1, 1)\omega^{3\cdot1\cdot3} + A(2, 1)\omega^{3\cdot2\cdot3} + A(3, 1)\omega^{3\cdot3\cdot3} \\A_1(3, 2) &= A(0, 2)\omega^{3\cdot0\cdot3} + A(1, 2)\omega^{3\cdot1\cdot3} + A(2, 2)\omega^{3\cdot2\cdot3} + A(3, 2)\omega^{3\cdot3\cdot3} \\\end{aligned}

Simplifying

A1(0)=A(0)ω0+A(3)ω0+A(6)ω0+A(9)ω0A1(1)=A(1)ω0+A(4)ω0+A(7)ω0+A(10)ω0A1(2)=A(2)ω0+A(5)ω0+A(8)ω0+A(11)ω0A1(3)=A(0)ω0+A(3)ω3+A(6)ω6+A(9)ω9A1(4)=A(1)ω0+A(4)ω3+A(7)ω6+A(10)ω9A1(5)=A(2)ω0+A(5)ω3+A(8)ω6+A(11)ω9A1(6)=A(0)ω0+A(3)ω6+A(6)ω12+A(9)ω18A1(7)=A(1)ω0+A(4)ω6+A(7)ω12+A(10)ω18A1(8)=A(2)ω0+A(5)ω6+A(8)ω12+A(11)ω18A1(9)=A(0)ω0+A(3)ω9+A(6)ω18+A(9)ω27A1(10)=A(1)ω0+A(4)ω9+A(7)ω18+A(10)ω27A1(11)=A(2)ω0+A(5)ω9+A(8)ω18+A(11)ω27\begin{aligned}A_1(0) &= A(0)\omega^{0} + A(3)\omega^{0} + A(6)\omega^{0} + A(9)\omega^{0} \\A_1(1) &= A(1)\omega^{0} + A(4)\omega^{0} + A(7)\omega^{0} + A(10)\omega^{0} \\A_1(2) &= A(2)\omega^{0} + A(5)\omega^{0} + A(8)\omega^{0} + A(11)\omega^{0} \\ A_1(3) &= A(0)\omega^{0} + A(3)\omega^{3} + A(6)\omega^{6} + A(9)\omega^{9} \\A_1(4) &= A(1)\omega^{0} + A(4)\omega^{3} + A(7)\omega^{6} + A(10)\omega^{9} \\A_1(5) &= A(2)\omega^{0} + A(5)\omega^{3} + A(8)\omega^{6} + A(11)\omega^{9} \\ A_1(6) &= A(0)\omega^{0} + A(3)\omega^{6} + A(6)\omega^{12} + A(9)\omega^{18} \\A_1(7) &= A(1)\omega^{0} + A(4)\omega^{6} + A(7)\omega^{12} + A(10)\omega^{18} \\A_1(8) &= A(2)\omega^{0} + A(5)\omega^{6} + A(8)\omega^{12} + A(11)\omega^{18} \\A_1(9) &= A(0)\omega^{0} + A(3)\omega^{9} + A(6)\omega^{18} + A(9)\omega^{27} \\ A_1(10) &= A(1)\omega^{0} + A(4)\omega^{9} + A(7)\omega^{18} + A(10)\omega^{27} \\A_1(11) &= A(2)\omega^{0} + A(5)\omega^{9} + A(8)\omega^{18} + A(11)\omega^{27} \\\end{aligned}

The keen eye will recognise that since A(k1,k0)A(k_1, k_0) & A1(j0,k0)A_1(j_0, k_0) now have 2 indices, they are basically matrices and the above equations can be seen through the lens of matrix multiplication:

[ω0ω0ω0ω0ω0ω3ω6ω9ω0ω6ω12ω18ω0ω9ω18ω27](ωr2)j0k1[A(0)A(1)A(2)A(3)A(4)A(5)A(6)A(7)A(8)A(9)A(10)A(11)]A(k1,k0)=[A1(0,0)A1(0,1)A1(0,2)A1(1,0)A1(1,1)A1(1,2)A1(2,0)A1(2,1)A1(2,3)A1(3,0)A1(3,1)A1(3,2)]A1(j0,k0)\underbrace{\begin{bmatrix}\omega^0 & \omega^0 & \omega^0 & \omega^0 \\\omega^0 & \omega^3 & \omega^6 & \omega^9 \\\omega^0 & \omega^6 & \omega^{12} & \omega^{18} \\\omega^0 & \omega^9 & \omega^{18} & \omega^{27} \\\end{bmatrix}}_{(\omega^{r_2})^{j_0k_1}} \underbrace{\begin{bmatrix} A(0) & A(1) & A(2)\\ A(3) & A(4) & A(5)\\ A(6) & A(7) & A(8)\\ A(9) & A(10) & A( 11)\\\end{bmatrix}}_{A(k_1, k_0)} = \underbrace{\begin{bmatrix}A_1(0, 0) & A_1(0, 1) & A_1(0, 2) \\A_1(1, 0) & A_1(1, 1) & A_1(1, 2) \\A_1(2, 0) & A_1(2, 1) & A_1(2, 3) \\A_1(3, 0) & A_1(3, 1) & A_1(3, 2) \\\end{bmatrix}}_{A_1(j_0, k_0)}

Next we write out the terms for X(j1,j0)X(j_1, j_0)

X(0,0)=A1(0,0)ω400w00+A1(0,1)ω401w01+A1(0,2)ω402w02X(0,1)=A1(1,0)ω400w10+A1(1,1)ω401w11+A1(1,2)ω402w12X(0,2)=A1(2,0)ω400w20+A1(2,1)ω401w21+A1(2,2)ω402w22X(0,3)=A1(3,0)ω400w30+A1(3,1)ω401w31+A1(3,2)ω402w32X(1,0)=A1(0,0)ω410w00+A1(0,1)ω411w01+A1(0,2)ω412w02X(1,1)=A1(1,0)ω410w10+A1(1,1)ω411w11+A1(1,2)ω412w12X(1,2)=A1(2,0)ω410w20+A1(2,1)ω411w21+A1(2,2)ω412w22X(1,3)=A1(3,0)ω410w30+A1(3,1)ω411w31+A1(3,2)ω412w32X(2,0)=A1(0,0)ω420w00+A1(0,1)ω421w01+A1(0,2)ω422w02X(2,1)=A1(1,0)ω420w10+A1(1,1)ω421w11+A1(1,2)ω422w12X(2,2)=A1(2,0)ω420w20+A1(2,1)ω421w21+A1(2,2)ω422w22X(2,3)=A1(3,0)ω420w30+A1(3,1)ω421w31+A1(3,2)ω422w32\begin{aligned}X(0, 0) &= A_1(0, 0) \omega^{4\cdot0\cdot0} \cdot w^{0\cdot0} + A_1(0, 1) \omega^{4\cdot0\cdot1} \cdot w^{0\cdot1} + A_1(0, 2) \omega^{4\cdot0\cdot2} \cdot w^{0\cdot2} \\X(0, 1) &= A_1(1, 0) \omega^{4\cdot0\cdot0} \cdot w^{1\cdot0} + A_1(1, 1) \omega^{4\cdot0\cdot1} \cdot w^{1\cdot1} + A_1(1, 2) \omega^{4\cdot0\cdot2} \cdot w^{1\cdot2} \\X(0, 2) &= A_1(2, 0) \omega^{4\cdot0\cdot0} \cdot w^{2\cdot0} + A_1(2, 1) \omega^{4\cdot0\cdot1} \cdot w^{2\cdot1} + A_1(2, 2) \omega^{4\cdot0\cdot2} \cdot w^{2\cdot2} \\X(0, 3) &= A_1(3, 0) \omega^{4\cdot0\cdot0} \cdot w^{3\cdot0} + A_1(3, 1) \omega^{4\cdot0\cdot1} \cdot w^{3\cdot1} + A_1(3, 2) \omega^{4\cdot0\cdot2} \cdot w^{3\cdot2} \\X(1, 0) &= A_1(0, 0) \omega^{4\cdot1\cdot0} \cdot w^{0\cdot0} + A_1(0, 1) \omega^{4\cdot1\cdot1} \cdot w^{0\cdot1} + A_1(0, 2) \omega^{4\cdot1\cdot2} \cdot w^{0\cdot2} \\X(1, 1) &= A_1(1, 0) \omega^{4\cdot1\cdot0} \cdot w^{1\cdot0} + A_1(1, 1) \omega^{4\cdot1\cdot1} \cdot w^{1\cdot1} + A_1(1, 2) \omega^{4\cdot1\cdot2} \cdot w^{1\cdot2} \\X(1, 2) &= A_1(2, 0) \omega^{4\cdot1\cdot0} \cdot w^{2\cdot0} + A_1(2, 1) \omega^{4\cdot1\cdot1} \cdot w^{2\cdot1} + A_1(2, 2) \omega^{4\cdot1\cdot2} \cdot w^{2\cdot2} \\X(1, 3) &= A_1(3, 0) \omega^{4\cdot1\cdot0} \cdot w^{3\cdot0} + A_1(3, 1) \omega^{4\cdot1\cdot1} \cdot w^{3\cdot1} + A_1(3, 2) \omega^{4\cdot1\cdot2} \cdot w^{3\cdot2} \\X(2, 0) &= A_1(0, 0) \omega^{4\cdot2\cdot0} \cdot w^{0\cdot0} + A_1(0, 1) \omega^{4\cdot2\cdot1} \cdot w^{0\cdot1} + A_1(0, 2) \omega^{4\cdot2\cdot2} \cdot w^{0\cdot2} \\X(2, 1) &= A_1(1, 0) \omega^{4\cdot2\cdot0} \cdot w^{1\cdot0} + A_1(1, 1) \omega^{4\cdot2\cdot1} \cdot w^{1\cdot1} + A_1(1, 2) \omega^{4\cdot2\cdot2} \cdot w^{1\cdot2} \\X(2, 2) &= A_1(2, 0) \omega^{4\cdot2\cdot0} \cdot w^{2\cdot0} + A_1(2, 1) \omega^{4\cdot2\cdot1} \cdot w^{2\cdot1} + A_1(2, 2) \omega^{4\cdot2\cdot2} \cdot w^{2\cdot2} \\X(2, 3) &= A_1(3, 0) \omega^{4\cdot2\cdot0} \cdot w^{3\cdot0} + A_1(3, 1) \omega^{4\cdot2\cdot1} \cdot w^{3\cdot1} + A_1(3, 2) \omega^{4\cdot2\cdot2} \cdot w^{3\cdot2} \\\end{aligned}

Simplifying, we get

X(0)=A1(0)ω0w0+A1(1)ω0w0+A1(2)ω0w0X(1)=A1(3)ω0w0+A1(4)ω0w1+A1(5)ω0w2X(2)=A1(6)ω0w0+A1(7)ω0w2+A1(8)ω0w4X(3)=A1(9)ω0w0+A1(10)ω0w3+A1(11)ω0w6X(4)=A1(0)ω0w0+A1(1)ω4w0+A1(2)ω8w0X(5)=A1(3)ω0w0+A1(4)ω4w1+A1(5)ω8w2X(6)=A1(6)ω0w0+A1(7)ω4w2+A1(8)ω8w4X(7)=A1(9)ω0w0+A1(10)ω4w3+A1(11)ω8w6X(8)=A1(0)ω0w0+A1(1)ω8w0+A1(2)ω16w0X(9)=A1(3)ω0w0+A1(4)ω8w1+A1(5)ω16w2X(10)=A1(6)ω0w0+A1(7)ω8w2+A1(8)ω16w4X(11)=A1(9)ω0w0+A1(10)ω8w3+A1(11)ω16w6\begin{aligned}X(0) &= A_1(0) \cdot \omega^{0} \cdot w^{0} + A_1(1) \cdot \omega^{0} \cdot w^{0} + A_1(2) \cdot \omega^{0} \cdot w^{0} \\X(1) &= A_1(3) \cdot \omega^{0} \cdot w^{0} + A_1(4) \cdot \omega^{0} \cdot w^{1} + A_1(5) \cdot \omega^{0} \cdot w^{2} \\X(2) &= A_1(6) \cdot \omega^{0} \cdot w^{0} + A_1(7) \cdot \omega^{0} \cdot w^{2} + A_1(8) \cdot \omega^{0} \cdot w^{4} \\X(3) &= A_1(9) \cdot \omega^{0} \cdot w^{0} + A_1(10) \cdot \omega^{0} \cdot w^{3} + A_1(11) \cdot \omega^{0} \cdot w^{6} \\X(4) &= A_1(0) \cdot \omega^{0} \cdot w^{0} + A_1(1) \cdot \omega^{4} \cdot w^{0} + A_1(2) \cdot \omega^{8} \cdot w^{0} \\X(5) &= A_1(3) \cdot \omega^{0} \cdot w^{0} + A_1(4) \cdot \omega^{4} \cdot w^{1} + A_1(5) \cdot \omega^{8} \cdot w^{2} \\X(6) &= A_1(6) \cdot \omega^{0} \cdot w^{0} + A_1(7) \cdot \omega^{4} \cdot w^{2} + A_1(8) \cdot \omega^{8} \cdot w^{4} \\X(7) &= A_1(9) \cdot \omega^{0} \cdot w^{0} + A_1(10) \cdot \omega^{4} \cdot w^{3} + A_1(11) \cdot \omega^{8} \cdot w^{6} \\X(8) &= A_1(0) \cdot \omega^{0} \cdot w^{0} + A_1(1) \cdot \omega^{8} \cdot w^{0} + A_1(2) \cdot \omega^{16} \cdot w^{0} \\X(9) &= A_1(3) \cdot \omega^{0} \cdot w^{0} + A_1(4) \cdot \omega^{8} \cdot w^{1} + A_1(5) \cdot \omega^{16} \cdot w^{2} \\X(10) &= A_1(6) \cdot \omega^{0} \cdot w^{0} + A_1(7) \cdot \omega^{8} \cdot w^{2} + A_1(8) \cdot \omega^{16} \cdot w^{4} \\X(11) &= A_1(9) \cdot \omega^{0} \cdot w^{0} + A_1(10) \cdot \omega^{8} \cdot w^{3} + A_1(11) \cdot \omega^{16} \cdot w^{6} \\\end{aligned}

Visualising the above eqs in matrix form is a bit tricky, because we seem to be multiplying 3 matrices here. But upon closer inspection, two of them is an element-wise product or so-called Hadamard product of the two matrices. Given by:

AB=AijBijA \cdot B = A_{ij} \cdot B_{ij}

Revisiting eq (4)(4)

X(j1,j0)=k0=0r21A1(j0,k0)ωj0k0hadamard product(ωr1)j1k0=k0=0r21(A1(j0,k0)ωj0k0)(ωr1)j1k0\begin{split} X(j_1, j_0) &= \sum^{r_2-1}_{k_0 = 0} \underbrace{A_1(j_0, k_0) \cdot \omega^{j_0k_0}}_{\text{hadamard product}} \cdot (\omega^{r_1})^{j_1k_0} \\ &= \sum^{r_2-1}_{k_0 = 0} (A_1(j_0, k_0)\omega^{j_0k_0}) \cdot (\omega^{r_1})^{j_1k_0} \end{split}

The matrix ωj0k0\omega^{j_0k_0} is known as the twiddle factors.

[A1(0)A1(1)A1(2)A1(3)A1(4)A1(5)A1(6)A1(7)A1(8)A1(9)A1(10)A1(11)]A1(j0,k0)[ω0ω0ω0ω0ω1ω2ω0ω2ω4ω0ω3ω6]ωj0k0=[A1(0,0)ω0A1(0,1)ω0A1(0,2)ω0A1(1,0)ω0A1(1,1)ω1A1(1,2)ω2A1(2,0)ω0A1(2,1)ω2A1(2,2)ω4A1(3,0)ω0A1(3,1)ω3A1(3,2)ω6]A1(j0,k0)ωj0k0\underbrace{\begin{bmatrix}A_1(0) & A_1(1) & A_1(2) \\A_1(3) & A_1(4) & A_1(5) \\A_1(6) & A_1(7) & A_1(8) \\A_1(9) & A_1(10) & A_1(11) \\\end{bmatrix}}_{A_1(j_0, k_0)} \underbrace{\begin{bmatrix} \omega^0 &\omega^0 & \omega^0 \\ \omega^0 & \omega^1 & \omega^2 \\ \omega^0 & \omega^2 & \omega^4 \\ \omega^0 & \omega^3 & \omega^6 \\ \end{bmatrix}}_{\omega^{j_0k_0}} = \underbrace{\begin{bmatrix}A_1(0, 0) \cdot \omega^0 & A_1(0, 1) \cdot \omega^0 & A_1(0, 2) \cdot \omega^0 \\A_1(1, 0) \cdot \omega^0 & A_1(1, 1) \cdot \omega^1 & A_1(1, 2) \cdot \omega^2 \\A_1(2, 0) \cdot \omega^0 & A_1(2, 1) \cdot \omega^2 & A_1(2, 2) \cdot \omega^4 \\A_1(3, 0) \cdot \omega^0 & A_1(3, 1) \cdot \omega^3 & A_1(3, 2) \cdot \omega^6 \\\end{bmatrix}}_{A_1(j_0, k_0) \cdot \omega^{j_0k_0}}

Now it’s finally time to multiply by our (ωr1)j1k0(\omega^{r_1})^{j_1k_0} matrix, but the equation for X(j1,j0)X(j_1, j_0) is a bit weird, we seem to multiplying in row-order instead of the conventional column order. This can be seen as multiplying by a transposed matrix, given as:

Aji=AijA^\prime_{ji} = A_{ij}

Therefore we take the transpose of the result of this element-wise multiplication so that the multiplication is in the correct column-order.

[A1(0,0)ω0A1(1,0)ω0A1(2,0)ω0A1(3,0)ω0A1(0,1)ω0A1(1,1)ω1A1(2,1)ω2A1(3,1)ω3A1(0,2)ω0A1(1,2)ω2A1(2,2)ω3A1(3,2)ω6](A1(j0,k0)ωj0k0)\underbrace{\begin{bmatrix}A_1(0, 0) \cdot \omega^0 & A_1(1, 0) \cdot \omega^0 & A_1(2, 0) \cdot \omega^0 & A_1(3, 0) \cdot \omega^0 \\A_1(0, 1) \cdot \omega^0 & A_1(1, 1) \cdot \omega^1 & A_1(2, 1) \cdot \omega^2 & A_1(3, 1) \cdot \omega^3 \\A_1(0, 2) \cdot \omega^0 & A_1(1, 2) \cdot \omega^2 & A_1(2, 2) \cdot \omega^3 & A_1(3, 2) \cdot \omega^6 \\\end{bmatrix}}_{(A_1(j_0, k_0) \cdot \omega^{j_0k_0})^\prime}

Finally we multiply the transpose by the (ωr1)j1k0(\omega^{r_1})^{j_1k_0} matrix.

[ω0ω0ω0ω0ω4ω8ω0ω8ω16](ωr1)j1k0[A1(0,0)ω0A1(1,0)ω0A1(2,0)ω0A1(3,0)ω0A1(0,1)ω0A1(1,1)ω1A1(2,1)ω2A1(3,1)ω3A1(0,2)ω0A1(1,2)ω2A1(2,2)ω3A1(3,2)ω6](A1(j0,k0)ωj0k0)=[X(0,0)X(0,1)X(0,2)X(0,3)X(1,0)X(1,1)X(1,2)X(1,3)X(2,0)X(2,1)X(2,2)X(2,3)]X(j1,j0)\underbrace{\begin{bmatrix}\omega^0 & \omega^0 & \omega^0 \\\omega^0 & \omega^4 & \omega^8 \\\omega^0 & \omega^8 & \omega^{16} \\\end{bmatrix}}_{(\omega^{r_1})^{j_1k_0}} \underbrace{\begin{bmatrix}A_1(0, 0) \cdot \omega^0 & A_1(1, 0) \cdot \omega^0 & A_1(2, 0) \cdot \omega^0 & A_1(3, 0) \cdot \omega^0 \\A_1(0, 1) \cdot \omega^0 & A_1(1, 1) \cdot \omega^1 & A_1(2, 1) \cdot \omega^2 & A_1(3, 1) \cdot \omega^3 \\A_1(0, 2) \cdot \omega^0 & A_1(1, 2) \cdot \omega^2 & A_1(2, 2) \cdot \omega^3 & A_1(3, 2) \cdot \omega^6 \\\end{bmatrix}}_{(A_1(j_0, k_0) \cdot \omega^{j_0k_0})^\prime} = \underbrace{\begin{bmatrix}X(0, 0) & X(0, 1) & X(0, 2) & X(0, 3) \\X(1, 0) & X(1, 1) & X(1, 2) & X(1, 3) \\X(2, 0) & X(2, 1) & X(2, 2) & X(2, 3) \\\end{bmatrix}}_{X(j_1, j_0)}

We can observe that the A1(j0,k0)A_1(j_0, k_0) DFT requires r1r_1 operations, while X(j1,j0)X(j_1, j_0) requires r2r_2 operations. Both have a total of NN elements, hence the complexity for this algorithm is:

T=N(r1+r2)T = N(r_1 + r_2)

Recursion

Lets observe one of the DFTs from eq (3)(3)

[ω0ω0ω0ω0ω0ω3ω6ω9ω0ω6ω12ω18ω0ω9ω18ω27][A(0)A(3)A(6)A(9)]=[A1(0,0)A1(1,0)A1(2,0)A1(3,0)]\begin{bmatrix}\omega^0 & \omega^0 & \omega^0 & \omega^0 \\\omega^0 & \omega^3 & \omega^6 & \omega^9 \\\omega^0 & \omega^6 & \omega^{12} & \omega^{18} \\\omega^0 & \omega^9 & \omega^{18} & \omega^{27} \\\end{bmatrix} \begin{bmatrix}A(0) \\A(3) \\A(6) \\A(9) \\\end{bmatrix} = \begin{bmatrix} A_1(0, 0) \\ A_1(1, 0) \\ A_1(2, 0) \\ A_1(3, 0) \\\end{bmatrix}

It’s clear that this is the exact same problem we started with, recall the matrix form of eq (1)(1). Hence the Cooley-Tukey FFT can be used to decompose this DFT further. Unfortunately in this example we end up with the same complexity since 22=2+22\cdot2 = 2+2. But N=4N = 4 is the only case where the sum of factors equals it’s product. In other cases recursive application of the Cooley-Tukey FFT provides significant efficiency gains when NN is highly composite:

N=(r0r1rn1)\begin{split} N &= (r_0 \cdot r_1 \cdots r_{n-1}) \end{split}

Then the complexity of the algorithm becomes:

T=N(i=0n1ri)T = N(\sum_{i = 0}^{n-1} r_i)

The Cooley-Tukey FFT can also be combined with other FFT algorithms like the Good-Thomas FFT for factors that are coprime. This method completely eliminates the need for twiddle factors. Rader’s FFT can also be applied to DFT’s of prime lengths since they have no factors and can’t be decomposed further by the Cooley-Tukey algorithm.

Radix2FFT

What you popularly know as the Radix2FFT is actually another form of the Cooley-Tukey FFT. Since the Cooley-Tukey FFT can be applied recursively. Given the case when N=2kN = 2^k, you can continue to apply the the FFT until you reach a base case of two size-2 DFTs which can you evaluate and start to combine. This transformation yields the fastest time to compute the DFT:

T=2k(20+21++2k)=2Nlog2N\begin{split} T &= 2^k(2_0 + 2_1 + \dots + 2_k)\\ &= 2N\log_2 N \end{split}

Inverse FFT

The inverse FFT is the same as the FFT except it’s performed with inverse powers of the generator and finally multiplied by the inverse of the group order. This is given as:

A(k)=1Nj=0N1X(j)ωjkk=0,1,N1.A(k) = \frac{1}{N}\sum^{N-1}_{j=0}X(j) \cdot\omega^{-jk} \quad k = 0,1,\dots N-1.

(Proof is left as an exercise for the reader)

References

[1]^{[1]} An Algorithm for the Machine Calculation of Complex Fourier Series. James W. Cooley and John W. Tukey