FFT in C

Frank Vanbever -- Sunday, March 31, 2024

I've been on a deep dive into DSP (Digital Signal Processing). Mainly because I want to know how audio effects work. I started with a book about programming VST (Virtual Studio Technology) plugins. Though I quickly found out that the DSP course I took at university was a bit too long ago, so I had to go back to basics.

I worked through "Mathematics of the Discrete Fourier Transform (DFT) With Audio Applications" by Julius O. Smith III. It was a super interesting experience, though also a lot of work. In the end, I feel that I have a good grasp of the Fourier transform and a lot of the mathematics underpinning it.

After having finished I figured that I should probably try to translate this knowledge into something practical, so I decided that I wanted to implement the DFT and Cooley-Tukey FFT in C.

Up until this point, I've been exclusively working in Python, using Numpy, Scipy, and Matplotlib in a Jupyter Lab. The combination of these projects makes scientific computing a breeze.

Moving to C made me realize I never really have done something like this in my 15 years of experience. I have no clue how complex numbers work in C for example. So these are my somewhat cleaned-up notes of my process of figuring out how to implement the DFT/FFT in C.

Complex numbers

Complex numbers in C were introduced in the C99 standard. Everything you need is available in complex.h. In theory, it's possible with some compiler magic to get these things without the header (e.g. _Complex is seemingly available without including the header) but it doesn't make a lot of sense to do that.

0#include <complex.h>

A complex number is made up of a real and imaginary part. A declaration of a complex number looks like this:

0  double complex foo = 1.0 + 2.0*I;

This is a double-precision floating point type. I is the imaginary constant. If you want to get the separate real and imaginary parts the functions creal and cimag provide these.

The following function does a pretty print of a complex number:

0  #include <stdio.h>
1  void print_complex(complex z)
2  {
3      printf("%f+%f*i\n", creal(z), cimag(z));
4  }

Arithmetic works just like it would with regular types:

0  void complex_arithmetic(void)
1  {
2      double complex a = 1.0 + 2.0*I;
3      double complex b = 2.0 + 3.0*I;
4
5      print_complex(a+b);
6      print_complex(a-b);
7      print_complex(a*b);
8      print_complex(a/b);
9  }

Gives the following output:

03.000000+5.000000*i
1-1.000000+-1.000000*i
2-4.000000+7.000000*i
30.615385+0.076923*i

Which looks OK. This is really all we need to implement a DFT.

Discrete Fourier Transform

The discrete Fourier transform is defined by the following formula

$$ X_{k} = \sum_{n = 0}^{N - 1} x_{n} \cdot e^{-\frac{i2\pi}{N}kn} $$

We've got arithmetic down, but there's a couple more things that we need. First of all, there's the exponential function. Regularly you'd use exp( double arg) as defined in math.h. However, given that we're dealing with complex numbers here we need the complex variant cexp(double complex z) instead. The other thing is \(\pi\), which is defined as M_PI as part of the mathematical constants that GCC adds to math.h.

This gives us the following C implementation:

 0void dft(complex double *x, complex double *X, size_t len)
 1{
 2	int N = len;
 3	if (N == 0)
 4		return;
 5	for (int k = 0; k < len; k++) {
 6		X[k] = 0.0f;
 7		for (int n = 0; n < len; n++) {
 8			long double exponent = (-2.0L * M_PI * k * n) / N;
 9			X[k] += x[n] * cexp(I * exponent);
10		}
11	}
12}

When we use the following loop to fill in some dummy sample data

0static void fill_complex_testdata(double complex *x, size_t num_samples)
1{
2	for (int i = 0; i < num_samples; i++) {
3		x[i] = 1.0 * i + 1.0 * i * I;
4	}
5}

We get the following output:

Frequency samples
28.000000+28.000000*i
-13.656854+5.656854*i
-8.000000+-0.000000*i
-5.656854+-2.343146*i
-4.000000+-4.000000*i
-2.343146+-5.656854*i
-0.000000+-8.000000*i
5.656854+-13.656854*i

Let's compare if this output is correct by checking it against the numpy FFT implementation.

 0from numpy.fft import fft
 1
 2def complex_testdata(num_samples):
 3    testdata = []
 4    for i in range(num_samples):
 5        testdata.append(i * 1.0+ i * 1.0j)
 6
 7    return testdata
 8
 9
10
11def complex_fft_results(num_samples):
12    x = complex_testdata(num_samples)
13    X = fft(x)
14
15    for i in X:
16        print(i)
17
18
19
20if __name__ == '__main__':
21    complex_fft_results(8)

Let's compare the output of this with the output of our C function:

CPython
28.000000+28.000000*i(28+28j)
-13.656854+5.656854*i(-13.65685424949238+5.656854249492381j)
-8.000000+-0.000000*i(-8+0j)
-5.656854+-2.343146*i(-5.656854249492381-2.3431457505076194j)
-4.000000+-4.000000*i(-4-4j)
-2.343146+-5.656854*i(-2.3431457505076194-5.656854249492381j)
-0.000000+-8.000000*i(0-8j)
5.656854+-13.656854*i(5.656854249492381-13.65685424949238j)

Those numbers look almost the same, the small differences can be attributed to floating point rounding errors.

Cooley-Tukey FFT

The Cooley-Tukey FFT is an algorithm to compute the Fourier Transform faster than the regular DFT. It's recursive in nature. The Wikipedia article about it is pretty good so I'd direct you over there for the explanation.

 0int fft(complex double *x, complex double **X, size_t len)
 1{
 2	int ret = -1;
 3	complex double *x_even = NULL;
 4	complex double *x_odd = NULL;
 5
 6	complex double *X_even = NULL;
 7	complex double *X_odd = NULL;
 8
 9	complex double *factor = NULL;
10
11	*X = malloc(len * sizeof(complex double));
12	if (*X == NULL) {
13		goto cleanup;
14	}
15
16	if (len == 1) {
17		(*X)[0] = x[0];
18		return 0;
19	} else {
20		if (get_even(x, len, &x_even)) {
21			goto cleanup;
22		}
23		if (get_odd(x, len, &x_odd)) {
24			goto cleanup;
25		}
26
27		if (fft(x_even, &X_even, len / 2)) {
28			goto cleanup;
29		}
30
31		if (fft(x_odd, &X_odd, len / 2)) {
32			goto cleanup;
33		}
34
35		if (get_factor(len, &factor)) {
36			goto cleanup;
37		}
38
39		for (int i = 0; i < len; i++) {
40			int index = i % (len / 2);
41			(*X)[i] = X_even[index] + X_odd[index] * factor[i];
42		}
43	}
44
45	ret = 0;
46
47cleanup:
48	if (x_even)
49		free(x_even);
50	if (x_odd)
51		free(x_odd);
52	if (X_even)
53		free(X_even);
54	if (X_odd)
55		free(X_odd);
56	if (factor)
57		free(factor);
58
59	return ret;
60}

The algorithm works recursively on the even and odd samples. To respectively get the even and odd samples I use the following:

 0static int get_even(complex double *x, size_t len, complex double **even)
 1{
 2	size_t N = (len / 2) + (len % 2);
 3
 4	*even = malloc(N * sizeof(complex double));
 5	if (*even == NULL) {
 6		return -1;
 7	}
 8
 9	for (int i = 0; i < N; i++) {
10		(*even)[i] = x[i * 2];
11	}
12
13	return 0;
14}
15
16static int get_odd(complex double *x, size_t len, complex double **odd)
17{
18	size_t N = len / 2;
19
20	*odd = malloc(N * sizeof(complex double));
21	if (*odd == NULL) {
22		return -1;
23	}
24
25	for (int i = 0; i < N; i++) {
26		(*odd)[i] = x[(i * 2) + 1];
27	}
28
29	return 0;
30}
31
32static int get_factor(int N, complex double **factor)
33{
34	*factor = malloc(N * sizeof(complex double));
35	if (*factor == 0) {
36		return -1;
37	}
38
39	for (int i = 0; i < N; i++) {
40		(*factor)[i] = cexp(-2 * I * i * M_PI / N);
41	}
42
43	return 0;
44}

Using it on the same data that We've been using before gives the following results:

C (dft)C (fft)Python
28.000000+28.000000*i28.000000+28.000000*i(28+28j)
-13.656854+5.656854*i-13.656854+5.656854*i(-13.65685424949238+5.656854249492381j)
-8.000000+-0.000000*i-8.000000+-0.000000*i(-8+0j)
-5.656854+-2.343146*i-5.656854+-2.343146*i(-5.656854249492381-2.3431457505076194j)
-4.000000+-4.000000*i-4.000000+-4.000000*i(-4-4j)
-2.343146+-5.656854*i-2.343146+-5.656854*i(-2.3431457505076194-5.656854249492381j)
-0.000000+-8.000000*i0.000000+-8.000000*i(0-8j)
5.656854+-13.656854*i5.656854+-13.656854*i(5.656854249492381-13.65685424949238j)

which as you can see corresponds pretty well with what we got from both Python and the DFT. There's something weird happening with positive and negative zero but I'm chalking that up to IEEE 754 being weird.

Iterative FFT

Coming from an embedded software background that recursive FFT implementation doesn't sit exactly right with me. First of all, there's the issue of the function being recursive, which means that all of the state is being kept on the stack, which consumes memory that we might not have available leading to a possible stack overflow. Additionally, we're allocating and deallocating memory all over the place. Fortunately, there's a solution for this in the iterative FFT, which uses bit-reversal to basically do an in-place iterative radix-2 FFT. Once again Wikipedia has a very good explanation.

 0int iterative_fft(const complex double *const x, complex double *X, size_t len)
 1{
 2	complex double w_m = 0.0;
 3	complex double w = 0.0;
 4	complex double t = 0.0;
 5	complex double u = 0.0;
 6	int m = 0;
 7
 8	bit_reverse_copy(x, X, len);
 9	for (int s = 1; s <= log2(len); s++) {
10		m = pow(2, s);
11		w_m = cexp(-2 * M_PI * I / m);
12		for (int k = 0; k < len; k += m) {
13			w = 1.0;
14			for (int j = 0; j < m / 2; j++) {
15				t = w * X[k + j + m / 2];
16				u = X[k + j];
17				X[k + j] = u + t;
18				X[k + j + m / 2] = u - t;
19				w = w * w_m;
20			}
21		}
22	}
23
24	return 0;
25}

With the bit reverse copy:

 0static uint32_t rev(const uint32_t a, size_t len)
 1{
 2	uint32_t num_of_bits = log2(len);
 3	uint32_t mask = 1 << (num_of_bits - 1);
 4	uint32_t reversed = 0;
 5	for (int i = 0; i < num_of_bits; i++) {
 6		if ((a & mask) != 0) {
 7			reversed = reversed | (1 << i);
 8		}
 9		mask = mask >> 1;
10	}
11
12	return reversed;
13}
14
15static int bit_reverse_copy(const complex double *const x, complex double *X,
16			    size_t len)
17{
18	for (int i = 0; i < len; i++) {
19		X[rev(i, len)] = x[i];
20	}
21
22	return 0;
23}

Running this with the test data used for the previous 2 implementations gives the following results:

C (fft)C (iterative FFT)Python
28.000000+28.000000*i28.000000+28.000000*i(28+28j)
-13.656854+5.656854*i-13.656854+5.656854*i(-13.65685424949238+5.656854249492381j)
-8.000000+-0.000000*i-8.000000+-0.000000*i(-8+0j)
-5.656854+-2.343146*i-5.656854+-2.343146*i(-5.656854249492381-2.3431457505076194j)
-4.000000+-4.000000*i-4.000000+-4.000000*i(-4-4j)
-2.343146+-5.656854*i-2.343146+-5.656854*i(-2.3431457505076194-5.656854249492381j)
0.000000+-8.000000*i0.000000+-8.000000*i(0-8j)
5.656854+-13.656854*i5.656854+-13.656854*i(5.656854249492381-13.65685424949238j)

Complex numbers on Windows

As an aside, I also tried to get this code running on a Windows machine, but Microsoft's MSVC does not support the complex implementation defined in the C standard. I'm going to pretend like I never saw that and deal with it when it becomes inevitable 😅.

Conclusion

I've put these functions in a little library that I published on GitHub. It's nothing much right now, just these functions, but I intend for it to become a place where I put all that C functionality that I find missing from standard libraries.