sz is the size of a side of the array; it must be a power of 2.
The code can employ cpus processors.
If there are fewer processors then things still work and the performance gain is nearly as much.
a is the 2D array.
cexp (the library complex exponential) is used in main to intialize the grid to a plane wave.
fft is the routine introduced here but for floats instead of double values.
The routine psam just prints a small select rectangle from the mesh.
d2Dft causes the transform of the values in a to replace those values.
fp(th, m) finds and reports the values in a with the greatest magnitude.
The first transform leaves one peak which reflects the mono frequency initial wave. The peak is spread a bit because the wave was not periodic; it is discontinuous at the boundary. After two transforms we have the mesh reversed about the center. After four transforms we are back to the initial values.
I compile on the Mac with gcc ft.c fft.c -Wall -O3 and run with ./a.out.