While coding the fast Fourier transform I needed a complex square root. There was not then such a thing in the library, or I couldn’t find it. Now there is but Apple’s extended precision version (Intel) of csqrtl was then unextended (2007 Apr). (Now (2007 Aug.) it is good.)

This version unifies double and extended precision and includes crude debugging hooks. This is the double precision version shorn of the switches and hooks and this is the extended precision version.

These two pass the crude provided tests while the csqrtl supplied by Apple produces only about 16 places of precision. Today is 2007, Apr 15. PS: As of 2007 July the Apple csqrtl is good.

This works with Apple’s gcc 4.0.1 using the -std=c99 option. I have not tested it with earlier versions.

The test of the sign of A is to avoid a region of the complex plane, near the negative real axis, where cancelation ruins accuracy. The effect may be seen in this modification which omits the test and yields:

  0.00000000000000000e+00   2.00000000000000000e+00 2i

  2.00000000000000000e+00   3.00000000000000000e+00 x
  1.67414922803554012e+00   8.95977476129838135e-01 r
  0.00000000000000000e+00   0.00000000000000000e+00 r*r-x

 -1.00000000000000000e+00  -1.00000000000000002e-03 x
  4.99999937529384307e-04  -1.00000012494124713e+00 r
  1.17427734203090495e-10  -2.16840434497100887e-19 r*r-x

 -1.00000000000000000e+00   0.00000000000000000e+00 x
  0.00000000000000000e+00                       nan r
                      nan                       nan r*r-x
which gets only about 10 digits of accuracy whereas with the test we get nearly 16 digits:
  0.00000000000000000e+00   2.00000000000000000e+00 2i

  2.00000000000000000e+00   3.00000000000000000e+00 x
  1.67414922803554012e+00   8.95977476129838135e-01 r
  0.00000000000000000e+00   0.00000000000000000e+00 r*r-x

 -1.00000000000000000e+00  -1.00000000000000002e-03 x
  4.99999937500027257e-04  -1.00000012499996105e+00 r
 -2.22044604925031308e-16   0.00000000000000000e+00 r*r-x

 -1.00000000000000000e+00   0.00000000000000000e+00 x
  0.00000000000000000e+00   1.00000000000000000e+00 r
  0.00000000000000000e+00   0.00000000000000000e+00 r*r-x
The tests for very large and very small argument avoid exponent overflow and underflow in intermediate results.