What could be cuter than the infamous fast inverse square root function used in the Quake 3 engine:

http://en.wikipedia.org/wiki/Fast_inverse_square_root

The interesting part is the calculation of the initial approximation, splitting this down into the basic steps, we have:

float isqrt(float f) { uint32_t n = floattoint(f); n >>= 1; n = -n; n += 0x5f000000; n += 0x3759df; float x = inttofloat(n); return x; }

To get some insight into what’s going on here, we need to look at the representation of floating point numbers. An IEEE-754 float consists of a sign bit, an exponent value, and a mantissa. The exponent is an 8-bit value, the mantissa has 23 bits, both unsigned. As usual, a suitable notation is key: simplifying a little (specifically, ignoring NaNs, infinities and denormalized numbers), we shall write a float of the form `{sign:1;exponent:8;mantissa:23}`

as `(s,e,m)`

, with `m`

a real in the range `[0,1)`

, and this represents `2`

, negated if the sign bit is 1.^{e-127}(1+m)

To warm up, it’s helpful to look at a simpler example:

float fsqrt(float f) { unsigned n = floattoint(f); n >>=1; n += 0x1f800000; n += 0x400000; return inttofloat(n); }

Here we are computing an approximation to a normal square root.

Taking it a step at a time: first the shift right `n >>= 1`

, there are two cases for odd and even exponent:

(0,2e,m) => (0,e,m/2) // No carry from exponent (0,2e+1,m) => (0,e,1/2 + m/2) // Carry into mantissa

For `n += 0x1f800000`

: we are adding 63 (0x1f800000 is 63 << 23) onto the exponent:

(s,e,m) => (s,e+63,m)

And finally, `n += 0x400000`

: Generally, if we add p onto the mantissa, where 0 <= p < 2^{23}, and writing m’ = m + p/2^{23}, we have:

(s,e,m) => (s,e,m') if m' < 1 (s,e,m) => (s,e+1,m'-1) if m' >= 1

For `p = 0x400000 = 2`

, we have ^{22}`m' = m+1/2`

. ie:

(s,e,m) => (s,e+1,m'-1) if m' >= 1 => (s,e,m') otherwise

Putting this together, for even powers of two:

2^{2k}(1+m) => 2^{k}(1+m/2): (0,2k+127,m) => (0,k+63,0.5+m/2) => (0,k+126,0.5+m/2) => (0,k+127,m/2)

And for odd powers:

2^{(2k+1)}(1+m) => 2^{k}(1+0.5+m/2): (0,2k+1+127,m) => (0,k+64,m/2) => (0,k+127,m/2) => (0,k+127,0.5+m/2)

Let’s check this is sensible by setting m = 0:

2^{2k}=> 2^{k}2^{(2k+1)}=> 2^{k}(1+0.5)

and putting m = 1 we get:

2^{2k+1}=> 2^{k}(1+1/2) 2^{(2k+2)}=> 2^{k+1}

Our approximation is linear in between powers of two, and continuous at those points too. Also at even powers the graph is tangent to `sqrt(x)`

.

This is all nicely illustrated in a picture:

Returning to the original inverse function, we have an additional negation step, `n = -n`

: to negate a twos-complement number, we flip the bits and add one. There are two main cases, depending on whether the mantissa is zero. If it is zero, there is a carry into the exponent, otherwise we just flip the bits of the exponent. The sign bit will always end up set (I’m ignoring the case when the exponent is zero). The general effect is:

(0,e,0) => (1,256-e,0) (0,e,m) => (1,255-e,1-m)

This time, we are adding 190 onto the exponent (0x5f000000 = 190<<23) – this has the dual effect of resetting the sign bit to 0 and subtracting 66 from the exponent (190 = -66 mod 256).

Let’s see what happens with odd and even powers of two; writing the magic offset added onto the mantissa as `c`

:

(0,2k+127,m) => (0,k+63,0.5+m/2) // n >>= 1 => (1,255-k-63,0.5-m/2) // n = -n => (0,126-k,0.5-m/2) // n += 0x5f000000 => (0,126-k,0.5-m/2+c) // if 0.5-m/2+c < 1 => (0,127-k,-0.5-m/2+c) // if 0.5-m/2+c >= 1

(0,2k+128,m) => (0,k+63,m/2) // n >>= 1 => (1,255-k-63,1-m/2) // n = -n => (0,126-k,1-m/2) // n += 0x5f000000 => (0,126-k,1-m/2+c) // if 1-m/2+c < 1 => (0,127-k,-m/2+c) // if 1-m/2+c >= 1

If we use 0x400000 as the offset, ie. `c`

above is 0.5, and put m=0 in the two cases, we have:

2^{2k}=> 1/2^{k}2^{2k+1}=> 1.5/2^{k+1}

Once again, our approximation coincides exactly at even powers of two, and as before it’s useful to have a picture:

We don’t have nice tangents this time, but the end result isn’t too bad.

We probably could have saved ourselves some work here by noting that the mapping between 32-bit integers (as signed magnitude numbers) and the corresponding floats is monotonic and continuous (for some sense of continuous), so composing with other (anti)monotonic operations gives an (anti)monotonic result, so having checked our function is correct at powers of two, it can’t go too badly adrift in between.

We can improve our approximation by using a smaller mantissa offset, `0x3759df`

, and we end up with the function we came in with:

Not especially pretty, but a good starting point for some Newton-Raphson. Notice that as well as kinks at exact powers of two, this approximation has kinks in between as well (when adding the constant to the mantissa overflows into the exponent).