# Floating point numbers are made of jelly.

**Posted:** January 9, 2013

**Filed under:** Floating Point

Floating point equality can be quite interesting, leading some people to think that they are actually made of some sort of jelly, and don’t in fact have precise values. This isn’t true of course but sometimes things can get quite bizarre. I was surprised to find this assertion:

```cpp
printf("%.20g %.20g\n", f(), g());
assert(f() == g());
```

failing with this output:

```
$ ./foo
0.9979466119096509491 0.9979466119096509491
foo: foo.cpp:15: int main(): Assertion `f() == g()' fails
Aborted
```

(All code compiled with gcc 4.4.3, on Ubuntu 10.04, 32 bit Celeron).

Now 20 significant figures are enough to tell the difference between two doubles, so this looks a little strange.
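In fact 17 significant decimal digits are already enough to distinguish any two doubles. As an illustration (in Python, whose floats are IEEE 754 doubles; `math.nextafter`, which gives the adjacent double, needs Python 3.9+), adjacent doubles always print differently at 20 figures:

```python
import math

x = 0.1
y = math.nextafter(x, 1.0)   # the next representable double above x
print(f"{x:.20g}")           # 0.10000000000000000555
print(f"{y:.20g}")           # 0.10000000000000001943
print(x == y)                # False: 20 significant figures tell them apart
```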

More investigation showed that:

```cpp
double x = f();
assert(x == f());
```

failed as well and I started to get worried. It’s true that floating point numbers aren’t equal to themselves if they are NaNs, but that didn’t seem to be the case here. Also, the assertion didn’t fail for code that had been optimized.
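The NaN self-inequality is easy to demonstrate (here in Python, where floats are IEEE doubles):

```python
import math

nan = float('nan')
print(nan == nan)        # False: a NaN compares unequal even to itself
print(nan != nan)        # True
print(math.isnan(nan))   # True: the reliable way to test for NaN
```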

Some years ago I did some work on code generation for the 387 FPU and then we had trouble with rounding – test programs for the 387 would produce slightly different results than for other architectures like MIPS and Sparc. IEEE754 is pretty precise about how rounding should be done so this seemed a bit strange until we realized that the 387 uses 80-bit precision internally, with rounding to 64 bit doubles only happening when transferring data from the FPU to memory (FISTP and friends). There is a flag to set the internal rounding mode to double precision and once we had set that we got consistent results on all architectures.

Looks like something similar is happening here and a look at the disassembly confirms this:

```
call _Z1av
fstpl 24(%esp)
call _Z1av
fldl 24(%esp)
fxch %st(1)
fucompp
fnstsw %ax
```

The result of the first call to a() gets written out to memory as a double, and hence rounded; it is then reloaded and compared with the unrounded result of the second call.

Investigating further exposes more interesting behaviour:

```cpp
#include <stdio.h>
#include <math.h>
#include <assert.h>

int main()
{
    double x, y, z;
    int a;

    x = 1.0;
    while (x != 0.0) {
        y = x;
        x /= 2;
    }
    a = y > 0;
    printf("%.40Lg\n", (long double)y);
    printf("%d\n", a);
    printf("%.20g\n", y/y);
    printf("%.40Lg\n", (long double)y);

    x = 1.0;
    y = nextafter(x, 2.0);
    z = (x+y)/2.0;
    printf("%d\n", z > x && z < y);

    x = 0.5;
    while ((z = sqrt(x)) != x) x = z;
    y = x;
    while ((z = x*x) != x) x = z;
    printf("%.40Lg\n%.40Lg\n", (long double)x, (long double)y);
}
```

Now we have completely different behaviour between optimized and unoptimized code:

```
$ g++ -Wall fp.cpp
$ ./a.out
4.940656458412465441765687928682213723651e-324
1
1
4.940656458412465441765687928682213723651e-324
0
1
1
```

Unoptimized, this is pretty much as expected: we iterate down to the smallest representable double, which we divide by itself to get 1. If we average 1 and the next representable number, we don’t get a number in between; and finally, if we iterate square roots, we eventually arrive at 1, and squaring that just gives 1 again.

```
$ g++ -O3 -Wall fp.cpp
$ ./a.out
3.645199531882474602528405933619419816399e-4951
0
-nan
0
1
0
0.9999999999999999999457898913757247782996
```

Bah, we seem to have iterated right down to the smallest extended value there, but when we divide it by itself we get a NaN – it seems to have been magically converted to zero, as confirmed by printing it out again (presumably the 387 register was stored temporarily in memory while we did the printf, and truncated to a 64-bit double on the way). Now we are able to find a value between 1 and 1 + ε; and finally, our repeated square root seems to converge on something slightly less than 1, and when we go back the other way, we end up at 0 instead.

Fortunately, we don’t need to worry about what to do about this, or whether it’s the chip designers, the language designers, the FP standard designers or the compiler writers who are getting it wrong (or maybe FP numbers really are made of jelly). Newer x86 processors have SSE registers for FP, which hold 64-bit doubles throughout (as most other common architectures do), so these problems are less likely to occur:

```
...
call _Z3av
movsd %xmm0, -8(%rbp)
call _Z3av
ucomisd -8(%rbp), %xmm0
jne .L8
...
```

and if we run our little test program (this time with gcc 4.7.2) on a 64-bit processor we get:

```
$ g++ -Wall fp.cpp
$ ./a.out
4.940656458412465441765687928682213723651e-324
1
1
4.940656458412465441765687928682213723651e-324
0
0
0.9999999999999998889776975374843459576368
```

for all levels of optimization.
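For what it’s worth, the same experiments can be reproduced in Python, whose floats are plain IEEE doubles (SSE arithmetic on x86-64; `math.nextafter` needs Python 3.9+), and the results match this 64-bit run:

```python
import math

# Halve down to the smallest positive double (the denormal 2^-1074)
x = 1.0
while x != 0.0:
    y = x
    x /= 2
print(y)        # 5e-324
print(y > 0)    # True
print(y / y)    # 1.0: even the smallest denormal divides by itself cleanly

# No double lies strictly between 1.0 and the next double up
one = 1.0
nxt = math.nextafter(one, 2.0)
mid = (one + nxt) / 2
print(one < mid < nxt)   # False: the midpoint rounds back to an endpoint

# Iterated square roots stall at the largest double below 1...
x = 0.5
z = math.sqrt(x)
while z != x:
    x = z
    z = math.sqrt(x)
root_fix = x
print(root_fix == 1 - 2**-53)   # True

# ...and iterated squaring from there decays to 0, not back to 1
while True:
    z = x * x
    if z == x:
        break
    x = z
print(x)                 # 0.0
```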

We can force use of the 387 instructions with `-mfpmath=387`, and then we get:

```
$ g++ -O3 -mfpmath=387 -Wall fp.cpp
$ ./a.out
3.645199531882474602528405933619419816399e-4951
0
-nan
0
1
0
0.9999999999999999999457898913757247782996
```

again, so it doesn’t look like our problems are related to the particular gcc version.

With the new code, we lose the benefits of extra internal precision, but on the whole I prefer having half a chance of understanding what’s going on.

# 0 1 1 2 3 5 8 13…

**Posted:** January 1, 2013

**Filed under:** Number Theory, Python

For some reason I spent some time over Christmas looking at algorithms for computing Fibonacci numbers, so here’s the first post for 2013.

The Binet formulas for the Fibonacci and Lucas numbers are:

```
φ = (1+√5)/2
ψ = (1-√5)/2
φψ = -1

F_n = (φ^n - ψ^n)/√5
L_n = φ^n + ψ^n
```
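These are easy to sanity-check numerically in Python (floating point, so the `round` only rescues us for smallish n; the function names here are mine):

```python
import math

sqrt5 = math.sqrt(5)
phi = (1 + sqrt5) / 2
psi = (1 - sqrt5) / 2

def fib_binet(n):
    # F(n) = (phi^n - psi^n)/sqrt(5), rounded back to the nearest integer
    return round((phi**n - psi**n) / sqrt5)

def lucas_binet(n):
    # L(n) = phi^n + psi^n
    return round(phi**n + psi**n)

print([fib_binet(n) for n in range(10)])    # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print([lucas_binet(n) for n in range(6)])   # [2, 1, 3, 4, 7, 11]
```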

We can now derive:

```
5F_n^2 = φ^{2n} + ψ^{2n} - 2(-1)^n = L_{2n} - 2(-1)^n
L_n^2  = φ^{2n} + ψ^{2n} + 2(-1)^n = L_{2n} + 2(-1)^n
L_nF_n = (φ^{2n} - ψ^{2n})/√5 = F_{2n}
```

We can also use the formulas to prove:

```
2φ^n = L_n + F_n√5
```

which gives us:

```
L_{n+m} + F_{n+m}√5 = 2φ^{n+m} = (2φ^n)(2φ^m)/2
                    = (L_n + F_n√5)(L_m + F_m√5)/2
                    = (L_nL_m + 5F_nF_m)/2 + (L_nF_m + F_nL_m)√5/2
```

So, equating the rational and irrational parts (as we may – if `a+bk = c+dk` with `a`, `b`, `c`, `d` rational, then `k` is rational unless `a=c` and `b=d`):

```
L_{n+m} = (L_nL_m + 5F_nF_m)/2
F_{n+m} = (L_nF_m + F_nL_m)/2
```

Two special cases of this, taking m = n, and m = 1 (with F_1 = L_1 = 1):

```
L_{2n} = (L_n^2 + 5F_n^2)/2
F_{2n} = L_nF_n

L_{n+1} = (L_n + 5F_n)/2
F_{n+1} = (L_n + F_n)/2
```
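As a quick sanity check (not in the original post), the addition formulas can be verified in Python against the basic recurrences:

```python
def naive_fib_lucas(n):
    """F(n), L(n) by the basic two-term recurrence."""
    f0, f1 = 0, 1   # F(0), F(1)
    l0, l1 = 2, 1   # L(0), L(1)
    for _ in range(n):
        f0, f1 = f1, f0 + f1
        l0, l1 = l1, l0 + l1
    return f0, l0

for n in range(1, 15):
    for m in range(1, 15):
        Fn, Ln = naive_fib_lucas(n)
        Fm, Lm = naive_fib_lucas(m)
        Fnm, Lnm = naive_fib_lucas(n + m)
        assert Lnm == (Ln*Lm + 5*Fn*Fm) // 2
        assert Fnm == (Ln*Fm + Fn*Lm) // 2
print("addition formulas check out")
```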

With these recurrences, we can write some code:

```python
def fib1aux(n):
    """Return F(n), L(n)"""
    if n == 0:
        return 0, 2
    else:
        a, b = fib1aux(n//2)
        a, b = a*b, (5*a*a + b*b)//2
        if n % 2 == 0:
            return a, b
        else:
            return (a+b)//2, (5*a + b)//2

def fib1(n):
    return fib1aux(n)[0]
```

We can save a bignum multiplication in each recursive step by replacing `a,b = a*b,(5*a*a+b*b)//2` with something like `ab = a*b; t = (a+b)*(5*a+b); a,b = ab,t//2-3*ab` (suggestion due to Robin Houston at http://bosker.wordpress.com/2011/07/27/computing-fibonacci-numbers-using-binet%E2%80%99s-formula/).
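Spelled out, a sketch of `fib1aux` with that substitution applied (two big multiplications, `a*b` and `(a+b)*(5*a+b)`, instead of three; `fib1aux_opt` is my name, not from the post):

```python
def fib1aux_opt(n):
    """Return F(n), L(n), saving one bignum multiplication per step."""
    if n == 0:
        return 0, 2
    a, b = fib1aux_opt(n // 2)
    ab = a * b
    t = (a + b) * (5 * a + b)    # = 5a^2 + 6ab + b^2
    a, b = ab, t // 2 - 3 * ab   # b becomes (5a^2 + b^2)/2, i.e. L(2*(n//2))
    if n % 2 == 0:
        return a, b
    return (a + b) // 2, (5 * a + b) // 2

def fib1_opt(n):
    return fib1aux_opt(n)[0]
```

(The division `t//2` is exact: F_n and L_n always have the same parity, so `5a^2 + b^2` is even.)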

Alternatively, we can follow the suggestion of Knuth in TAOCP (solution to exercise 4.6.3.26) and define:

```python
def fib2aux(n):
    """Return F(n), L(n)"""
    if n == 0:
        return 0, 2
    else:
        a, b = fib2aux(n//2)
        a, b = a*b, n//2 % 2 * 4 - 2 + b*b
        if n % 2 == 0:
            return a, b
        else:
            return (a+b)//2, (5*a + b)//2

def fib2(n):
    return fib2aux(n)[0]

In both cases, we can avoid the final calculation of L_n at the top level by defining something like:

```python
def fib2(n):
    if n % 2 == 0:
        a, b = fib2aux(n//2)
        return a*b
    else:
        a, b = fib2aux(n-1)
        return (a+b)//2
```

or, for a non-recursive solution (stealing some ideas from Robin Houston again):

```python
def bits(n):
    r = []
    while n != 0:
        r.append(n % 2)
        n >>= 1
    r.reverse()
    return r

def fib3(n):
    a, b = 0, 2
    last = 0
    for i in bits(n//2):
        a, b = a*b, last*4 - 2 + b*b
        last = i
        if i:
            a, b = (a+b)//2, (5*a + b)//2
    if n % 2 == 0:
        return a*b
    else:
        a, b = a*b, last*4 - 2 + b*b
        return (a+b)//2
```

Of course, we don’t have to use Lucas numbers; a pleasant recurrence for Fibonacci numbers alone is:

```
F_{n+m} = F_{n+1}F_m + F_nF_{m-1}
```

which gives us:

```
F_{2n}   = F_nF_n + 2F_nF_{n-1}
F_{2n-1} = F_nF_n + F_{n-1}F_{n-1}
```

and

```
F_{2n+1} = F_{n+1}F_{n+1} + F_nF_n
F_{2n}   = 2F_nF_{n+1} - F_nF_n
```

leading to:

```python
def fibaux4(n):
    """Return F(n), F(n-1)"""
    if n == 0:
        return 0, 1
    elif n == 1:
        return 1, 0
    else:
        a, b = fibaux4((n+1)//2)
        if n % 2 == 0:
            return a*(a + 2*b), a*a + b*b
        else:
            return a*a + b*b, b*(2*a - b)

def fib4(n):
    return fibaux4(n)[0]
```

with an elegant symmetry between the odd and even cases. This seems to be the solution suggested by Dijkstra in http://www.cs.utexas.edu/~EWD/ewd06xx/EWD654.PDF.

There is, of course, more: notably the matrix based solutions – we observe that `(a_{n+2}, a_{n+1}) = M(a_{n+1}, a_n)` for the 2×2 matrix `M = (1 1; 1 0)`. We then have `(F_{n+1}, F_n) = M^n(1, 0)` and we can use conventional matrix operations, including inversion, to compute `F_n`. We can apply this to any linear recurrence, eg. `a_{n+2} = Aa_{n+1} + Ba_n` corresponds to the matrix `(A B; 1 0)`.
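A minimal sketch of the matrix approach in Python (2×2 matrices as nested tuples, powering by repeated squaring; the helper names are mine, not from the post):

```python
def mat_mul(A, B):
    # 2x2 matrix product over the integers
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def mat_pow(M, n):
    # binary powering: O(log n) matrix multiplications
    R = ((1, 0), (0, 1))
    while n:
        if n & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        n >>= 1
    return R

def fib_matrix(n):
    # M^n = ((F(n+1), F(n)), (F(n), F(n-1))), so F(n) is the bottom-left entry
    M = ((1, 1), (1, 0))
    return mat_pow(M, n)[1][0]

def lucas_matrix(n):
    # same M, different initial vector: (L(n+1), L(n)) = M^n (1, 2)
    (a, b), (c, d) = mat_pow(((1, 1), (1, 0)), n)
    return c * 1 + d * 2
```

For example, `fib_matrix(10)` gives 55 and `lucas_matrix(10)` gives 123.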

It’s notable that `M` only depends on the recurrence and not on the initial values, so we can compute, for example, the Lucas numbers with `(L_{n+1}, L_n) = M^n(1, 2)`, and there is a regularity to the structure of M that can be used to optimise the computation, eg. for Fibonacci, powers of M are of the form `(a+b a; a b)` (cf. HAKMEM: http://www.inwap.com/pdp10/hbaker/hakmem/recurrence.html). But that probably needs another post.