Fun with TUN

TUN devices are widely used for virtualization, VPNs, network testing programs and the like. A TUN device is essentially a network interface that also exists as a user-space file descriptor: data sent to the interface can be read from the file descriptor, and data written to the file descriptor emerges from the network interface.

Here’s a simple example of their use: we create a TUN device that simulates an entire network, with traffic to each network address simply routed back to the originating host.

For a complete program, see:

https://github.com/matthewarcus/stuff/blob/master/tun/reflect.cpp

First, create your TUN device. This is fairly standard; most public code seems to be derived from Maxim Krasnyansky’s:

https://www.kernel.org/doc/Documentation/networking/tuntap.txt

and our code is no different:

int tun_alloc(char *dev) 
{
  assert(dev != NULL);
  int fd = open("/dev/net/tun", O_RDWR);
  CHECKFD(fd);

  struct ifreq ifr; 
  memset(&ifr, 0, sizeof(ifr)); 
  ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
  strncpy(ifr.ifr_name, dev, IFNAMSIZ); 
  CHECKSYS(ioctl(fd, TUNSETIFF, (void *) &ifr));
  strncpy(dev, ifr.ifr_name, IFNAMSIZ); 
  return fd;
}

We want a TUN device (rather than TAP, which is essentially the same thing but at the Ethernet level) and we don’t want packet information at the moment. We copy the name of the allocated device back into the char array given as a parameter.
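Since we asked for IFF_NO_PI, each read returns a bare IP packet. Without that flag, the kernel prepends a 4-byte packet-information header (struct tun_pi: 2 bytes of flags, then a 2-byte EtherType in network order). A sketch of stripping it — the PacketInfo struct and strip_pi helper here are illustrative, mirroring the layout of struct tun_pi from <linux/if_tun.h>:

```cpp
#include <cstdint>
#include <cstring>
#include <cassert>
#include <arpa/inet.h>  // ntohs

// Mirrors struct tun_pi: 2 bytes of flags and a 2-byte EtherType
// in network byte order (0x0800 for IPv4).
struct PacketInfo {
  uint16_t flags;
  uint16_t proto;
};

// Hypothetical helper: given a raw TUN read made *without* IFF_NO_PI,
// return a pointer to the IP packet and store the host-order protocol.
static const uint8_t *strip_pi(const uint8_t *buf, uint16_t *proto) {
  PacketInfo pi;
  memcpy(&pi, buf, sizeof(pi));
  *proto = ntohs(pi.proto);
  return buf + sizeof(pi);
}
```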

Now all our program needs to do is create the TUN device and sit in a loop copying packets:

int main(int argc, char *argv[])
{
  char dev[IFNAMSIZ+1];
  memset(dev,0,sizeof(dev));
  if (argc > 1) strncpy(dev,argv[1],sizeof(dev)-1);

  // Allocate the tun device
  int fd = tun_alloc(dev);
  if (fd < 0) exit(0);

  uint8_t buf[2048];
  while(true) {
    // Sit in a loop, read a packet from fd, reflect
    // addresses and write back to fd.
    ssize_t nread = read(fd,buf,sizeof(buf));
    CHECK(nread >= 0);
    if (nread == 0) break;
    reflect(buf,nread);
    ssize_t nwrite = write(fd,buf,nread);
    CHECK(nwrite == nread);
  }
}

The TUN mechanism ensures that each read returns exactly one packet, so we don’t need to worry about fragmentation; we just send each packet back with the source and destination IPs swapped:

static inline void put32(uint8_t *p, size_t offset, uint32_t n)
{
  memcpy(p+offset,&n,sizeof(n));
}

static inline uint32_t get32(uint8_t *p, size_t offset)
{
  uint32_t n;
  memcpy(&n,p+offset,sizeof(n));
  return n;
}

void reflect(uint8_t *p, size_t nbytes)
{
  (void)nbytes;
  uint8_t version = p[0] >> 4;
  switch (version) {
  case 4:
    break;
  case 6:
    fprintf(stderr, "IPv6 not implemented yet\n");
    exit(0);
  default:
    fprintf(stderr, "Unknown protocol %u\n", version);
    exit(0);
  }
  uint32_t src = get32(p,12);
  uint32_t dst = get32(p,16);
  put32(p,12,dst);
  put32(p,16,src);
}

We don’t need to recalculate the header checksum, as it is unchanged by swapping two 32-bit segments.
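We can convince ourselves of that: the IPv4 header checksum is the one’s complement of a one’s-complement sum of the header’s 16-bit words, and that sum is invariant under any permutation of aligned words, including swapping the two address fields. A quick sketch (ip_checksum here is an illustrative reimplementation, not part of the program above):

```cpp
#include <cstdint>
#include <cstddef>
#include <cassert>
#include <algorithm>

// One's-complement checksum over len bytes (len assumed even here),
// as used for the IPv4 header.
static uint16_t ip_checksum(const uint8_t *p, size_t len) {
  uint32_t sum = 0;
  for (size_t i = 0; i + 1 < len; i += 2)
    sum += (uint32_t(p[i]) << 8) | p[i+1];
  while (sum >> 16)                      // fold carries back in
    sum = (sum & 0xffff) + (sum >> 16);
  return uint16_t(~sum);
}

// Swap the 4-byte source (offset 12) and destination (offset 16) fields.
static void swap_addrs(uint8_t *hdr) {
  std::swap_ranges(hdr + 12, hdr + 16, hdr + 16);
}
```

Since the swap merely permutes word-aligned 16-bit chunks, the two checksums come out identical.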

Handling IPv6 is left as an exercise for the reader (we just need to use a different offset and address size, I think).
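For what it’s worth, a sketch under that assumption: in the IPv6 header, the source and destination addresses are 16 bytes each, at offsets 8 and 24, and there is no header checksum to worry about (and swapping both addresses leaves any pseudo-header sum unchanged, since addition is commutative). The reflect6 name is mine, not from the program above:

```cpp
#include <cstdint>
#include <cstring>
#include <cassert>

// Hypothetical IPv6 counterpart of reflect(): swap the 16-byte source
// (offset 8) and destination (offset 24) addresses in the IPv6 header.
static void reflect6(uint8_t *p) {
  uint8_t tmp[16];
  memcpy(tmp, p + 8, 16);
  memcpy(p + 8, p + 24, 16);
  memcpy(p + 24, tmp, 16);
}
```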

In this day and age, security should be prominent in our minds, particularly for long-running programs like our TUN server, so for extra points, let’s add in some capability processing.

(You might need to install a libcap-dev package for this to work, for example with “sudo apt-get install libcap-dev”, and link with -lcap.)

Once we have started up, we should check that we have the required capability; we just require CAP_NET_ADMIN to be permitted:

  cap_t caps = cap_get_proc();
  CHECK(caps != NULL);

  cap_value_t cap = CAP_NET_ADMIN;
  const char *capname = STRING(CAP_NET_ADMIN);

  cap_flag_value_t cap_permitted;
  CHECKSYS(cap_get_flag(caps, cap,
                        CAP_PERMITTED, &cap_permitted));
  if (!cap_permitted) {
    fprintf(stderr, "%s not permitted, exiting\n", capname);
    exit(0);
  }

and then make effective what we require:

  CHECKSYS(cap_clear(caps));
  CHECKSYS(cap_set_flag(caps, CAP_PERMITTED, 1, &cap, CAP_SET));
  CHECKSYS(cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap, CAP_SET));
  CHECKSYS(cap_set_proc(caps));

Finally, after creating our TUN device and before entering our main loop, we can relinquish our extra privileges altogether:

  CHECKSYS(cap_clear(caps));
  CHECKSYS(cap_set_proc(caps));
  CHECKSYS(cap_free(caps));

For completeness, here are the error checking macros used above:

#define CHECKAUX(e,s)                            \
 ((e)? \
  (void)0: \
  (fprintf(stderr, "'%s' failed at %s:%d - %s\n", \
           s, __FILE__, __LINE__,strerror(errno)), \
   exit(0)))
#define CHECK(e) (CHECKAUX(e,#e))
#define CHECKSYS(e) (CHECKAUX((e)==0,#e))
#define CHECKFD(e) (CHECKAUX((e)>=0,#e))
#define STRING(e) #e

Of course, production code will want to do something more sophisticated than calling exit(0) when an error occurs…

To use, compile for example with:

g++ -W -Wall -O3 reflect.cpp -lcap -o reflect

We can set permissions for our new executable to include the relevant capability, so we don’t need to start it as root:

$ sudo setcap cap_net_admin+ep ./reflect

Actually start it:

$ ./reflect&
Capability CAP_NET_ADMIN: 1 0 1
Created tun device tun0

We now have an interface, but it isn’t configured:

$ ifconfig tun0
tun0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
POINTOPOINT NOARP MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

With the interface running, set up networking:

$ sudo ip link set tun0 up
$ sudo ip addr add 10.0.0.1/8 dev tun0

Check all is well:

$ ifconfig tun0
tun0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:10.0.0.1 P-t-P:10.0.0.1 Mask:255.0.0.0
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

And try it out:

$ ping -c 1 10.0.0.41
PING 10.0.0.41 (10.0.0.41) 56(84) bytes of data.
64 bytes from 10.0.0.41: icmp_req=1 ttl=64 time=0.052 ms

--- 10.0.0.41 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.052/0.052/0.052/0.000 ms

Let’s check performance: first, a flood ping on the loopback device:

$ sudo ping -f -c10000 -s1500 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 1500(1528) bytes of data.

--- 127.0.0.1 ping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 778ms
rtt min/avg/max/mdev = 0.003/0.006/0.044/0.002 ms, pipe 2, ipg/ewma 0.077/0.006 ms

compared to one through the TUN connection:

$ sudo ping -f -c10000 -s1500 10.0.0.100
PING 10.0.0.100 (10.0.0.100) 1500(1528) bytes of data.

--- 10.0.0.100 ping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 945ms
rtt min/avg/max/mdev = 0.022/0.032/3.775/0.038 ms, pipe 2, ipg/ewma 0.094/0.032 ms

Respectable. We have got ourselves a network!

Testing for Triangularity

A company I used to work for had a standard interview problem of writing a short function that would take 3 numbers (usually assumed to be integers) and determine if they formed a valid triangle. There was much discussion about the best/fastest/clearest/most elegant way of doing this, but I don’t remember the question of what to do with floating point inputs ever coming up.

Curiously, being interviewed myself at another company not long ago, I was asked this very question and wasn’t quite sure what the best thing to do was. For integers, my preferred solution is to sort the 3 numbers so that a >= b >= c; then we have a triangle just when c > 0 and a < b+c (or a-b < c if we are worried about overflow). What if they are floating point numbers? Clearly if b >> c, then b+c might lose precision: for example, we might have c > 0 and b = a, so the triangle is valid, but if b+c rounds to b, then our test fails.

We could try the same trick as for integers, testing for a-b < c – subtracting the two largest should minimize the rounding error; this seems plausible, but can we be sure we won’t get it wrong?
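A concrete instance of both the problem and the proposed fix: take a = b = 1 and c = 1e-9, a valid (if very thin) triangle. In single precision, b + c rounds back to b, so the naive a < b+c test wrongly rejects it, while a-b < c gets it right (the is_triangle helper below is just for illustration):

```cpp
#include <algorithm>
#include <cassert>

// Sort so that a >= b >= c, then test using the subtraction form.
static bool is_triangle(float a, float b, float c) {
  if (a < b) std::swap(a, b);
  if (b < c) std::swap(b, c);
  if (a < b) std::swap(a, b);
  return c > 0 && a - b < c;
}
```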

An earlier post looked at the sequence of integers:

0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 18 20 22 24 26 28 30
32 36 40 44 48 52 56 60
64 72 80 88 96 104 112 120
...

which are really just thinly disguised (non-negative) floating point numbers. An important feature, obvious here and easy to prove of “real” FP numbers, is that, apart from the first two rows, each row is double the row above it; therefore 2n is representable exactly iff n is. Furthermore, it’s pretty obvious that if a and b are representable, with a/2 <= b <= a, then a-b is representable (and if a and b are numbers in the first two rows, then all differences are representable, so we don’t need to worry about them).

Now let’s return to our triangle: we have a >= b >= c > 0, and clearly a, b, c must be representable. If all the numbers are in the first two rows, then all operations are exact and we get the right answer. Otherwise, a is even. First consider the case b >= a/2: as above, this implies that a-b is representable, so the comparison with c is exact and our test is correct. Otherwise, if b < a/2, then since c <= b, we have c + b <= 2b < a, so we don’t have a triangle; and assuming that rounding is monotonic (ie. x <= y => round(x) <= round(y), or equivalently, x < y => round(x) <= round(y)), our check works out correct too:

b < a/2, so b+c < a (exact) so c < a-b (exact) so c = round(c) <= round(a-b)

QED

As usual for this blog, we are merely paddling at the shore of a much deeper sea. For much more on triangles, floating point approximations etc, see Kahan’s paper:

http://wwwp.cs.unc.edu/~snoeyink/c/c205/Triangle.pdf

(or indeed anything else by the great man:

http://www.cs.berkeley.edu/~wkahan/

http://owpdb.mfo.de/detail?photo_id=5350).


Subbag Enumeration

A bag or multiset is just a map from some base set X to the natural numbers, so we can represent such a bag by a sequence of (non-negative) integers. Subbags are defined in the obvious way (and if s is subbag of t, t is a superbag of s). Given a subbag s of m, we can find the next subbag (in reverse lexicographic order) with this function:

bool nextbag(vector<int> &s, 
             const vector<int> &m,
             size_t start = 0)
{
  for (size_t i = start; i < s.size(); i++) {
    if (s[i] == m[i]) {
      s[i] = 0;
    } else {
      s[i]++;
      return true;
    }
  }
  return false;
}

This sets the subsequence [start..] of s to the (reverse) lexicographically next sequence. m gives the maximum value for each field of s, and we return false if we have cycled back round to the zero sequence. Note that if m[i] < 0 then effectively there is no maximum value for that element.

We might also want to skip from s to the next subbag that is not a superbag of s. This is accomplished by:

bool nextnonsuper(vector<int> &s, const vector<int> &m)
{
  for (size_t i = 0; i < s.size(); i++) {
    if (s[i] != 0) {
      s[i] = 0;
      return nextbag(s,m,i+1);
    }
  }
  return false;
}

Here we see why we wanted the start parameter in nextbag. For example, we go from:

[0,0,0,3,4,...] => [0,0,0,0,5,...]

Given f: bag -> A where A is an ordered set and f is monotonic (if s is a subbag of t, then f(s) <= f(t)), if we want to find all (sub)bags s where f(s) <= x, we can use nextnonsuper to skip parts of the enumeration:

while (true) {
  // ... process s here ...
  if (f(s) >= x) {
    if (!nextnonsuper(s,m)) break;
  } else {
    if (!nextbag(s,m)) break;
  }
}

As an example, we can use a weighted sum over the bag elements (we will assume that the weights are positive):

int weights(const vector<int> &s, const vector<int> &w)
{
  int total = 0;
  for (size_t i = 0; i < s.size(); i++) {
    total += s[i]*w[i];
  }
  return total;
}

and we can add some boilerplate to make the whole thing into a nice standalone application:

// Print out our vector, include weights if required
// Optionally print in reverse order
void print(const vector<int> &s, 
           const vector<int> *w = NULL, 
           bool printreverse = false)
{
  int sz = s.size();
  for (int i = 0; i < sz; i++) {
    int index = printreverse?sz-i-1:i;
    cout << s[index];
    if (w != NULL) cout << ":" << (*w)[index];
    cout << (i < sz-1?" ":"\n");
  }
}

int main(int argc, char *argv[])
{
  int target;    // The sum we are aiming for
  vector<int> s; // Where we build the sequence
  vector<int> m; // The number of each items available
  vector<int> w; // The weight of each item
  bool printreverse = false; // Print sequences in reverse
  bool printweight  = false; // Print the weights as well
  bool printless    = false; // Print sequences less than target
  bool printgreater = false; // Print those greater than target

  // Crude but effective argument processing
  while (argc > 1 && argv[1][0] == '-') {
    if (strcmp(argv[1],"-r") == 0) {
      printreverse = true;
      argv++; argc--;
    } else if (strcmp(argv[1],"-l") == 0) {
      printless = true;
      argv++; argc--;
    } else if (strcmp(argv[1],"-g") == 0) {
      printgreater = true;
      argv++; argc--;
    } else if (strcmp(argv[1],"-w") == 0) {
      printweight = true;
      argv++; argc--;
    } else {
      cerr << "Invalid flag: " << argv[1] << endl;
      exit(0);
    }
  }
  // Read target
  // target = 0 means no target
  if (argc < 2 || sscanf(argv[1],"%d",&target) != 1) {
    cerr << "Invalid or missing target" << endl;
    exit(0);
  }
  // Read count/weight pairs
  for (int i = 2; i < argc; i++) {
    int a,b;
    if (sscanf(argv[i],"%d:%d",&a,&b) != 2) {
      cerr << "Invalid arg: " << argv[i] << endl;
      exit(0);
    }
    m.push_back(a);
    w.push_back(b);
  }
  // Initialize result sequence.
  s.resize(m.size());
  while(true) {
    int total = weights(s,w);
    // Print result if desired
    if (target == 0 || 
        total == target || 
        (total < target && printless) ||
        (total > target && printgreater)) {
      print(s, printweight?&w:NULL, printreverse);
    }
    if (target > 0 && total >= target) {
      // Have hit or exceeded target, so skip superbags
      if (!nextnonsuper(s,m)) break;
    } else {
      if (!nextbag(s,m)) break;
    }
  }
}

Compiling all this as bagenum.cpp, we can now do some enumerations. All the ways to make 5 from the bag {1,1,1,2,2,3}:

$ ./bagenum 5 3:1 2:2 1:3
3 1 0
1 2 0
2 0 1
0 1 1

We can specify a target of 0 to get all the subbags of the input, and the -w flag prints the weight out as well:

$ ./bagenum -w 0 3:1 2:2
0:1 0:2
1:1 0:2
2:1 0:2
3:1 0:2
0:1 1:2
1:1 1:2
2:1 1:2
3:1 1:2
0:1 2:2
1:1 2:2
2:1 2:2
3:1 2:2

We can enumerate bitstrings with a given number of bits set:

$ ./bagenum 2 1:1 1:1 1:1 1:1
1 1 0 0
1 0 1 0
0 1 1 0
1 0 0 1
0 1 0 1
0 0 1 1

Or with at most a given number set:

$ ./bagenum -l 2 1:1 1:1 1:1 1:1
0 0 0 0
1 0 0 0
0 1 0 0
1 1 0 0
0 0 1 0
1 0 1 0
0 1 1 0
0 0 0 1
1 0 0 1
0 1 0 1
0 0 1 1

Or the ways of rolling 4 with 3 dice numbered 0-5 – much the same as rolling 7 with 3 normal dice (the -r parameter prints the results in reverse order, ie. normal lexicographic order):

$ ./bagenum -r 4 5:1 5:1 5:1
0 0 4
0 1 3
0 2 2
0 3 1
0 4 0
1 0 3
1 1 2
1 2 1
1 3 0
2 0 2
2 1 1
2 2 0
3 0 1
3 1 0
4 0 0

All this works quite nicely with unlimited multiplicities, provided we also set an upper bound, so we can enumerate the solutions to the classic Pólya problem of the number of ways to make 50c with the usual US coins (50, as it happens):

$ ./bagenum -w 50 -1:1 -1:5 -1:10 -1:25 -1:50
50:1 0:5 0:10 0:25 0:50
45:1 1:5 0:10 0:25 0:50
40:1 2:5 0:10 0:25 0:50
35:1 3:5 0:10 0:25 0:50
30:1 4:5 0:10 0:25 0:50
25:1 5:5 0:10 0:25 0:50
20:1 6:5 0:10 0:25 0:50
15:1 7:5 0:10 0:25 0:50
10:1 8:5 0:10 0:25 0:50
5:1 9:5 0:10 0:25 0:50
0:1 10:5 0:10 0:25 0:50
40:1 0:5 1:10 0:25 0:50
35:1 1:5 1:10 0:25 0:50
...

Finally, the -g flag prints the subbags found that are greater than the target – but only the minimal ones, all of whose proper subbags are less than the target (if you see what I mean):

./bagenum -g -w 8 1:2 1:3 1:4 1:5
1:2 1:3 1:4 0:5
0:2 1:3 0:4 1:5
0:2 0:3 1:4 1:5

(So the complements of these are the bags whose sum is less than or equal to the target but all of whose proper superbags are greater, as in the classic knapsack problem.)

Once again, this works fine with unlimited multiplicities:

$ ./bagenum -g -w 8 -1:2 -1:3 -1:4 -1:5
4:2 0:3 0:4 0:5
3:2 1:3 0:4 0:5
1:2 2:3 0:4 0:5
0:2 3:3 0:4 0:5
2:2 0:3 1:4 0:5
1:2 1:3 1:4 0:5
0:2 2:3 1:4 0:5
0:2 0:3 2:4 0:5
2:2 0:3 0:4 1:5
0:2 1:3 0:4 1:5
0:2 0:3 1:4 1:5
0:2 0:3 0:4 2:5

All in all, a nice little combinatorial Swiss Army knife and useful when trying to understand the stuff in:

http://www.amazon.co.uk/The-Computer-Programming-Volume-Fascicle/dp/0201853949/

for example (I like the TAOCP fascicles; they are much more convenient to read in the bar, in the bath or on the bus). This one has lots more about this sort of problem: special cases with more efficient solutions, as well as more general algorithms for more general combinatorial problems.


Tail recursive string recognizers

A recent discussion at work about the best way to write a simple string checking function set me thinking…

Essentially, the problem is to recognize strings conforming to some regular expression, but with some extra conditions that either can’t be expressed as a regexp or are inconvenient to express that way, so we can’t just plug it into PCRE or whatever RE library is flavour of the month.

In this case, we wanted strings conforming to:

[0-9a-zA-Z*][-0-9a-zA-Z*]*(\.[0-9a-zA-Z*][-0-9a-zA-Z*]*)*

ie. a path containing “.”-separated components, possibly with “*” wildcards, but also with the conditions that each component has at most one wildcard and that there is a limit on the length of each component.

Ignoring the extra conditions, we can define a suitable regular grammar:

S = S0
S0 = ALNUM + S1 | "*" + S1
S1 = ALNUM + S1 | "-" + S1 | "*" + S1 | "." + S0 | ""

and this naturally suggests a recursive descent recognizer, though in this case there is no actual descent, as the grammar is regular and all the recursions are tail calls.

Happily, these days compilers like GCC will compile tail calls to jumps (at least on higher optimization settings), just like ML compilers, for example, have always had to, and we can hope for reasonably good code from such an approach. Of course, once the tail calls have been optimized, we have a finite state recognizer with the state transitions implemented as gotos!

The extra conditions can now easily be added to the state transition functions, with any variables needed appearing as extra parameters (and because we need to specify the parameter values for every recursive call/state transition, we are less likely to forget to set something), so we end up with something like:

#include <ctype.h>

static inline bool check(const char *p);

static bool check1(const char *p, int n, bool seen) {
  if (n > 4) return false;
  if (isalnum(*p)) return check1(p+1, n+1, seen);
  if (*p == '-') return check1(p+1, n+1, seen);
  if (*p == '*') return !seen && check1(p+1,n+1,true);
  if (*p == '.') return check(p+1);
  if (*p == '\0') return true;
  return false;
}

static bool check(const char *p) {
  if (isalnum(*p)) return check1(p+1,1,false);
  if (*p == '*') return check1(p+1,1,true);
  return false;
}
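For comparison, here is a sketch of the hand-coded goto version that the optimized tail calls are equivalent to: each label is a grammar state and each goto a state transition (check_goto is my name for this illustrative variant; the length limit of 4 is hard-coded as in the recursive version):

```cpp
#include <ctype.h>
#include <cassert>

// Goto-based equivalent of check/check1: s0 is the start of a
// component, s1 is inside a component; n counts characters consumed
// in the current component, seen records whether we have seen a '*'.
static bool check_goto(const char *p) {
  int n = 0;
  bool seen = false;
s0:                       // start of a component
  if (isalnum(*p)) { seen = false; goto advance0; }
  if (*p == '*')   { seen = true;  goto advance0; }
  return false;
advance0:
  p++; n = 1; goto s1;
s1:                       // inside a component
  if (n > 4) return false;
  if (isalnum(*p) || *p == '-') { p++; n++; goto s1; }
  if (*p == '*') { if (seen) return false; seen = true; p++; n++; goto s1; }
  if (*p == '.') { p++; goto s0; }
  return *p == '\0';
}
```

Writing this by hand is error-prone in exactly the way the recursive version is not: nothing forces us to set n and seen correctly on every transition.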

Compiling with 64-bit gcc 4.7.2 at -O2 (gcc only optimizes tail calls at -O2 and above), we get:

_Z5checkPKc:
	movq	%rbp, -8(%rsp)
	movq	%rbx, -16(%rsp)
	subq	$24, %rsp
	movzbl	(%rdi), %ebx
	movq	%rdi, %rbp
	call	__ctype_b_loc
	movq	(%rax), %rax
	movsbq	%bl, %rdx
	testb	$8, (%rax,%rdx,2)
	jne	.L7
	cmpb	$42, %bl
	je	.L8
	xorl	%eax, %eax
	movq	8(%rsp), %rbx
	movq	16(%rsp), %rbp
	addq	$24, %rsp
	ret
.L7:
	leaq	1(%rbp), %rdi
	xorl	%edx, %edx
.L5:
	movq	8(%rsp), %rbx
	movq	16(%rsp), %rbp
	movl	$1, %esi
	addq	$24, %rsp
	jmp	_ZL6check1PKcib
.L8:
	leaq	1(%rbp), %rdi
	movl	$1, %edx
	jmp	.L5
_ZL6check1PKcib:
	movq	%rbp, -24(%rsp)
	movq	%r12, -16(%rsp)
	movq	%rdi, %rbp
	movq	%r13, -8(%rsp)
	movq	%rbx, -32(%rsp)
	subq	$40, %rsp
	movzbl	(%rdi), %ebx
	movl	%esi, %r13d
	movl	%edx, %r12d
	call	__ctype_b_loc
	movq	(%rax), %rax
	movsbq	%bl, %rcx
	testb	$8, (%rax,%rcx,2)
	jne	.L23
	cmpb	$45, %bl
	je	.L23
	cmpb	$42, %bl
	je	.L24
	testb	%bl, %bl
	sete	%al
	cmpb	$46, %bl
	je	.L25
.L12:
	movq	8(%rsp), %rbx
	movq	16(%rsp), %rbp
	movq	24(%rsp), %r12
	movq	32(%rsp), %r13
	addq	$40, %rsp
	ret
.L23:
	leal	1(%r13), %esi
	cmpl	$4, %esi
	jg	.L14
	leaq	1(%rbp), %rdi
	movzbl	%r12b, %edx
.L21:
	movq	8(%rsp), %rbx
	movq	16(%rsp), %rbp
	movq	24(%rsp), %r12
	movq	32(%rsp), %r13
	addq	$40, %rsp
	jmp	_ZL6check1PKcib
.L24:
	xorl	%eax, %eax
	testb	%r12b, %r12b
	jne	.L12
	leal	1(%r13), %esi
	cmpl	$4, %esi
	jg	.L12
	leaq	1(%rbp), %rdi
	movl	$1, %edx
	jmp	.L21
.L14:
	xorl	%eax, %eax
	jmp	.L12
.L25:
	leaq	1(%rbp), %rdi
	movq	8(%rsp), %rbx
	movq	16(%rsp), %rbp
	movq	24(%rsp), %r12
	movq	32(%rsp), %r13
	addq	$40, %rsp
	jmp	_Z5checkPKc

It looks like the tail calls have indeed been compiled as jumps, so the stack won’t blow up. There is what looks like some unnecessary movement of function arguments between the stack and the function argument registers, but we’re not doing too badly. Probably not quite as fast as a hand-coded goto-based version, but only a little slower, almost certainly a price worth paying for the gain in clarity (in fact, most of the time seems to be spent in the calls to isalnum).


Finally, a new laptop

It was high time I got a new laptop, the old Dell Inspiron 2200 had done quite well over the years, but was starting to fall apart and increasingly was just not up to the demands placed on it. I spent a while mulling over the options (don’t need too much raw power, but want some reasonable features), and ended up going for an Acer v5-171, this is the entry level model with the Core i3 processor, some sort of Ivy Bridge, whatever these silly names mean, more than adequate for my needs – doesn’t have the AES encryption I’d get with an i5, but I can probably live without that, and it doesn’t have Turbo Boost, but I don’t even know what that is. What it does have is a 0.5TB disk, 6GB of memory, and is clocked at 1.9GHz. There are 3 USB ports, one is USB 3.0, and a dinky little card reader on the front that I didn’t notice for a while. The keyboard is small, but quite usable, in fact it’s only a little narrower than the keyboard on my old Inspiron. It’s got those modern, flat keys that I didn’t think I liked, but actually, it’s quite pleasant to type on.

Here’s what Intel say about the CPU:

http://ark.intel.com/products/72057/Intel-Core-i3-3227U-Processor-3M-Cache-up-to-1_90-GHz

And here’s what PCWorld say about the laptop. I wasn’t swayed by the “As Advertised on TV”; I hardly ever watch it, and in fact the spec is better than what is described here:

http://www.pcworld.co.uk/gbuk/acer-aspire-v5-171-11-6-laptop-silver-19414897-pdt.html#longDesc

The main alternative was the Asus S200 with a touch screen, though I hate fingerprints, and with either a rather inferior processor or a rather superior price.

I was intending to have a dual-booting system with the Windows 8 it came preinstalled with (I bought it for the handsome price of £329 from the local PC World, who tried to hit me for an extra £30 for their setup service; I explained that I really didn’t want or need that, or the extended warranty, or a copy of Office, just the computer please… The helpful assistant Maya was very pleasant about all this and worked out that she should tick the “No Marketing” box without having to ask) but it didn’t take long to give up on that idea. I can’t say my quick sojourn round Windows 8 filled me with much joy, and I’ve been happily using Linux (and Android) for everything for some time now, so after spending 10 mins or so failing to work out how to set up UEFI dual booting (or even how to boot off the Live USB stick in UEFI mode), I flipped over to BIOS Legacy Mode and told Ubuntu (12.10) to wipe the disk with extreme prejudice. I used to like playing around with partitioners, mulling over how big my swap partition should be, or whether I should keep system files and user data separate; these days I just let the installer get on with it.

After that, pretty much everything just went through just as it should. A deja vu moment of having to add “acpi_backlight=vendor” to the boot options to get the brightness control to work (one day I’ll find out what it means), but apart from that, everything works just fine, straight out of the box.

The screen is a little smaller (physically) than I’m used to, everything these days seems to be 1366×768, but in this case it’s all within a 10.1″ display, so the detail is lovely and crisp, but some things are a bit small for my poor old eyes so some investigation of accessibility options is called for (I recommend the NoSquint Firefox extension). There is a pixel stuck at red (played with lcdnurse to no avail) and the touchpad is a bit sticky (seems better after fiddling with the sensitivity). The sound is pretty poor (though we now seem to have a Spinal Tap-style volume control that goes way beyond 100%, which helps a little). Haven’t tried headphones yet.

Assuming the Power Statistics tool can be trusted, it uses 4W with screen off, 6W with minimum visible brightness level, going up to 8W for a normal level and nearly 10 on maximum. Heavy graphics & CPU use pushes things up to 16 or 17W. Wifi doesn’t seem to make much difference. The battery is reckoned to be 36Wh, so that should be a little over 4 hours, not much by modern standards, but enough for my modest needs. Not too much of a heat problem, the base of the laptop is a little warm but nothing too uncomfortable.

Out of curiosity, I investigated UEFI a bit more: to fiddle with any of the secure boot options in the BIOS you have to set the “Supervisor Password” it seems (I suppose that should have been obvious – initially, it’s the only modifiable field on the secure boot screen). Having done this, in the same BIOS screen, one can select the .efi files from the Ubuntu installation USB stick to be bootable from (/EFI/BOOT/ seems to be required directory) and then, mirabile dictu, we can boot into Live USB installation. I tried “Boot Repair” but got a warning about not having an EFI partition and would I like to add one (presumably if I had done the original installation in UEFI mode, this would have been created for me, I’m not sure I can really be bothered right now though).

Resetting to Legacy Mode and rebooting, we get back to the original installation and all is still well (it always amazes me when things still work after fiddling around, from the time I pulled the CPU out of my Sinclair Spectrum and put it back, and it Still Worked – I had less appreciation then of how easy it is to break the pins on DIL ICs). The only thing not working is that stuck pixel (not visible with light colours though) and though Bluetooth looks like it’s working, it won’t detect my phone. I wonder if this is related to this in dmesg:

[ 0.952921] WARNING: at /build/buildd/linux-3.5.0/arch/x86/mm/ioremap.c:102 __ioremap_caller+0x335/0x380()
[ 0.952925] Hardware name: V5-171
[ 0.952928] Modules linked in:
[ 0.952932] Pid: 1, comm: swapper/0 Not tainted 3.5.0-17-generic #28-Ubuntu
[ 0.952935] Call Trace:
[ 0.952943] [] warn_slowpath_common+0x7f/0xc0
[ 0.952948] [] warn_slowpath_null+0x1a/0x20
[ 0.952952] [] __ioremap_caller+0x335/0x380
[ 0.952958] [] ? bgrt_init+0x47/0x126
[ 0.952964] [] ? acpi_get_table_with_size+0x5f/0xbe
[ 0.952969] [] ? acpi_hed_init+0x30/0x30
[ 0.952974] [] ioremap_nocache+0x17/0x20
[ 0.952978] [] bgrt_init+0x47/0x126
[ 0.952984] [] do_one_initcall+0x12a/0x180
[ 0.952990] [] kernel_init+0x140/0x1c9
[ 0.952995] [] ? loglevel+0x31/0x31
[ 0.953000] [] kernel_thread_helper+0x4/0x10
[ 0.953005] [] ? start_kernel+0x3c4/0x3c4
[ 0.953009] [] ? gs_change+0x13/0x13
[ 0.953015] ---[ end trace d285ec97245f6911 ]---
[ 0.953064] GHES: HEST is not enabled!

That last one sounds like one of those cryptic lovers notes that used to be in the classified ads of The Times.

I love boot messages:

[ 9.573227] cfg80211: Calling CRDA to update world regulatory domain
# I expect they were staying up late, waiting for that call.
[ 9.574528] pci 0000:00:00.0: >detected 131072K stolen memory
# I thought stealing RAM had gone out of fashion, now it's so cheap
[ 9.980692] init: failsafe main process (758) killed by TERM signal
# Doesn't sound very failsafe

Enough lame attempts at computer humour, back to the laptop. Being an old stick in the mud, this is my first time with Ubuntu 12.x with Unity and all that. I’m not convinced I want my laptop to look like a phone, but I’m prepared to go with it for the moment and give it my best shot, though it’s tempting to just find whatever the modern equivalent of TWM is and use that. There are some nice features for sure, it seems that now Thunderbird runs in the background because on rebooting I find an envelope icon that it turns out is telling me that Gaylord Madrid is now fully funded: go Gaylord, go (if you’ve got a few bob to spare, I recommend parking it with http://www.lendwithcare.org). And I’ve just discovered that the Windows key (which Linux seems to be calling Super – it would be nice to have a real Space Cadet keyboard of course) actually does something useful. The trackpad of the Acer is a bit of a weak point, so making more use of keyboard shortcuts could be a good idea – as a long-time Emacs user, I really ought to get used to using Ctrl-T in Firefox to start up a new tab, for example.

So, pretty happy all in all, a nice bit of kit, some shortcomings, but we aren’t exactly high end here so that’s to be expected. Next time I’ll know what to do with UEFI, but going single boot from the start has got to be the right thing.

For the record:

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 58
Stepping: 9
CPU MHz: 779.000
BogoMIPS: 3791.51
Virtualisation: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3

$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz
stepping : 9
microcode : 0x15
cpu MHz : 779.000
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips : 3791.51
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

$ glxinfo | grep -i opengl
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Ivybridge Mobile
OpenGL version string: 3.0 Mesa 9.0.2
OpenGL shading language version string: 1.30

$ lspci
00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 3 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
03:00.0 Network controller: Atheros Communications Inc. AR9462 Wireless Network Adapter (rev 01)
04:00.0 Ethernet controller: Broadcom Corporation NetLink BCM57785 Gigabit Ethernet PCIe (rev 10)
04:00.1 SD Host controller: Broadcom Corporation NetXtreme BCM57765 Memory Card Reader (rev 10)

$ lsusb
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 064e:e330 Suyin Corp.


Movies from OpenGL

I’ve got a few natty programs that draw stuff with OpenGL, I’d like to be able to record their output in some appropriate video format that I can show my friends, post on YouTube etc. Not quite sure how to do that, but it can’t be that difficult I’d have thought.

As an experiment, I’ll write it all up as I go along.

It seems that I can use the data in the frame buffer directly with glReadPixels:

http://www.opengl.org/sdk/docs/man/xhtml/glReadPixels.xml

so it should be easy to dump out frames as raw RGB values, or whatever is required by the next step, i.e. encoding to the actual movie format: AVI, MPEG-4, whatever.

ffmpeg seems to be the tool for doing this. Let’s try some experiments: we can dump from X:

ffmpeg -f x11grab -s cif -r 25 -i :0.0+10,20 /tmp/out.mpg

but that has horrible encoding artefacts, trying with:

ffmpeg -qscale 5 -f x11grab -s cif -r 25 -i :0.0 /tmp/out.mpg

does rather better (qscale it seems goes from 1 to 31, lower is higher quality).

Apparently, ffmpeg is ‘deprecated’ and we should use avconv, but that failed dismally: it looked like it wasn’t synchronized with the frame buffer refreshes or something, and I got all sorts of purple streaks and flashes. Not nice.

avconv -f x11grab -s cif -r 25 -i :0.0 /tmp/out.mpg

maybe there’s an option for that too, but no idea what it might be.

Sticking to ffmpeg for the moment, after the usual Googling, this seems useful:

http://stackoverflow.com/questions/7238013/rawvideo-and-rgb32-values-passed-to-ffmpeg

Looks like something like:

ffmpeg -vcodec rawvideo -f rawvideo -pix_fmt rgb32 -s 256x256 -i pipe:0 -f avi -vcodec mpeg4 foo.avi

might work.

I’ve got some raw video files around from my MMPS project: if you take a PPM type 6 file, and take off the header lines you end up with raw bytes, packed as 24 bit RGB values.
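Stripping the PPM header off by hand is easy enough to script. A rough Python sketch (`ppm_to_raw` is my own name for it; it assumes a binary P6 file with a maxval of 255, one byte per channel):

```python
import io

def ppm_to_raw(data: bytes) -> bytes:
    """Strip the header from a binary PPM (type P6), returning the
    raw 24-bit packed RGB pixel bytes that rawvideo encoding expects."""
    f = io.BytesIO(data)
    assert f.readline().strip() == b"P6"
    # Header is three whitespace-separated tokens after the magic:
    # width, height, maxval ('#' comment lines may intervene).
    tokens = []
    while len(tokens) < 3:
        line = f.readline()
        if line.startswith(b"#"):
            continue  # skip comment lines
        tokens += line.split()
    width, height, maxval = map(int, tokens)
    assert maxval == 255  # one byte per channel
    return f.read(3 * width * height)

# A 2x1 image: one red pixel, one blue pixel
ppm = b"P6\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])
raw = ppm_to_raw(ppm)
```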

A bit of experimentation and I find all that is needed to make a rather boring video of the earth, contrary to Galileo, not moving, is:

(while true; do cat earth.raw; done) | ffmpeg -qscale 1 -f rawvideo -pix_fmt rgb24 -s 1000x500 -i pipe:0 foo7.avi

ffmpeg seems to work out that we need an avi wrapper for an avi file, so we don’t need -f avi and it uses its own mpeg4 encoder by default, so we don’t need -vcodec mpeg4, and the -f rawvideo covers -vcodec rawvideo. And another thing that I now know, but not knowing held me up a little: making an AVI file like this but with just one frame doesn’t work very well (at least Linux Movie Player won’t play the result).

Back to OpenGL: now I’ve got the video encoding back end sorted, I need to generate some real data. Since I’ll be piping the data into ffmpeg, I just need to dump out the frames to stdout in the appropriate format. The OpenGL doc for the various formats is, as usual, fairly baffling for a newbie like myself, but this seems to work:

glFlush();
// One frame of tightly packed 24-bit RGB pixels
static size_t nbytes = 3*screen_width*screen_height;
static char* framedata = new char[nbytes];
// Read the frame back from OpenGL and write it to stdout
glReadPixels(0,0,screen_width,screen_height,GL_RGB,GL_UNSIGNED_BYTE,framedata);
fwrite(framedata,1,nbytes,stdout);
glutSwapBuffers();

I’m not going to put this forward as model code: static variables, yuk; there’s no protection against some damn fool resizing the window while we are ‘recording’, which won’t make anything very happy; and I’m not even checking the return value of fwrite or for errors from glReadPixels. We could do things entirely offline, but we might want to record some interactions, and writing out each frame doesn’t seem to slow things down too much.

./poly2 1 1 1 1 | ffmpeg -qscale 1 -vcodec rawvideo -f rawvideo -pix_fmt rgb24 -s 800x640 -i pipe:0 poly.avi

I probably ought to change poly2 so it takes width and height on the command line rather than hardwired (but recompilation is so fast these days).

Next step, uploading to YouTube. Looking at:

http://support.google.com/youtube/bin/answer.py?hl=en&answer=1722171&topic=2888648&ctx=topic

it seems that I probably want 480p format, 854×480 as a sort of baseline. I still don’t understand about codecs, but looks like H.264 is the way to go. Now, what encoder should I use, looks like x264 or libx264 is the thing, but nothing is installed:

$ ffmpeg -codecs | grep 264
D V D h264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
D V D h264_vdpau H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (VDPAU acceleration)

It seems I can decode, but not encode.

The rather helpful:

https://help.ubuntu.com/community/FFmpeg

suggests installing these extra packages:

sudo apt-get install libavcodec-extra-53 libavdevice-extra-53 libavfilter-extra-2 libavformat-extra-53 libavutil-extra-51 libpostproc-extra-52 libswscale-extra-2

and now I have:

$ ffmpeg -codecs | grep 264
D V D h264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
D V D h264_vdpau H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (VDPAU acceleration)
EV libx264 libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10

Nice, and now:

./poly2 1 1 1 1 | ffmpeg -qscale 1 -vcodec rawvideo -f rawvideo -pix_fmt rgb24 -s 800x640 -i pipe:0 -vcodec libx264 foo3.avi

seems to do the trick.

Looking good, let’s try the YouTube recommended 854×480 dimensions; hmm, we’ve gone monochrome with rather prominent scan lines (reminded me of the old Dr Who tent at Glastonbury, hands up if anyone remembers that):

Screenshot from 2013-03-10 16:49:55

Looks like a data alignment problem – I’ve still got magic numbers in the code, but they seem to be the right ones – but checking OpenGL documentation, I find:

http://www.opengl.org/sdk/docs/man/xhtml/glPixelStore.xml

Looks like OpenGL is word-aligning the pixel rows, fine for 800 pixels, not so good for 854.
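The arithmetic behind the corruption is easy to check: with the default GL_PACK_ALIGNMENT of 4, each pixel row is padded up to a word boundary, so 854-pixel RGB rows pick up two stray bytes that the raw video pipeline knows nothing about. A quick sketch (`packed_row_bytes` is my own name):

```python
def packed_row_bytes(width, components=3, alignment=4):
    """Bytes per row that glReadPixels actually writes: each row is
    padded up to a multiple of GL_PACK_ALIGNMENT (default 4)."""
    row = width * components
    return (row + alignment - 1) // alignment * alignment

# 800-pixel RGB rows are already a multiple of 4; 854-pixel rows are not:
packed_row_bytes(800)  # 2400 == 800*3, no padding
packed_row_bytes(854)  # 2564 == 854*3 + 2, two bytes of padding per row
```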

glPixelStorei(GL_PACK_ALIGNMENT,1)

should sort this out (note the ‘i’ for ‘integer’ on the end there), and indeed it does:

./poly2 1 1 1 1 | ffmpeg -f rawvideo -pix_fmt rgb24 -s 854x480 -i pipe:0 -vcodec libx264 foo264.avi

makes me a nice 20MB, 1 minute 17 sec video of some coloured balls writhing around in empty space. A nice background (and one that disguises the stuck pixel on my new laptop that I only notice watching this sort of thing) would be good.

Now for that YouTube account, I suppose I might as well use my Google account, anything for an easy life and it’s not like I’m uploading anything too embarrassing. OK, here we go: “We did not recognise the audio format for this file, but we will try to process it anyway. See this article on recommended formats for more information”, well, I didn’t include any audio, so fair enough I suppose (a soundtrack would be nice though, suggestions on a postcard).

Anyway, here it is, in all its glory: I give you “invert264”:

Perhaps I should come up with a catchier name, but that will do for now.


Floating point numbers are made of jelly.

Floating point equality can be quite interesting, leading some people to think that they are actually made of some sort of jelly, and don’t in fact have precise values. This isn’t true of course but sometimes things can get quite bizarre. I was surprised to find this assertion:

printf("%.20g %.20g\n", f(), g());
assert(f() == g());

failing with this output:

$ ./foo
0.9979466119096509491 0.9979466119096509491
foo: foo.cpp:15: int main(): Assertion `f() == g()' failed.
Aborted

(All code compiled with gcc 4.4.3, on Ubuntu 10.04, 32 bit Celeron).

Now 20 significant figures are enough to tell the difference between two doubles, so this looks a little strange.

More investigation showed that:

double x = f();
assert(x == f());

failed as well and I started to get worried. It’s true that floating point numbers aren’t equal to themselves if they are NaNs, but that didn’t seem to be the case here. Also, the assertion didn’t fail for code that had been optimized.

Some years ago I did some work on code generation for the 387 FPU and then we had trouble with rounding – test programs for the 387 would produce slightly different results from other architectures like MIPS and SPARC. IEEE 754 is pretty precise about how rounding should be done, so this seemed a bit strange until we realized that the 387 uses 80-bit precision internally, with rounding to 64-bit doubles only happening when transferring data from the FPU to memory (FSTP and friends). There is a flag to set the internal rounding mode to double precision, and once we had set that we got consistent results on all architectures.

Looks like something similar is happening here and a look at the disassembly confirms this:

	call	_Z1av
	fstpl	24(%esp)
	call	_Z1av
	fldl	24(%esp)
	fxch	%st(1)
	fucompp
	fnstsw	%ax

The first result of calling a() gets written out to memory as a double, hence is rounded, but then it’s reloaded but compared to the unrounded result from the second call.

Investigating further exposes more interesting behaviour:

#include <stdio.h>
#include <math.h>
#include <assert.h>
int main()
{
  double x,y,z;
  int a;
  x = 1.0;
  while (x != 0.0) {
    y = x; x /= 2;
  }
  a = y > 0;
  printf("%.40Lg\n", (long double)y);
  printf("%d\n", a);
  printf("%.20g\n", y/y);
  printf("%.40Lg\n", (long double)y);

  x = 1.0;
  y = nextafter(x,2.0);
  z = (x+y)/2.0;
  printf("%d\n", z > x && z < y);

  x = 0.5;
  while ((z = sqrt(x)) != x) x = z;
  y = x;
  while ((z = x*x) != x) x = z;
  printf("%.40Lg\n%.40Lg\n", (long double)x, (long double)y);
}

Now we have completely different behaviour between optimized and unoptimized code:

$ g++ -Wall fp.cpp
$ ./a.out
4.940656458412465441765687928682213723651e-324
1
1
4.940656458412465441765687928682213723651e-324
0
1
1

Unoptimized, it’s pretty much as expected: we iterate down to the smallest representable double, which we divide by itself to get 1. If we average 1 and the next representable number, we don’t get a number in between; and finally, if we iterate square roots, we eventually arrive at 1, and squaring that just gives 1 again.
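The expected behaviour is easy to reproduce in Python, which on modern x86 uses SSE doubles throughout (a sketch; math.nextafter needs Python 3.9 or later):

```python
import math

# Iterate down to the smallest positive double (a subnormal)...
x = 1.0
while x != 0.0:
    y = x
    x /= 2
assert y == 5e-324   # smallest subnormal double
assert y / y == 1.0  # dividing it by itself gives 1

# ...and there is no double strictly between 1 and its successor:
x = 1.0
y2 = math.nextafter(x, 2.0)
z = (x + y2) / 2.0   # rounds back to 1.0 (round-to-even)
assert not (x < z < y2)
```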

$ g++ -O3 -Wall fp.cpp
$ ./a.out
3.645199531882474602528405933619419816399e-4951
0
-nan
0
1
0
0.9999999999999999999457898913757247782996

Bah, we seem to have iterated right down to the smallest extended value there, but when we divide it by itself, we get a NaN – it seems to have been magically converted to zero, as confirmed by printing it out again (presumably the 387 register was stored temporarily in memory while we did the printf, and truncated to 64 bits on the way). Now we are able to find a value between 1 and 1 + ε, and finally, our repeated square root iteration seems to converge on something slightly less than 1, and when we go back the other way, we end up at 0 instead.

Fortunately, we don’t need to worry about what to do about this or whether it’s the chip designers or the language designers or the FP standard designers or the compiler writers who are getting it wrong (or maybe it really is that FP numbers are made of jelly). Newer x86 processors have SSE registers for FP, which are 64 bit doubles throughout (as do most other common architectures), so these problems are less likely to occur:

    ...
    call    _Z3av
    movsd    %xmm0, -8(%rbp)
    call    _Z3av
    ucomisd    -8(%rbp), %xmm0
    jne    .L8
    ...

and if we run our little test program (this time with gcc 4.7.2) on a 64-bit processor we get:

$ g++ -Wall fp.cpp
$ ./a.out
4.940656458412465441765687928682213723651e-324
1
1
4.940656458412465441765687928682213723651e-324
0
0
0.9999999999999998889776975374843459576368

for all levels of optimization.

We can force use of the 387 instructions with -mfpmath=387 and then we get:

$ g++ -O3 -mfpmath=387 -Wall fp.cpp
$ ./a.out
3.645199531882474602528405933619419816399e-4951
0
-nan
0
1
0
0.9999999999999999999457898913757247782996

again, so it doesn’t look like our problems are related to the particular gcc version.

With the new code, we lose the benefits of extra internal precision, but on the whole I prefer having half a chance of understanding what’s going on.


0 1 1 2 3 5 8 13…

For some reason I spent some time over Christmas looking at algorithms for computing Fibonacci numbers, so here’s the first post for 2013.

The Binet formulas for the Fibonacci and Lucas numbers are:

φ = (1+√5)/2
ψ = (1-√5)/2
φψ = -1
F(n) = (φ^n - ψ^n)/√5
L(n) = φ^n + ψ^n

We can now derive:

5F(n)^2 = φ^(2n) + ψ^(2n) - 2(-1)^n = L(2n) - 2(-1)^n
L(n)^2 = φ^(2n) + ψ^(2n) + 2(-1)^n = L(2n) + 2(-1)^n
L(n)F(n) = (φ^(2n) - ψ^(2n))/√5 = F(2n)

We can also use the formulas to prove:

2φ^n = L(n) + F(n)√5

which gives us:

L(n+m) + F(n+m)√5 = 2φ^(n+m) = (2φ^n)(2φ^m)/2 = (L(n)L(m) + 5F(n)F(m))/2 + (L(n)F(m) + F(n)L(m))√5/2

So, equating the rational and irrational parts (as we may – if a+bk = c+dk with a,b,c,d rational, then k is rational unless a=c and b=d):

L(n+m) = (L(n)L(m) + 5F(n)F(m))/2
F(n+m) = (L(n)F(m) + F(n)L(m))/2
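The addition formulas check out numerically against the plain iterative definitions of F and L:

```python
# Brute-force sanity check of the Fibonacci/Lucas addition formulas.
def F(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def L(n):
    a, b = 2, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(10):
    for m in range(10):
        assert L(n + m) == (L(n) * L(m) + 5 * F(n) * F(m)) // 2
        assert F(n + m) == (L(n) * F(m) + F(n) * L(m)) // 2
```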

Two special cases for this:

L(2n) = (L(n)^2 + 5F(n)^2)/2
F(2n) = L(n)F(n)

L(n+1) = (L(n) + 5F(n))/2
F(n+1) = (L(n) + F(n))/2

With these recurrences, we can write some code:

def fib1aux(n):
    """Return F(n),L(n)"""
    if n == 0: return 0,2
    else:
        a,b = fib1aux(n//2)
        a,b = a*b,(5*a*a+b*b)//2
        if n%2 == 0: return a,b
        else: return (a+b)//2, (5*a+b)//2

def fib1(n): return fib1aux(n)[0]

We can save a bignum multiplication in each recursive step by replacing a,b = a*b,(5*a*a+b*b)//2 with something like ab = a*b; t = (a+b)*(5*a+b); a,b = ab,t//2-3*ab (suggestion due to Robin Houston at http://bosker.wordpress.com/2011/07/27/computing-fibonacci-numbers-using-binet%E2%80%99s-formula/).
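Spelled out, the three-multiplication version looks something like this (a sketch following Houston’s rearrangement; fib1baux is my own name for it):

```python
def fib1baux(n):
    """Return F(n), L(n) using three bignum multiplications per
    doubling step instead of four."""
    if n == 0:
        return 0, 2
    a, b = fib1baux(n // 2)
    ab = a * b
    t = (a + b) * (5 * a + b)   # = 5a^2 + 6ab + b^2
    a, b = ab, t // 2 - 3 * ab  # so b = (5a^2 + b^2)/2 = L(2k)
    if n % 2 == 0:
        return a, b
    return (a + b) // 2, (5 * a + b) // 2

def fib1b(n):
    return fib1baux(n)[0]
```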

Alternatively, we can follow the suggestion of Knuth in TAOCP (solution to exercise 4.6.3.26) and define:

def fib2aux(n):
    """Return F(n),L(n)"""
    if n == 0: return 0,2
    else:
        a,b = fib2aux(n//2)
        a,b = a*b,n//2%2*4-2+b*b
        if n%2 == 0: return a,b
        else: return (a+b)//2, (5*a+b)//2

def fib2(n): return fib2aux(n)[0]

In both cases, we benefit from avoiding the final calculation of L(n) and define something like:

def fib2(n):
    if (n%2 == 0):
        a,b = fib2aux(n//2)
        return a*b
    else:
        a,b = fib2aux(n-1)
        return (a+b)//2

or, for a non-recursive solution (stealing some ideas from Robin Houston again):

def bits(n):
    r = []
    while n!=0:
        r.append(n%2)
        n >>=1
    r.reverse()
    return r

def fib3(n):
    a,b = 0,2
    last = 0
    for i in bits(n//2):
        a,b = a*b,last*4-2+b*b
        last = i
        if i: a,b = (a+b)//2, (5*a+b)//2
    if n%2 == 0:
        return a*b
    else:
        a,b = a*b,last*4-2+b*b
        return (a+b)//2

Of course, we don’t have to use Lucas numbers, a pleasant recurrence for Fibonacci numbers alone is:

F(n+m) = F(n+1)F(m) + F(n)F(m-1)

which gives us:

F(2n) = F(n)F(n) + 2F(n)F(n-1)
F(2n-1) = F(n)F(n) + F(n-1)F(n-1)

and

F(2n+1) = F(n+1)F(n+1) + F(n)F(n)
F(2n) = 2F(n)F(n+1) - F(n)F(n)

leading to:

def fibaux4(n):
    if n == 0: return 0,1
    elif n == 1: return 1,0
    else:
        a,b = fibaux4((n+1)//2)
        if (n%2 == 0): return a*(a+2*b),a*a+b*b
        else: return a*a+b*b,b*(2*a-b)

def fib4(n): return fibaux4(n)[0]

with an elegant symmetry between the odd and even cases. This seems to be the solution suggested by Dijkstra in http://www.cs.utexas.edu/~EWD/ewd06xx/EWD654.PDF

There is, of course, more: notably the matrix-based solutions. We observe that (a(n+2), a(n+1)) is M(a(n+1), a(n)) for the 2×2 matrix M = (1 1; 1 0). We then have (F(n+1), F(n)) = M^n(1, 0) and we can use conventional matrix operations, including inversion, to compute F(n). We can apply this to any linear recurrence, e.g. a(n+2) = A·a(n+1) + B·a(n) corresponds to the matrix (A B; 1 0).

It’s notable that M only depends on the recurrence and not on the initial values, so we can compute, for example, the Lucas numbers with (L(n+1), L(n)) = M^n(1, 2), and there is a regularity to the structure of M (and of its powers) that can be used to optimise the computation, e.g. for Fibonacci, M^n is of the form (a+b a; a b) (cf. HAKMEM: http://www.inwap.com/pdp10/hbaker/hakmem/recurrence.html). But that probably needs another post.
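The matrix method is easy to sketch in Python with plain tuples and repeated squaring (mat_mul, mat_pow, fib_matrix, lucas_matrix are my own names):

```python
def mat_mul(X, Y):
    """Multiply 2x2 matrices given as row-major (a, b, c, d) tuples."""
    a, b, c, d = X
    e, f, g, h = Y
    return (a*e + b*g, a*f + b*h, c*e + d*g, c*f + d*h)

def mat_pow(M, n):
    """M**n by repeated squaring."""
    R = (1, 0, 0, 1)  # identity
    while n > 0:
        if n % 2:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        n //= 2
    return R

def fib_matrix(n):
    # (F(n+1), F(n)) = M^n (1, 0): F(n) is the top-right entry of M^n
    return mat_pow((1, 1, 1, 0), n)[1]

def lucas_matrix(n):
    # Same M, different initial vector: (L(n+1), L(n)) = M^n (1, 2)
    a, b, c, d = mat_pow((1, 1, 1, 0), n)
    return c + 2*d
```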


Graphics Bugs

One of the nice things about graphics programming is that even the bugs can be fun, here’s a picture I made earlier, it wasn’t quite what I’d intended, and I don’t think it’s going to be in the Tate Modern any time soon, but I rather like it:

scribble


Complex Fibonacci

There are a couple of ways to extend the Fibonacci series to a continuous function on real or complex numbers. Here’s a nice one: the grid shows the mapping of the unit strip, 0 <= re(z), 0 <= im(z) <= 1, grid points are 0.1 apart. The real axis is picked out in red, notice that it crosses itself exactly at the points of the Fibonacci series – twice at 1 after executing a loop-the-loop (clicking on the image should give you a larger version).

fibonacci

The function is just the standard Binet formula, interpreted over the complex plane:
F(z) = (φ^z - (-φ)^(-z))/√5
or equivalently:
F(z) = (φ^z - e^(iπz)φ^(-z))/√5
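A quick numerical sketch of the formula (fib_complex is my own name for it):

```python
import cmath

SQRT5 = 5 ** 0.5
PHI = (1 + SQRT5) / 2

def fib_complex(z):
    """Binet's formula over the complex plane, with (-phi)^(-z)
    written as e^(i*pi*z) * phi^(-z)."""
    return (PHI**z - cmath.exp(1j * cmath.pi * z) * PHI**(-z)) / SQRT5

# At the integers it agrees with the ordinary Fibonacci numbers
# (up to floating point noise in the imaginary part):
fibs = [round(fib_complex(n).real) for n in range(8)]  # [0, 1, 1, 2, 3, 5, 8, 13]

# The recurrence F(z+2) = F(z+1) + F(z) holds for complex z too:
z = 0.3 + 0.7j
err = abs(fib_complex(z + 2) - (fib_complex(z + 1) + fib_complex(z)))
```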

For negative values, the function turns into a nice spiral (the exp component dominates). Here is the 0 <= im(z) <= 0.1 strip:

spiral2
The red parallelogram illustrates F(z+2) = F(z+1)+F(z) still holds, even for our complex quantities.

Most of the action takes place around the origin, so let’s zoom in:

spiral1

The image of the real axis passing through the points -1,1,0,1,1,2,… is clearly visible (the real axis is on the outside, the origin is the leftmost point on the red parallelogram).

Sometimes the series is extended as F(z) = (φ^z - cos(πz)φ^(-z))/√5, i.e. taking the real part of the exponential, which produces a mapping that takes the real line to itself, which is also nice. Here we see the mapping of the strip 0.1 <= im(z) <= 0.2:

cosspiral

Again, the red parallelogram illustrates the recurrence.