Testing for Triangularity

A company I used to work for had a standard interview problem of writing a short function that would take 3 numbers (usually assumed to be integers) and determine if they formed a valid triangle. There was much discussion about the best/fastest/clearest/most elegant way of doing this, but I don’t remember the topic of what to do if the numbers were floating point coming up.

Curiously, being interviewed myself at another company not long ago I was asked this very question and wasn’t quite sure what the best thing to do was – for integers, my preferred solution is to sort the 3 numbers as a >= b >= c, then we have a triangle just when c > 0 and a < b+c (or
a-b < c if we are worried about overflow). What about if they are floating point numbers? Clearly if b >> c, then b+c might lose precision, so, for example, we might have c > 0 and b = a, so it’s valid, but if b+c = b, then our test fails.

We could try the same trick as for integers, testing for a-b < c – subtracting the two largest should minimize the rounding error; this seems plausible, but can we be sure we won’t get it wrong?

An earlier post looked at the sequence of integers:

0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 18 20 22 24 26 28 30
32 36 40 44 48 52 56 60
64 72 80 88 96 104 112 120
...

which are really just thinly disguised (non-negative) floating point numbers. An important feature, obvious here and easy to prove of “real” FP numbers, is that, apart from the first two rows, each row is the double of the row above it, therefore, 2n is representable exactly iff n is. Furthermore, it’s pretty obvious that if a and b are representable, with a/2 <= b <= a, then a-b is representable (and if a and b are numbers in the first two rows, then all differences are representable, so we don’t need to worry about them).

Now let’s return to our triangle: we have a >= b >= c > 0 and clearly a,b,c must be representable. If all the numbers are in the first two rows, then all operations are exact and we get the right answer. Otherwise, a is even: first, consider the case b >= a/2, as above, this implies that a-b is representable and the comparison with c must be exact, so our test is correct. Otherwise, if b < a/2, then since c <= b, c + b <= 2b < a, so we don’t have a triangle, and, assuming that rounding is monotonic (ie. x <= y => round(x) <= round(y), or equivalently, x < y => round(x) <= round(y)), our check works out correct too:

b < a/2, so b+c < a (exact) so c < a-b (exact) so c = round(c) <= round(a-b)

QED

As usual for this blog, we are merely paddling at the shore of a much deeper sea. For much more on triangles, floating point approximations etc, see Kahan’s paper:

Click to access Triangle.pdf

(or indeed anything else by the great man:

http://www.cs.berkeley.edu/~wkahan/

http://owpdb.mfo.de/detail?photo_id=5350).


Subbag Enumeration

A bag or multiset is just a map from some base set X to the natural numbers, so we can represent such a bag by a sequence of (non-negative) integers. Subbags are defined in the obvious way (and if s is subbag of t, t is a superbag of s). Given a subbag s of m, we can find the next subbag (in reverse lexicographic order) with this function:

bool nextbag(vector<int> &s, 
             const vector<int> &m,
             size_t start = 0)
{
  for (size_t i = start; i < s.size(); i++) {
    if (s[i] == m[i]) {
      s[i] = 0;
    } else {
      s[i]++;
      return true;
    }
  }
  return false;
}

This sets the subsequence [start..] of s to the (reverse) lexicographically next sequence. m gives maximum values for each field of s and we return false if we have cycled back round to the zero sequence. Note that if m[i] < 0 then effectively there is no maximum value for each element.

We might also want to skip from s to the next subbag that is not a superbag of s. This is accomplished by:

bool nextnonsuper(vector<int> &s, const vector<int> &m)
{
  for (size_t i = 0; i < s.size(); i++) {
    if (s[i] != 0) {
      s[i] = 0;
      return nextbag(s,m,i+1);
    }
  }
  return false;
}

Here we see why we wanted the start parameter in nextbag. For example, we go from:

[0,0,0,3,4,...] => [0,0,0,0,5,...]

Given f: bag -> A where A is an ordered set and where f is monotonic (if s is a subbag of t, then f(s) <= f(t)), if we want to find all (sub)bags where f(s) <= x, we can use nextnonsuper to skip some of the enumeration:

while (true) {
  if (f(s) >= t) s = nextnonsuper(s)
  else s = nextbag(s)
  ...
}

For an example, we can use a weighted sum over the bag elements (we will assume that the weights are positive).

int weights(const vector<int> &s, const vector<int> &w)
{
  int total = 0;
  for (size_t i = 0; i < s.size(); i++) {
    total += s[i]*w[i];
  }
  return total;
}

and we can add some boiler plate to make the whole thing into a nice standalone application:

// Print out our vector, include weights if required
// Optionally print in reverse order
void print(const vector<int> &s, 
           const vector<int> *w = NULL, 
           bool printreverse = false)
{
  int sz = s.size();
  for (int i = 0; i < sz; i++) {
    int index = printreverse?sz-i-1:i;
    cout << s[index];
    if (w != NULL) cout << ":" << (*w)[index];
    cout << (i < sz-1?" ":"\n");
  }
}

int main(int argc, char *argv[])
{
  int target;    // The sum we are aiming for
  vector<int> s; // Where we build the sequence
  vector<int> m; // The number of each items available
  vector<int> w; // The weight of each item
  bool printreverse = false; // Print sequences in reverse
  bool printweight  = false; // Print the weights as well
  bool printless    = false; // Print sequences less than target
  bool printgreater = false; // Print those greater than target

  // Crude but effective argument processing
  while (argv[1][0] == '-') {
    if (strcmp(argv[1],"-r") == 0) {
      printreverse = true;
      argv++; argc--;
    } else if (strcmp(argv[1],"-l") == 0) {
      printless = true;
      argv++; argc--;
    } else if (strcmp(argv[1],"-g") == 0) {
      printgreater = true;
      argv++; argc--;
    } else if (strcmp(argv[1],"-w") == 0) {
      printweight = true;
      argv++; argc--;
    } else {
      cerr << "Invalid flag: " << argv[1] << endl;
      exit(0);
    }
  }
  // Read target
  // target = 0 means no target
  if (sscanf(argv[1],"%d",&target) != 1) {
    cerr << "Invalid target: " << argv[1] << endl;
    exit(0);
  }
  // Read count/weight pairs
  for (int i = 2; i < argc; i++) {
    int a,b;
    if (sscanf(argv[i],"%d:%d",&a,&b) != 2) {
      cerr << "Invalid arg: " << argv[i] << endl;
      exit(0);
    }
    m.push_back(a);
    w.push_back(b);
  }
  // Initialize result sequence.
  s.resize(m.size());
  while(true) {
    int total = weights(s,w);
    // Print result if desired
    if (target == 0 || 
        total == target || 
        (total < target && printless) ||
        (total > target && printgreater)) {
      print(s, printweight?&w:NULL, printreverse);
    }
    if (target > 0 && total >= target) {
      // Have hit or exceeded target, so skip superbags
      if (!nextnonsuper(s,m)) break;
    } else {
      if (!nextbag(s,m)) break;
    }
  }
}

Compiling all this as bagenum.cpp, we can now do some enumerations. All the ways to make 5 from the bag {1,1,1,2,2,3}:

$ ./bagenum 5 3:1 2:2 1:3
3 1 0
1 2 0
2 0 1
0 1 1

We can specify a target of 0 to get all the subbags of the input, and the -w flag prints the weight out as well:

$ ./bagenum -w 0 3:1 2:2
0:1 0:2
1:1 0:2
2:1 0:2
3:1 0:2
0:1 1:2
1:1 1:2
2:1 1:2
3:1 1:2
0:1 2:2
1:1 2:2
2:1 2:2
3:1 2:2

We can enumerate bitstrings with a given number of bits set

$ ./bagenum 2 1:1 1:1 1:1 1:1
1 1 0 0
1 0 1 0
0 1 1 0
1 0 0 1
0 1 0 1
0 0 1 1

Or with at most a given number set:

$ ./bagenum -l 2 1:1 1:1 1:1 1:1
0 0 0 0
1 0 0 0
0 1 0 0
1 1 0 0
0 0 1 0
1 0 1 0
0 1 1 0
0 0 0 1
1 0 0 1
0 1 0 1
0 0 1 1

Or the ways of rolling 4 with 3 dice numbered 0-5 – much the same as rolling 7 with 3 normal dice (the -r parameter prints the results in reverse order, ie. normal lexicographic order):

$ ./bagenum -r 4 5:1 5:1 5:1
0 0 4
0 1 3
0 2 2
0 3 1
0 4 0
1 0 3
1 1 2
1 2 1
1 3 0
2 0 2
2 1 1
2 2 0
3 0 1
3 1 0
4 0 0

All this works quite nicely with unlimited multiplicities, provided we also set an upper bound, so we can enumerate the solutions to the classic Polya problem of the ways to make 50c with usual US coins (50 as it happens):

$ ./bagenum -w 50 -1:1 -1:5 -1:10 -1:25 -1:50
50:1 0:5 0:10 0:25 0:50
45:1 1:5 0:10 0:25 0:50
40:1 2:5 0:10 0:25 0:50
35:1 3:5 0:10 0:25 0:50
30:1 4:5 0:10 0:25 0:50
25:1 5:5 0:10 0:25 0:50
20:1 6:5 0:10 0:25 0:50
15:1 7:5 0:10 0:25 0:50
10:1 8:5 0:10 0:25 0:50
5:1 9:5 0:10 0:25 0:50
0:1 10:5 0:10 0:25 0:50
40:1 0:5 1:10 0:25 0:50
35:1 1:5 1:10 0:25 0:50
...

Finally, the -g flag prints the subbags found that are greater than the target – but only the ones all of whose subbags are less than the target (if you see what I mean):

./bagenum -g -w 8 1:2 1:3 1:4 1:5
1:2 1:3 1:4 0:5
0:2 1:3 0:4 1:5
0:2 0:3 1:4 1:5

(So the complement of subbags are the ones whose sum is less than or equal to a target but every superbag is greater, as in the classic knapsack problem).

Once again, this works fine with unlimited multiplicities:

$ ./bagenum -g -w 8 -1:2 -1:3 -1:4 -1:5
4:2 0:3 0:4 0:5
3:2 1:3 0:4 0:5
1:2 2:3 0:4 0:5
0:2 3:3 0:4 0:5
2:2 0:3 1:4 0:5
1:2 1:3 1:4 0:5
0:2 2:3 1:4 0:5
0:2 0:3 2:4 0:5
2:2 0:3 0:4 1:5
0:2 1:3 0:4 1:5
0:2 0:3 1:4 1:5
0:2 0:3 0:4 2:5

All in all, a nice little combinatorial Swiss Army knife and useful when trying to understand the stuff in:

for example (I like the TAOCP fascicles, they are much more convenient to read in the bar, in the bath or on the bus). This one has lots more about this sort of problem: special cases with more efficient solutions, as well as more general algorithms for more general combinatorial problems.


Tail recursive string recognizers

A recent discussion at work about the best way to write a simple string checking function set me thinking…

Essentially, the problem is to recognize strings conforming to some regular expression, but with some extra conditions, that either can’t be expressed as a regexp or that are inconvenient to do so, so we can’t just plug it in to PCRE or whatever RE library is flavour of the month.

In this case, we wanted strings conforming to:

[0-9a-zA-Z*][-0-9a-zA-Z*]*(.[0-9a-zA-Z*][-0-9a-zA-Z*]*)*

ie. a path containing “.” separated components, possibly with “*” wildcards, but also with the condition that each component has at most one wildcard and with a limit on the length of each component.

Ignoring the extra conditions, we can define a suitable regular grammar:

S = S0
S0 = ALNUM + S1 | "*" + S1
S1 = ALNUM + S1 | "-" + S1 | "*" + S1 | "." + S0 | ""

and this naturally suggests a recursive descent recognizer, though in this case, there is no actual descent as the grammar is regular and all the recursions are tail calls.

Happily, these days compilers like GCC will compile tail calls to jumps (at least on higher optimization settings), just like ML compilers, for example, have always had to, and we can hope for reasonably good code from such an approach. Of course, once the tail calls have been optimized, we have a finite state recognizer with the state transitions implemented as gotos!

The extra conditions can now be easily added in to the state transition functions, with any variables needed just appearing as extra parameters (and because we need to specify the parameter value for every recursive call/state transition, we are less likely to forget to set something) so we end up with something like:

#include <ctype.h>

static inline bool check(const char *p);

static bool check1(const char *p, int n, bool seen) {
  if (n > 4) return false;
  if (isalnum(*p)) return check1(p+1, n+1, seen);
  if (*p == '-') return check1(p+1, n+1, seen);
  if (*p == '*') return !seen && check1(p+1,n+1,true);
  if (*p == '.') return check(p+1);
  if (*p == '\0') return true;
  return false;
}

static bool check(const char *p) {
  if (isalnum(*p)) return check1(p+1,1,false);
  if (*p == '*') return check1(p+1,1,true);
  return false;
}

Compiling with 64 bit gcc 4.7.2, -O2 (gcc only optimizes tail calls at -O2 and above), we get:

_Z5checkPKc:
    movq    %rbp, -8(%rsp)
    movq    %rbx, -16(%rsp)
    subq    $24, %rsp
    movzbl  (%rdi), %ebx
    movq    %rdi, %rbp
    call    __ctype_b_loc
    movq    (%rax), %rax
    movsbq  %bl, %rdx
    testb   $8, (%rax,%rdx,2)
    jne .L7
    cmpb    $42, %bl
    je  .L8
    xorl    %eax, %eax
    movq    8(%rsp), %rbx
    movq    16(%rsp), %rbp
    addq    $24, %rsp
    ret
.L7:
    leaq    1(%rbp), %rdi
    xorl    %edx, %edx
.L5:
    movq    8(%rsp), %rbx
    movq    16(%rsp), %rbp
    movl    $1, %esi
    addq    $24, %rsp
    jmp _ZL6check1PKcib
.L8:
    leaq    1(%rbp), %rdi
    movl    $1, %edx
    jmp .L5
_ZL6check1PKcib:
    movq    %rbp, -24(%rsp)
    movq    %r12, -16(%rsp)
    movq    %rdi, %rbp
    movq    %r13, -8(%rsp)
    movq    %rbx, -32(%rsp)
    subq    $40, %rsp
    movzbl  (%rdi), %ebx
    movl    %esi, %r13d
    movl    %edx, %r12d
    call    __ctype_b_loc
    movq    (%rax), %rax
    movsbq  %bl, %rcx
    testb   $8, (%rax,%rcx,2)
    jne .L23
    cmpb    $45, %bl
    je  .L23
    cmpb    $42, %bl
    je  .L24
    testb   %bl, %bl
    sete    %al
    cmpb    $46, %bl
    je  .L25
.L12:
    movq    8(%rsp), %rbx
    movq    16(%rsp), %rbp
    movq    24(%rsp), %r12
    movq    32(%rsp), %r13
    addq    $40, %rsp
    ret
.L23:
    leal    1(%r13), %esi
    cmpl    $4, %esi
    jg  .L14
    leaq    1(%rbp), %rdi
    movzbl  %r12b, %edx
.L21:
    movq    8(%rsp), %rbx
    movq    16(%rsp), %rbp
    movq    24(%rsp), %r12
    movq    32(%rsp), %r13
    addq    $40, %rsp
    jmp _ZL6check1PKcib
.L24:
    xorl    %eax, %eax
    testb   %r12b, %r12b
    jne .L12
    leal    1(%r13), %esi
    cmpl    $4, %esi
    jg  .L12
    leaq    1(%rbp), %rdi
    movl    $1, %edx
    jmp .L21
.L14:
    xorl    %eax, %eax
    jmp .L12
.L25:
    leaq    1(%rbp), %rdi
    movq    8(%rsp), %rbx
    movq    16(%rsp), %rbp
    movq    24(%rsp), %r12
    movq    32(%rsp), %r13
    addq    $40, %rsp
    jmp _Z5checkPKc

It looks like the tail calls have indeed been compiled as jumps, so the stack won’t blow up. There is what looks like some unnecessary movement of function arguments between the stack and the function argument registers, but we’re not doing too badly. Probably not quite as fast as a hand-coded goto-based version, but only a little slower, almost certainly a price worth paying for the gain in clarity (in fact, most of the time seems to be spent in the calls to isalnum).


Finally, a new laptop

It was high time I got a new laptop, the old Dell Inspiron 2200 had done quite well over the years, but was starting to fall apart and increasingly was just not up to the demands placed on it. I spent a while mulling over the options (don’t need too much raw power, but want some reasonable features), and ended up going for an Acer v5-171, this is the entry level model with the Core i3 processor, some sort of Ivy Bridge, whatever these silly names mean, more than adequate for my needs – doesn’t have the AES encryption I’d get with an i5, but I can probably live without that, and it doesn’t have Turbo Boost, but I don’t even know what that is. What it does have is a 0.5TB disk, 6GB of memory, and is clocked at 1.9GHz. There are 3 USB ports, one is USB 3.0, and a dinky little card reader on the front that I didn’t notice for a while. The keyboard is small, but quite usable, in fact it’s only a little narrower than the keyboard on my old Inspiron. It’s got those modern, flat keys that I didn’t think I liked, but actually, it’s quite pleasant to type on.

Here’s what Intel say about the CPU:

http://ark.intel.com/products/72057/Intel-Core-i3-3227U-Processor-3M-Cache-up-to-1_90-GHz

And what PCWorld say about the laptop. I wasn’t swayed by the “As Advertised on TV”, I hardly ever watch it, and in fact, the spec is better than what is describe here:

http://www.pcworld.co.uk/gbuk/acer-aspire-v5-171-11-6-laptop-silver-19414897-pdt.html#longDesc

The main alternative was the Asus S200 with a touch screen, though I hate fingerprints, and with either a rather inferior processor or a rather superior price.

I was intending to have a dual booting system with the Windows 8 it came preinstalled with (I bought it for the handsome price of £329 from the local PC World, who tried to hit me for an extra £30 for their setup service, I explained that I really didn’t want or need that, or the extended warranty, or a copy of Office, just the computer please… The helpful assistant Maya was very pleasant about all this and worked out that she should tick the “No Marketing” box without having to ask) but it didn’t take long to give up on that idea; I can’t say my quick sojourn round Windows 8 filled me with much joy, and I’ve been happily using Linux (and Android) for everything for some time now, so after spending 10 mins or so failing to work out how to set up UEFI dual booting (or even how to boot off the Live USB stick in UEFI mode), I flipped over to BIOS Legacy Mode on and told Ubuntu (12.10) to wipe the disk with extreme prejudice. I used to like playing around with partitioners, mulling over how big my swap partitition should be, or if I should keep system files and user data separate, these days I just let the installer get on with it.

After that, pretty much everything just went through just as it should. A deja vu moment of having to add “acpi_backlight=vendor” to the boot options to get the brightness control to work (one day I’ll find out what it means), but apart from that, everything works just fine, straight out of the box.

The screen is a little smaller (physically) than I’m used to, everything these days seems to be 1366×768, but in this case it’s all within a 10.1″ display, so the detail is lovely and crisp, but some things are a bit small for my poor old eyes so some investigation of accessibility options is called for (I recommend the NoSquint Firefox extension). There is a pixel stuck at red (played with lcdnurse to no avail) and the touchpad is a bit sticky (seems better after fiddling with the sensitivity). The sound is pretty poor (though we now seem to have a Spinal Tap-style volume control that goes way beyond 100%, which helps a little). Haven’t tried headphones yet.

Assuming the Power Statistics tool can be trusted, it uses 4W with screen off, 6W with minimum visible brightness level, going up to 8W for a normal level and nearly 10 on maximum. Heavy graphics & CPU use pushes things up to 16 or 17W. Wifi doesn’t seem to make much difference. The battery is reckoned to be 36Wh, so that should be a little over 4 hours, not much by modern standards, but enough for my modest needs. Not too much of a heat problem, the base of the laptop is a little warm but nothing too uncomfortable.

Out of curiosity, I investigated UEFI a bit more: to fiddle with any of the secure boot options in the BIOS you have to set the “Supervisor Password” it seems (I suppose that should have been obvious – initially, it’s the only modifiable field on the secure boot screen). Having done this, in the same BIOS screen, one can select the .efi files from the Ubuntu installation USB stick to be bootable from (/EFI/BOOT/ seems to be required directory) and then, mirabile dictu, we can boot into Live USB installation. I tried “Boot Repair” but got a warning about not having an EFI partition and would I like to add one (presumably if I had done the original installation in UEFI mode, this would have been created for me, I’m not sure I can really be bothered right now though).

Resetting to Legacy Mode and rebooting, we get back to the original installation and all is still well (it always amazes me when things still work after fiddling around, from the time I pulled the CPU out of my Sinclair Spectrum and put it back, and it Still Worked – I had less appreciation then of how easy it is to break the pins on DIL ICs). Only thing not working is that stuck pixel (not visible with light colours though) and though Bluetooth looks like its working, it won’t detect my phone. I wonder if this is related to this in dmesg:

[ 0.952921] WARNING: at /build/buildd/linux-3.5.0/arch/x86/mm/ioremap.c:102 __ioremap_caller+0x335/0x380()
[ 0.952925] Hardware name: V5-171
[ 0.952928] Modules linked in:
[ 0.952932] Pid: 1, comm: swapper/0 Not tainted 3.5.0-17-generic #28-Ubuntu
[ 0.952935] Call Trace:
[ 0.952943] [] warn_slowpath_common+0x7f/0xc0
[ 0.952948] [] warn_slowpath_null+0x1a/0x20
[ 0.952952] [] __ioremap_caller+0x335/0x380
[ 0.952958] [] ? bgrt_init+0x47/0x126
[ 0.952964] [] ? acpi_get_table_with_size+0x5f/0xbe
[ 0.952969] [] ? acpi_hed_init+0x30/0x30
[ 0.952974] [] ioremap_nocache+0x17/0x20
[ 0.952978] [] bgrt_init+0x47/0x126
[ 0.952984] [] do_one_initcall+0x12a/0x180
[ 0.952990] [] kernel_init+0x140/0x1c9
[ 0.952995] [] ? loglevel+0x31/0x31
[ 0.953000] [] kernel_thread_helper+0x4/0x10
[ 0.953005] [] ? start_kernel+0x3c4/0x3c4
[ 0.953009] [] ? gs_change+0x13/0x13
[ 0.953015] ---[ end trace d285ec97245f6911 ]---
[ 0.953064] GHES: HEST is not enabled!

That last one sounds like one of those cryptic lovers notes that used to be in the classified ads of The Times.

I love boot messages:

[ 9.573227] cfg80211: Calling CRDA to update world regulatory domain
# I expect they were staying up late, waiting for that call.
[ 9.574528] pci 0000:00:00.0: >detected 131072K stolen memory
# I thought stealing RAM had gone out of fashion, now it's so cheap
[ 9.980692] init: failsafe main process (758) killed by TERM signal
# Doesn't sound very failsafe

Enough lame attempts at computer humour, back to the laptop. Being an old stick in the mud, this is my first time with Ubuntu 12.x with Unity and all that. I’m not convinced I want my laptop to look like a phone, but I’m prepared to go with it for the moment and give it my best shot, though it’s tempting to just find whatever the modern equivalent of TWM is and use that. There are some nice features for sure, it seems that now Thunderbird runs in the background because on rebooting I find an envelope icon that it turns out is telling me that Gaylord Madrid is now fully funded: go Gaylord, go (if you’ve got a few bob to spare, I recommend parking it with http://www.lendwithcare.org). And I’ve just discovered that the Windows key (which Linux seems to be calling Super – it would be nice to have a real Space Cadet keyboard of course) actually does something useful. The trackpad of the Acer is a bit of weak point, so making more use of keyboard shortcuts could be a good idea – as a long-time Emacs user, I really ought to get used to using Ctrl-T in Firefox to start up a new tab, for example.

So, pretty happy all in all, a nice bit of kit, some shortcomings, but we aren’t exactly high end here so that’s to be expected. Next time I’ll know what to do with UEFI, but going single boot from the start has got to be the right thing.

For the record:

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 58
Stepping: 9
CPU MHz: 779.000
BogoMIPS: 3791.51
Virtualisation: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3

$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz
stepping : 9
microcode : 0x15
cpu MHz : 779.000
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips : 3791.51
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

$ glxinfo | grep -i opengl
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Ivybridge Mobile
OpenGL version string: 3.0 Mesa 9.0.2
OpenGL shading language version string: 1.30

$ lspci
00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 3 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
03:00.0 Network controller: Atheros Communications Inc. AR9462 Wireless Network Adapter (rev 01)
04:00.0 Ethernet controller: Broadcom Corporation NetLink BCM57785 Gigabit Ethernet PCIe (rev 10)
04:00.1 SD Host controller: Broadcom Corporation NetXtreme BCM57765 Memory Card Reader (rev 10)

$ lsusb
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 064e:e330 Suyin Corp.


Movies from OpenGL

I’ve got a few natty programs that draw stuff with OpenGL, I’d like to be able to record their output in some appropriate video format that I can show my friends, post on YouTube etc. Not quite sure how to do that, but it can’t be that difficult I’d have thought.

As as experiment, I’ll write it all up as I go along.

It seems that I can use the data in the frame buffer directly with glReadPixels:

http://www.opengl.org/sdk/docs/man/xhtml/glReadPixels.xml

so it should be easy to dump out frames as raw RGB values or whatever is required by the next step, ie. encoding to the actual movie format, AVI, mpeg4, whatever.

ffmpeg seems to be the tool for doing this. Let’s try some experiments: we can dump from X:

ffmpeg -f x11grab -s cif -r 25 -i :0.0+10,20 /tmp/out.mpg

but that has horrible encoding artefacts, trying with:

ffmpeg -qscale 5 -f x11grab -s cif -r 25 -i :0.0 /tmp/out.mpg

does rather better (qscale it seems goes from 1 to 31, lower is higher quality).

Apparently, ffmpeg is ‘deprecated’ and we should use avconf, but that seemed to fail dismally, it looked like it wasn’t synchronized with the frame buffer refreshes or something, I get all sorts of purple streaks and flashes, not nice.

avconv -f x11grab -s cif -r 25 -i :0.0 /tmp/out.mpg

maybe there’s an option for that too, but no idea what it might be.

Sticking to ffmpeg for the moment, after the usual Googling, this seems useful:

http://stackoverflow.com/questions/7238013/rawvideo-and-rgb32-values-passed-to-ffmpeg

Looks like something like:

ffmpeg -vcodec rawvideo -f rawvideo -pix_fmt rgb32 -s256x256 -i pipe:0 -f avi -vcodec foo.avi

might work.

I’ve got some raw video files around from my MMPS project: if you take a PPM type 6 file, and take off the header lines you end up with raw bytes, packed as 24 bit RGB values.

A bit of experimentation and I find all that is needed to make a rather boring video of the earth, contrary to Galileo, not moving, is:

(while true; do cat earth.raw; done) | ffmpeg -qscale 1 -f rawvideo -pix_fmt rgb24 -s 1000x500 -i pipe:0 foo7.avi

ffmpeg seems to work out that we need an avi wrapper for an avi file, so we don’t need -f avi and it uses its own mpeg4 encoder by default, so we don’t need -vcodec mpeg4, and the -f rawvideo covers -vcodec rawvideo. And another thing that I now know, but not knowing held me up a little: making an AVI file like this but with just one frame doesn’t work very well (at least Linux Movie Player won’t play the result).

Back to OpenGL, now I’ve got the video coding back end sorted I need to generate some real data. Since I’ll be piping the data into ffmpeg, I just need to dump out the frames to stdout, in the appropriate format. The OpenGL doc for the various formats is, as usual fairly baffling for a newbie like myself but this seems to work:

glFlush();
static size_t nbytes = 3*screen_width*screen_height;
static char* framedata = new char[nbytes];
glReadPixels(0,0,screen_width,screen_height,GL_RGB,GL_UNSIGNED_BYTE,framedata);
fwrite(framedata,1,nbytes,stdout);
glutSwapBuffers();

I’m not going to put this forward as model code: static variables, yuk, and there’s no protection against some damn fool resizing the window while we are ‘recording’, which won’t make anything very happy, and I’m not even checking the return value of fwrite or for errors in glReadPixels. We could do things entirely offline, but we might want to record some interactions and writing out each frame doesn’t seem to slow things down too much.

./poly2 1 1 1 1 | ffmpeg -qscale 1 -vcodec rawvideo -f rawvideo -pix_fmt rgb24 -s 800x640 -i pipe:0 poly.avi

I probably ought to change poly2 so it takes width and height on the command line rather than hardwired (but recompilation is so fast these days).

Next step, uploading to YouTube. Looking at:

http://support.google.com/youtube/bin/answer.py?hl=en&answer=1722171&topic=2888648&ctx=topic

it seems that I probably want 480p format, 854×480 as a sort of baseline. I still don’t understand about codecs, but looks like H.264 is the way to go. Now, what encoder should I use, looks like x264 or libx264 is the thing, but nothing is installed:

$ ffmpeg -codecs | grep 264
D V D h264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
D V D h264_vdpau H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (VDPAU acceleration)

It seems I can decode, but not encode.

The rather helpful:

https://help.ubuntu.com/community/FFmpeg

suggests installing these extra packages:

sudo apt-get install libavcodec-extra-53 libavdevice-extra-53 libavfilter-extra-2 libavformat-extra-53 libavutil-extra-51 libpostproc-extra-52 libswscale-extra-2

and now I have:

$ ffmpeg -codecs | grep 264
D V D h264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
D V D h264_vdpau H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (VDPAU acceleration)
EV libx264 libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10

Nice, and now:

./poly2 1 1 1 1 | ffmpeg -qscale 1 -vcodec rawvideo -f rawvideo -pix_fmt rgb24 -s 800x640 -i pipe:0 -vcodec libx264 foo3.avi

seems to do the trick.

Looking good, let’s try the YouTube recommended 854×480 dimensions; hmm, we’ve gone monochrome with rather prominent scan lines (reminded me of the old Dr Who tent at Glastonbury, hands up if anyone remembers that):

Screenshot from 2013-03-10 16:49:55

Looks like a data alignment problem – I’ve still got magic numbers in the code, but they seem to be the right ones – but checking OpenGL documentation, I find:

http://www.opengl.org/sdk/docs/man/xhtml/glPixelStore.xml

Looks like OpenGL is word-aligning the pixel rows, fine for 800 pixels, not so good for 854.

glPixelStorei(GL_PACK_ALIGNMENT,1)

should sort this out (note the ‘i’ for ‘integer’ on the end there), and indeed it does:

./poly2 1 1 1 1 | ffmpeg -f rawvideo -pix_fmt rgb24 -s 854x480 -i pipe:0 -vcodec libx264 foo264.avi

makes me a nice 20MB, 1 minute 17 sec video of some coloured balls writhing around in empty space. A nice background (and one that disguises the stuck pixel on my new laptop that I only notice watching this sort of thing) would be good.

Now for that YouTube account, I suppose I might as well use my Google account, anything for an easy life and it’s not like I’m uploading anything too embarrassing. OK, here we go: “We did not recognise the audio format for this file, but we will try to process it anyway. See this article on recommended formats for more information”, well, I didn’t include any audio, so fair enough I suppose (a soundtrack would be nice though, suggestions on a postcard).

Anyway, here it is, in all its glory: I give you “invert264”:

Perhaps I should come up with a catchier name, but that will do for now.