Templated bit twiddling

Sometimes it’s nice to combine C++ template metaprogramming with some bit twiddling, here’s a classic parallelized bit counting algorithm with all the magic numbers constructed as part of template instantiation:

#include <stdio.h>
#include <stdlib.h>

template <typename T, int k, T m>
struct A
{
  static T f(T n)
  {
    static const int k1 = k/2;
    n = A<T, k1, m^(m<<k1)>::f(n);
    return ((n>>k)&m) + (n&m);
  }
};

template <typename T, T m>
struct A<T, 0, m>
{
  static T f(T n) { return n; }
};

template<typename T>
static inline int bitcount (T n)
{
  static const int k = 4 * sizeof(n);
  return A<T, k, (T(1)<<k)-1>::f(n);
}

int main(int argc, char *argv[])
{
  unsigned long n = strtoul(argv[1], NULL, 10);
  printf("bitcount(%lu) = %d\n", n, bitcount(n));
}

We could extract the magic number construction out as a separate template, and there are some extra optimizations that this code doesn’t do, for example, the level 1 code can be simplified to:

i = i - ((i >> 1) & 0x55555555);

rather than what we get from the above:

i = ((i >> 1)&0x55555555) + (i&0x55555555);

Adding an appropriate partial template instantiation for A<T,1,m> is left as an exercise.

In fact, there are lots of shortcuts that can be made here, we aren’t really being competitive with Bit Twiddling Hacks:

unsigned int v; // count bits set in this (32-bit value)
unsigned int c; // store the total here
static const int S[] = {1, 2, 4, 8, 16}; // Magic Binary Numbers
static const int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};

c = v - ((v >> 1) & B[0]);
c = ((c >> S[1]) & B[1]) + (c & B[1]);
c = ((c >> S[2]) + c) & B[2];
c = ((c >> S[3]) + c) & B[3];
c = ((c >> S[4]) + c) & B[4];

which would need 3 template specializations and we are starting to get a bit complicated…

In the same style then, here’s a bit reverse function, with a separate magic number calculating template:

#include <stdio.h>
#include <stdlib.h>

template <typename T, int k>
struct Magic {
  static const T m = ~T(0)/(T(1)+(T(1)<<k));
};

template <typename T, int k>
struct RevAux {
  static inline T f(T n) {
    T n1 = RevAux<T,k/2>::f(n);
    return ((n1>>k) & Magic<T,k>::m) | ((n1 & Magic<T,k>::m)<<k);
  }
};

template <typename T>
struct RevAux<T,0> {
  static inline T f(T n) { return n; }
};

template <typename T>
T rev(T n) { return RevAux<T,4*sizeof(T)>::f(n); }

int main(int argc, char *argv[])
{
  unsigned long n = strtoul(argv[1],NULL,16);
  printf("%lx %lx\n", n, rev(n));
}

Now, how do we check that the template parameter T is unsigned (otherwise the right shifts are probably going to go wrong)…

A simple optimization is to replace the top level function with:

template <typename T>
static inline T rev(T n) { 
  static const int k = 4*sizeof(T);
  n = RevAux<T,k/2>::f(n); 
  return (n>>k)|(n<<k);
}

but GCC seems to have worked that out for itself (compiling gcc 4.4.3 , -O3 -S):

.globl _Z7reversej
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        popl    %ebp
        movl    %eax, %edx
        andl    $1431655765, %edx
        shrl    %eax
        addl    %edx, %edx
        andl    $1431655765, %eax
        orl     %eax, %edx
        movl    %edx, %eax
        andl    $858993459, %eax
        shrl    $2, %edx
        andl    $858993459, %edx
        sall    $2, %eax
        orl     %edx, %eax
        movl    %eax, %edx
        andl    $252645135, %edx
        shrl    $4, %eax
        andl    $252645135, %eax
        sall    $4, %edx
        orl     %eax, %edx
        movl    %edx, %eax
        andl    $16711935, %eax
        shrl    $8, %edx
        sall    $8, %eax
        andl    $16711935, %edx
        orl     %edx, %eax
        rorl    $16, %eax
        ret

This sort of thing isn’t going to work very well if the compiler isn’t up to scratch, but GCC seems to have done it right here.

There are slightly faster ways to reverse bits using ternary swap (swap top and bottom thirds, recurse), but they need to be more ad hoc unless you have a ternary computer with word length a power of 3, so less amenable to this sort of generic treatment (or at least, it becomes a lot harder).

Advertisements


Leave a Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s