tolower
Posted: February 12, 2014 Filed under: Bit twiddling, C Leave a commentIn an Age of Highly Pipelined RISC Architectures, we might be tempted to write a basic ASCII lower-case function as:
int tolower1(int c) { return c|-(c-0101U<032)&040; }
and avoid any branching. We subtract ‘A’ from c and do an unsigned comparison with ‘Z’-‘A’, convert to a bitmask and use that to ‘or’ in the lower case bit where necessary.
A more quiche-eating approach would be the clear and obvious:
int tolower2(int c) { if (c >='A' && c <= 'Z') return c+'a'-'A'; else return c; }
at which hard-nosed hacker types will sneer, however, an Age of Highly Pipelined RISC Architectures is also an Age of Conditional Move Instructions, and GCC generates the pleasantly concise:
tolower2: leal -65(%rdi), %ecx leal 32(%rdi), %edx movl %edi, %eax cmpl $25, %ecx cmovbe %edx, %eax ret
for our simple approach rather than the distinctly bloated:
tolower1: leal -65(%rdi), %eax cmpl $25, %eax setbe %al movzbl %al, %eax negl %eax andl $32, %eax orl %edi, %eax ret
All is not lost though for the hard-nosed hacker, for this is also an Age of Vectorization and we can do something similar using SSE2 instructions to lower-case not one but 16 characters at a time:
#include <x86intrin.h> void tolower128(char *p) { __m128i v1 = _mm_loadu_si128((__m128i*)p); __m128i v2 = _mm_cmplt_epi8(v1,_mm_set1_epi8('A')); __m128i v3 = _mm_cmpgt_epi8(v1,_mm_set1_epi8('Z')); __m128i v4 = _mm_or_si128(v2,v3); __m128i v5 = _mm_andnot_si128(v4,_mm_set1_epi8(0x20)); _mm_storeu_si128((__m128i*)p, _mm_or_si128(v1,v5)); }
Now everyone should be happy!