tolower

In an Age of Highly Pipelined RISC Architectures, we might be tempted to write a basic ASCII lower-case function as:

int tolower1(int c)
{
  return c|-(c-0101U<032)&040;
}

and avoid any branching. We subtract ‘A’ from c and do an unsigned comparison with ‘Z’-‘A’, convert to a bitmask and use that to ‘or’ in the lower case bit where necessary.

A more quiche-eating approach would be the clear and obvious:

int tolower2(int c)
{
  if (c >='A' && c <= 'Z') return c+'a'-'A';
  else return c;
}

at which hard-nosed hacker types will sneer, however, an Age of Highly Pipelined RISC Architectures is also an Age of Conditional Move Instructions, and GCC generates the pleasantly concise:

tolower2:
	leal	-65(%rdi), %ecx
	leal	32(%rdi), %edx
	movl	%edi, %eax
	cmpl	$25, %ecx
	cmovbe	%edx, %eax
	ret

for our simple approach rather than the distinctly bloated:

tolower1:
	leal	-65(%rdi), %eax
	cmpl	$25, %eax
	setbe	%al
	movzbl	%al, %eax
	negl	%eax
	andl	$32, %eax
	orl	%edi, %eax
	ret

All is not lost though for the hard-nosed hacker, for this is also an Age of Vectorization and we can do something similar using SSE2 instructions to lower-case not one but 16 characters at a time:

#include <x86intrin.h>
void tolower128(char *p)
{
  __m128i v1 = _mm_loadu_si128((__m128i*)p);
  __m128i v2 = _mm_cmplt_epi8(v1,_mm_set1_epi8('A'));
  __m128i v3 = _mm_cmpgt_epi8(v1,_mm_set1_epi8('Z'));
  __m128i v4 = _mm_or_si128(v2,v3);
  __m128i v5 = _mm_andnot_si128(v4,_mm_set1_epi8(0x20));
  _mm_storeu_si128((__m128i*)p, _mm_or_si128(v1,v5));
}

Now everyone should be happy!

Advertisements