How to use SRP in OpenSSL

[Update 10/8/14: As Vakharia points out in the comments, there have been a couple of DoS-type problems found with the OpenSSL SRP code, fixed in OpenSSL 1.0.1i. There seems to be a problem with that version with uncertificated SRP connections, see: http://marc.info/?t=140745609300002&r=1&w=2, so you might need to patch for that to work]

SRP seems to be a much neglected protocol & there have even been noises about removing TLS-SRP from some versions of OpenSSL. This is a shame as SRP has many nice properties that are only more attractive after the Heartbleed scare, including forward secrecy and mutual authentication without PKI.

There are various reasons for that neglect including possible concerns over patents, though it’s not very clear what the issue here is, or even if there is one. One issue with using SRP in OpenSSL in particular is that the C API isn’t very well documented, so this is an attempt to improve that situation. This is what I have been able to glean from experimentation, reading various bits of source as well as miscellaneous bits and pieces from the web. It all works for me & seems sensible, but I’m not a crypto expert so caveat emptor applies even more than usual.

For further details on SRP and TLS-SRP see:

http://en.wikipedia.org/wiki/Secure_Remote_Password_protocol
http://en.wikipedia.org/wiki/TLS-SRP

The relevant RFCs are:

http://tools.ietf.org/html/rfc2945
http://tools.ietf.org/html/rfc5054

Also some useful discussions at:

http://bert-hubert.blogspot.co.uk/2012/02/on-srp-some-implementation-notes-and.html
http://crypto.stackexchange.com/questions/8245/why-is-srp-not-widely-used

On to the implementation details. All code fragments taken from full program at https://github.com/matthewarcus/ssl-demo.

We will start with the server side. The crucial piece of server data for SRP is the password verifier. We can make a verifier file using the openssl srp command line tool:

$ openssl srp
Exactly one of the options -add, -delete, -modify -list must be specified.
usage: srp [args] [user] 

 -verbose        Talk alot while doing things
 -config file    A config file
 -name arg       The particular srp definition to use
 -srpvfile arg   The srp verifier file name
 -add            add an user and srp verifier
 -modify         modify the srp verifier of an existing user
 -delete         delete user from verifier file
 -list           list user
 -gn arg         g and N values to be used for new verifier
 -userinfo arg   additional info to be set for user
 -passin arg     input file pass phrase source
 -passout arg    output file pass phrase source
 -engine e         - use engine e, possibly a hardware device.
 -rand file:file:...
                 load the file (or the files in the directory) into
                 the random number generator

The -gn parameter requires explanation: the security of SRP (like Diffie-Hellman key exchange) is based on the hardness of the discrete logarithm problem, ie. given a large prime N and a co-prime generator g, given g^a mod N, it’s hard to determine a. g and N are fixed in advance and don’t have to be secret so it’s normal to use standard values that do not allow any of the known shortcuts for discrete logarithms – as defined for example in the appendix to RFC 5054 for particular bit sizes (1024 and 1536 are mandatory for TLS-SRP) & it’s the bit size that is given as the -gn argument:

$ touch password.srpv
$ openssl srp -srpvfile password.srpv -add -gn 1536 user
Enter pass phrase for user:
Verifying - Enter pass phrase for user:
$ cat password.srpv
V 2qYg0YhL6s9OsjpPt4eD4iIDB/SF.7pEGPLHVIsLvVD9wUU5tqngvzQUA7Uf6nAQtP.K
U4G.9yra1Ia4fkOrUbx2QjGVizyc.QcaCr83nIewI/ry57Vrgg6QQv2U6Z7ClC0Wig5yKH
BDu2Lfny1aEZy3i7oi3dTywvIxDeFCcS0UPIhUgpRnINZ5K2HJiz6TuofvIfYC2EMpD5Q8
PuZ8/fB62TvfFK7TN67cOCCJSroOukrrr/KScmoDZ/odfKUM
FRzonhUh8ApuhBf45xMzsX1Olm1
user
1536    

Note that the verifier file has to exist beforehand, even if empty.

Now we have a verifier file, we can load it in our server code. OpenSSL defines a handy SRP_VBASE type that can be used to store verifiers and we can use SRP_VBASE_init to load in the the verifier file we made earlier:

#include <openssl/srp.h>
static SRP_VBASE *srpData = NULL;
static const char *srpvfile = "password.srpv";

void setupSRPData(SSL_CTX *ctx)
{
  srpData = SRP_VBASE_new(NULL);
  srpData = SRP_VBASE_new(NULL);
  CHECK(srpData != NULL);
  if (SRP_VBASE_init(srpData, (char *)srpvfile) != 0) {
     // File failed to load...
  }
}

We can also make a verifier directly and store it ourselves in an SRP_VBASE structure (or elsewhere):

static const char *srpgroup = "1536";
static const char *username = "user";
static const char *password = "password";

void setupSRPData(SSL_CTX *ctx)
{
    ...
    // The structure to put the verifier data
    SRP_user_pwd *p =
       (SRP_user_pwd *)OPENSSL_malloc(sizeof(SRP_user_pwd));
    SRP_gN *gN = SRP_get_default_gN(srpgroup);
    CHECK(gN != NULL);
    // This check seems a bit pointless, but doesn't do harm.
    char *srpCheck = SRP_check_known_gN_param(gN->g, gN->N);
    CHECK(srpCheck != NULL);

    // Now create the verifier for the password.
    // We could get the password from the user at this point.
    BIGNUM *salt = NULL, *verifier = NULL;
    CHECK(SRP_create_verifier_BN(username, password, &salt,
                                 &verifier, gN->N, gN->g));
    // Copy into the SRP_user_pwd structure
    p->id = OPENSSL_strdup(username);
    p->g = gN->g; p->N = gN->N;
    p->s = salt; p->v = verifier;
    p->info = NULL;
    // And add in to VBASE stack of user data
    sk_SRP_user_pwd_push(srpData->users_pwd, p);
}

Once we are done with the srpData structure, we should free it:

  if (srpData != NULL) SRP_VBASE_free(srpData);

(I’ve checked this code with valgrind and all seems well – when compiled with -DPURIFY, OpenSSL is quite well-behaved under valgrind, though there always seems to be a few hundred bytes of still-reachable data left at the end).

Now we have our verifier data loaded, we can define a suitable callback function:

int srpServerCallback(SSL *s, int *ad, void *arg)
{
  (void)arg;
  char *srpusername = SSL_get_srp_username(s);
  CHECK(srpusername != NULL);
  // Get data for user
  SRP_user_pwd *p = SRP_VBASE_get_by_user(srpData,srpusername);
  if (p == NULL) {
    fprintf(stderr, "User %s doesn't exist\n", srpusername);
    return SSL3_AL_FATAL;
  }
  // Set verifier data
  CHECK(SSL_set_srp_server_param(s, p->N, p->g,
                                 p->s, p->v, NULL) == SSL_OK);
  return SSL_ERROR_NONE;
}

And tell OpenSSL to use it for SRP connections:

SSL_CTX_set_srp_username_callback(ctx, srpServerCallback);
...

For a simple demo, using a global variable for the srpData is adequate, we could pass it in as the callback argument (or another context object):

CHECK(SSL_CTX_set_srp_cb_arg(ctx, srpData) == SSL_OK);
...

You’ll have to read the code to find out what the ad parameter is for.

Note that for a non-existent user, we return a fatal error, which result in the connection terminating with a PSK identity not known alert. More secure behaviour (and suggested in the RFC) is probably to simulate authentication with a dummy user, with failure happening in the same way as if the password was wrong (there are some extra fields in the SRP_VBASE structure for helping with this but I haven’t tried to do that yet).

In fact, in the full program, the first time the SRP callback function is called, we don’t have the srpData loaded, and we return -1 from the callback to indicate this:

int srpServerCallback(SSL *s, int *ad, void *arg)
{
  ...
  // Simulate asynchronous loading of SRP data
  if (srpData == NULL) {
    doSRPData = true;
    return -1; // Not ready yet
  }
  ...
}

When the callback fails in this way, the handshake returns a WANT_X509_LOOKUP error and we handle that by loading the srpData at this point (the idea presumably is that we may want to load the SRP data asynchronously & we can do that before called SSL_accept again):

    int res = sslAccept(ssl);
    if (res == SSL_OK) {
      break;
    } else if (SSL_get_error(ssl,res) == SSL_ERROR_WANT_X509_LOOKUP
               && doSRPData) {
      setupSRPData(ctx);
      doSRPData = false;
      ...

A couple of small details: we don’t want to do client certificate verification for SRP connections, so turn it off:

  bool isSRP = SSL_get_srp_username(ssl) != NULL;
  bool verify = doVerify && !isSRP && !peerVerified(ssl);
  ...

Testing the SRP username seemed to be the neatest way of finding out if the connection is actually using SRP (there doesn’t seem to be a standard interface for this).

Also the non-certificate verified SRP ciphersuites aren’t included in the default list of ciphers (since they officially offer no authentication [Update 10/8/15: this should be fixed in OpenSSL 1.0.1i]):

$ openssl ciphers -v | grep SRP
SRP-DSS-AES-256-CBC-SHA SSLv3 Kx=SRP      Au=DSS  Enc=AES(256)  Mac=SHA1
SRP-RSA-AES-256-CBC-SHA SSLv3 Kx=SRP      Au=RSA  Enc=AES(256)  Mac=SHA1
SRP-DSS-3DES-EDE-CBC-SHA SSLv3 Kx=SRP      Au=DSS  Enc=3DES(168) Mac=SHA1
SRP-RSA-3DES-EDE-CBC-SHA SSLv3 Kx=SRP      Au=RSA  Enc=3DES(168) Mac=SHA1
SRP-DSS-AES-128-CBC-SHA SSLv3 Kx=SRP      Au=DSS  Enc=AES(128)  Mac=SHA1
SRP-RSA-AES-128-CBC-SHA SSLv3 Kx=SRP      Au=RSA  Enc=AES(128)  Mac=SHA1
$ openssl ciphers -v ALL | grep SRP
SRP-DSS-AES-256-CBC-SHA SSLv3 Kx=SRP      Au=DSS  Enc=AES(256)  Mac=SHA1
SRP-RSA-AES-256-CBC-SHA SSLv3 Kx=SRP      Au=RSA  Enc=AES(256)  Mac=SHA1
SRP-AES-256-CBC-SHA     SSLv3 Kx=SRP      Au=None Enc=AES(256)  Mac=SHA1
SRP-DSS-3DES-EDE-CBC-SHA SSLv3 Kx=SRP      Au=DSS  Enc=3DES(168) Mac=SHA1
SRP-RSA-3DES-EDE-CBC-SHA SSLv3 Kx=SRP      Au=RSA  Enc=3DES(168) Mac=SHA1
SRP-3DES-EDE-CBC-SHA    SSLv3 Kx=SRP      Au=None Enc=3DES(168) Mac=SHA1
SRP-DSS-AES-128-CBC-SHA SSLv3 Kx=SRP      Au=DSS  Enc=AES(128)  Mac=SHA1
SRP-RSA-AES-128-CBC-SHA SSLv3 Kx=SRP      Au=RSA  Enc=AES(128)  Mac=SHA1
SRP-AES-128-CBC-SHA     SSLv3 Kx=SRP      Au=None Enc=AES(128)  Mac=SHA1

so we will set available ciphers appropriately (NULL is also useful for testing, I don’t recommend it for actual use);

  const char *cipherlist = "ALL:NULL";
  ...
  CHECK(SSL_CTX_set_cipher_list(ctx,cipherlist) == SSL_OK);
  ...

That’s about it for the server, now on to the client, which fortunately is a good deal simpler.

Once again, we need to define a callback function:

const char *password = NULL;

char *srpCallback(SSL *ssl, void *arg)
{
  char *user = (char*)arg;
  if (password != NULL) {
    // Can be passed in on command line
    return OPENSSL_strdup(password);
  } else {
    ssize_t promptsize = 256;
    char prompt[promptsize];
    CHECK(snprintf(prompt, promptsize,
                   "Password for %s: ", user) < promptsize);
    char *pass = getpass(prompt);
    char *result = OPENSSL_strdup(pass);
    // getpass uses a static buffer, so clear it out after use.
    memset(pass,0,strlen(pass));
    return result;
  }
}

getpass is officially obsolete, but does the job here (and we will make a concession to real security and null out the password after it’s been returned – hopefully OpenSSL doesn’t hold on the duplicated password any longer than needed).

Now finish the set up:

  if (doSRP) {
    CHECK(SSL_CTX_set_srp_username(ctx, (char*)username));
    SSL_CTX_set_srp_cb_arg(ctx,(void*)username);
    SSL_CTX_set_srp_client_pwd_callback(ctx, srpCallback);
    ...

We need to set the username before the handshake as it’s included in the initial hello message. The password is only required if an SRP ciphersuite is actually negotiated.

Finally, we need to get the right ciphersuites. If we include SRP ciphers in the client hello, but no user name, we will get a fatal alert if the server wishes to use SRP (which it may well do if it hasn’t been configured with ECDHE):

$ openssl ciphers -v
ECDHE-RSA-AES256-GCM-SHA384 TLSv1.2 Kx=ECDH     Au=RSA  Enc=AESGCM(256) Mac=AEAD
ECDHE-ECDSA-AES256-GCM-SHA384 TLSv1.2 Kx=ECDH     Au=ECDSA Enc=AESGCM(256) Mac=AEAD
ECDHE-RSA-AES256-SHA384 TLSv1.2 Kx=ECDH     Au=RSA  Enc=AES(256)  Mac=SHA384
ECDHE-ECDSA-AES256-SHA384 TLSv1.2 Kx=ECDH     Au=ECDSA Enc=AES(256)  Mac=SHA384
ECDHE-RSA-AES256-SHA    SSLv3 Kx=ECDH     Au=RSA  Enc=AES(256)  Mac=SHA1
ECDHE-ECDSA-AES256-SHA  SSLv3 Kx=ECDH     Au=ECDSA Enc=AES(256)  Mac=SHA1
SRP-DSS-AES-256-CBC-SHA SSLv3 Kx=SRP      Au=DSS  Enc=AES(256)  Mac=SHA1
SRP-RSA-AES-256-CBC-SHA SSLv3 Kx=SRP      Au=RSA  Enc=AES(256)  Mac=SHA1
...

so it’s best to not request SRP unless we really want it:

    if (doSRP) {
      cipherlist = "SRP";
    } else {
      cipherlist = "DEFAULT:!SRP";
    }

Let’s see all this in action:

On the server side:

$ ./ssl_server -v --srp 5999
renegotiation: allowed
TLSv1.2: SRP-RSA-AES-256-CBC-SHA SSLv3 Kx=SRP      Au=RSA  Enc=AES(256)  Mac=SHA1
Session ID: (null)
Session ID CTX: 49:4E:49:54
No peer certificates.
...

Normal client connection:

$ ./ssl_client -v --srp localhost:5999
Cipher suites:
SRP-DSS-AES-256-CBC-SHA
SRP-RSA-AES-256-CBC-SHA
SRP-AES-256-CBC-SHA
SRP-DSS-3DES-EDE-CBC-SHA
SRP-RSA-3DES-EDE-CBC-SHA
SRP-3DES-EDE-CBC-SHA
SRP-DSS-AES-128-CBC-SHA
SRP-RSA-AES-128-CBC-SHA
SRP-AES-128-CBC-SHA
Password for user: 
TLSv1.2: SRP-RSA-AES-256-CBC-SHA SSLv3 Kx=SRP      Au=RSA  Enc=AES(256)  Mac=SHA1
Session ID: 19:E6:71:49:6A:3B:B4:0F:AE:AD:66:86:0A:87:55:37:5C:6B:DC:51:D9:89:12:CF:45:5E:A5:12:D8:91:42:CC
Session ID CTX: (null)
Peer certificates:
0: Subject: C = GB, O = TEST SERVER
   Issuer:  C = GB, O = TEST CA
1: Subject: C = GB, O = TEST CA
   Issuer:  C = GB, O = TEST ROOT
Certificate OK
...

Invalid user:

$ ./ssl_client -v --srp --user invalid localhost:5999
Cipher suites:
SRP-DSS-AES-256-CBC-SHA
SRP-RSA-AES-256-CBC-SHA
SRP-AES-256-CBC-SHA
SRP-DSS-3DES-EDE-CBC-SHA
SRP-RSA-3DES-EDE-CBC-SHA
SRP-3DES-EDE-CBC-SHA
SRP-DSS-AES-128-CBC-SHA
SRP-RSA-AES-128-CBC-SHA
SRP-AES-128-CBC-SHA
SSL3 alert read: fatal: unknown PSK identity
'sslConnect(ssl) == SSL_OK' failed: ssl_client.cpp:288
140613521352384:error:1407745B:SSL routines:SSL23_GET_SERVER_HELLO:reason(1115):s23_clnt.c:779:

Finally, as mentioned above, the standard SRP ciphersuites also do certificate-based server authentication – this seems sensible, if the verifier hasn’t been compromised, then SRP guarantees the authenticity of the server, but the incentives for the server to protect its private key are much greater than for it to protect the verifier for a particular user. In the case that we trust the server not to leak the verifier (perhaps we manage it ourselves and are accessing it remotely), we can use the non-PKI SRP ciphersuites by explicitly requesting them from the client:

$ ./ssl_client -v --srp --user user --cipherlist SRP-AES-256-CBC-SHA localhost:5999
Cipher suites:
SRP-AES-256-CBC-SHA
Password for user: 
TLSv1.2: SRP-AES-256-CBC-SHA     SSLv3 Kx=SRP      Au=None Enc=AES(256)  Mac=SHA1
Session ID: E9:F3:09:F9:7E:F8:43:60:0D:43:43:93:11:63:AE:F2:51:9F:4C:A9:4F:19:F8:89:DF:4F:07:02:66:42:F2:E6
Session ID CTX: (null)
No peer certificates.
...

valof

I’ve never written much BCPL, but it always seemed like a nice language. Many features ended up in C of course, but one that didn’t was VALOF, allowing the inclusion of a statement block in an expression, and using RESULTIS to return a value. Using slightly invented syntax:

LET B = VALOF $(
        FOR I = 1 TO J DO $(
                IF I*I = J THEN RESULTIS 1
        $)
        RESULTIS 0
$)

GCC defines the block-expression extension, allowing us to write:

int main(int argc, char *argv[])
{
  int j = atoi(argv[1]);
  printf("%d\n";,
         ({
           bool found = false;
           for (int i = 1; i &lt; j; i++) {
             if (i*i == j) {
               found = true;
               goto end;
             }
           }
         end:
           found;
         }));
}

but having to use the last expression in the block as the return value can get a bit convoluted.

With C++11, we can do better with immediately applied lambda expressions, with return taking the role of RESULTIS:

int main(int argc, char *argv[])
{
  int j = atoi(argv[1]);
  printf("%d\n";,
         [=](){
           for (int i = 1; i &lt; j; i++) {
             if (i*i == j) return true;
           }
           return false;
         }());
}

If we don’t like the smiley syntax, we can even define a VALOF macro:

#define VALOF(e) [=](){e}()
int main(int argc, char *argv[])
{
  int j = atoi(argv[1]);
  printf("%d\n";,
         VALOF(
             for (int i = 1; i &lt; j; i++) {
               if (i*i == j) return true;
             }
             return false;
           ));
}

though we’ll have to be a little bit careful with commas not enclosed in parentheses (and we might end up needing different macros for different variable capture methods).

Curiously, this idiom seems to be popular in Javascript.


IPv6 TUN reflector

We had a look as a simple network simulation using TUN a couple of posts ago: https://matthewarcus.wordpress.com/2013/05/18/fun-with-tun/.

Let’s now have a look at getting it all working with IPv6. Changing the address swapping code is fairly straightforward, and for some extra points we’ll add a facility for printing out the source and destination addresses of each packet forwarded, the correct function for doing this is now inet_ntop, which works for both v4 and v6 addresses.

Here are the main changes [see https://github.com/matthewarcus/stuff/tree/master/tun for full code]:

It’s convenient to define a 32-bit swap function:

void swap32(uint8_t *p, uint8_t *q)
{
  uint32_t t = get32(p);
  put32(p,get32(q));
  put32(q,t);
}

and our main function is now:

#define SRC_OFFSET4 12
#define DST_OFFSET4 16
#define SRC_OFFSET6 8
#define DST_OFFSET6 24

void reflect(uint8_t *p, size_t nbytes)
{
  uint8_t version = p[0] >> 4;
  switch (version) {
  case 4:
    if (verbosity > 0) {
      char fromaddr[INET_ADDRSTRLEN];
      char toaddr[INET_ADDRSTRLEN];
      inet_ntop(AF_INET, p+SRC_OFFSET4, fromaddr, sizeof(fromaddr));
      inet_ntop(AF_INET, p+DST_OFFSET4, toaddr, sizeof(toaddr));
      printf("%zu: %s->%s\n", nbytes, fromaddr, toaddr);
    }
    // Swap source and dest of an IPv4 packet
    // No checksum recalculation is necessary
    swap32(p+SRC_OFFSET4,p+DST_OFFSET4);
    break;
  case 6:
    if (verbosity > 0) {
      char fromaddr[INET6_ADDRSTRLEN];
      char toaddr[INET6_ADDRSTRLEN];
      inet_ntop(AF_INET6, p+SRC_OFFSET6, fromaddr, sizeof(fromaddr));
      inet_ntop(AF_INET6, p+DST_OFFSET6, toaddr, sizeof(toaddr));
      printf("%zu: %s->%s\n", nbytes, fromaddr, toaddr);
    }
    // Swap source and dest of an IPv6 packet
    // No checksum recalculation is necessary
    for (int i = 0; i < 4; i++) {
      swap32(p+SRC_OFFSET6+4*i,p+DST_OFFSET6+4*i);
    }
    break;
  default:
    fprintf(stderr, "Unknown protocol %u\n", version);
    exit(0);
  }
}

Setting up the the v6 addresses for our new interface is a little different. As before, we bring the interface up:

$ ip link set tun0 up

Now, we need to add a link-local address, mandatory for all IPv6 interfaces:

$ ip -6 addr add fe80::1/64 dev tun0

We can use ping6 to try this out, localizing the request to the tun0 interface:

$ ping6 -I tun0 fe80::a617:31ff:fe5a:334f
PING fe80::a617:31ff:fe5a:334f(fe80::a617:31ff:fe5a:334f) from fe80::1 tun0: 56 data bytes
64 bytes from fe80::a617:31ff:fe5a:334f: icmp_seq=1 ttl=64 time=0.164 ms
...

This address is also the link-local address of my Wifi interface, but there is no ambiguity as we must specify which interface to use.

We can also add a private network address. IPv6 does not have the same concept of a private network as IPv4, instead we define Unique Local Addresses: append 0xfd to a random 10 digit hex global id and add an arbitrary 4 digit subnet identifier. Any random global id is fine – the idea is to ensure that any given network will have a different id from any other private network it is likely to come in contact with – we don’t need to worry about true global uniqueness though the Birthday Paradox tells us that we are likely to have a potential conflict with only about a million private networks (there might be lots of people out there with the same name and birthday as you, but you are unlikely to meet one of them at random).

We can generate our own random address, for example, using the method described in RFC4193, or use /dev/random:

$ hexdump -v -e '/1 "%02x"' -n 5 /dev/urandom; echo
2acd2c8bc4

or just copy a random sequence from somewhere on the Internet, for example, the one used here:

$ ip -6 route add fd2a:cd2c:8bc4:0::/64 dev tun0

This adds a local network with a global id of 2acd2c8bc4 and a subnet id of 0.

We can also define a larger subnet:

$ ip -6 route add fd2a:cd2c:8bc4:1100::/56 dev tun0

Now traffic to any IPv6 address of form fd2a:cd2c:8bc4:11xx:… will be sent to our TUN device:

$ ping6 fd2a:cd2c:8bc4:11ff::23
PING fd2a:cd2c:8bc4:11ff::23(fd2a:cd2c:8bc4:11ff::23) 56 data bytes
64 bytes from fd2a:cd2c:8bc4:11ff::23: icmp_seq=1 ttl=64 time=0.110 ms
...

Indeed, we can define all subnets for another global id:

$ hexdump -e '/1 "%02x"' -n 5 /dev/urandom; echo
40bd2f7ba0
$ sudo ip -6 route add fd40:bd2f:7ba0::/48 dev tun0

Just for interest, here’s our entire IPv6 routing table:

$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
fd2a:cd2c:8bc4::/64 :: U 1024 0 0 tun0
fd2a:cd2c:8bc4:1100::/56 :: U 1024 0 0 tun0
fd40:bd2f:7ba0::/48 :: U 1024 0 0 tun0
fe80::/64 :: U 256 0 0 wlan0
fe80::/64 :: U 256 0 0 tun0
::/0 :: !n -1 1 524 lo
::1/128 :: Un 0 1 35 lo
fe80::1/128 :: Un 0 1 10 lo
fe80::a617:31ff:fe5a:334f/128 :: Un 0 1 7 lo
ff00::/8 :: U 256 0 0 wlan0
ff00::/8 :: U 256 0 0 tun0
::/0 :: !n -1 1 524 lo

Finally, to set up a simple service to use IPv6:

In one terminal:

$ nc -l -6 9901
...

In another:

$ nc -6 fd2a:cd2c:8bc4:11ff::23 9901
...

and our logging now looks like this:

$ ./reflect -v
Capability CAP_NET_ADMIN: 1 0 1
Created tun device tun0
48: fe80::1->ff02::2
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
48: fe80::1->ff02::2
48: fe80::1->ff02::2
80: fe80::1->fd2a:cd2c:8bc4:11ff::23
80: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
79: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23
72: fe80::1->fd2a:cd2c:8bc4:11ff::23

Those ff02::2 addresses are for IPv6 router discovery. The rest are two TCP flows, one in each direction (we can get more detail from Wireshark, in particular, the relevant port numbers, but this gives the general idea).


Fast inverse square root

What could be cuter than the infamous fast inverse square root function used in the Quake 3 engine:

http://en.wikipedia.org/wiki/Fast_inverse_square_root

The interesting part is the calculation of the initial approximation, splitting this down into the basic steps, we have:

float isqrt(float f)
{
  uint32_t n = floattoint(f);
  n >>= 1;
  n = -n;
  n += 0x5f000000;
  n += 0x3759df;
  float x = inttofloat(n);
  return x;
}

To get some insight into what’s going on here, we need to look at the representation of floating point numbers. An IEEE-754 float consists of a sign bit, an exponent value, and a mantissa. The exponent is an 8-bit value, the mantissa has 23 bits, both unsigned. As usual, a suitable notation is key: simplifying a little (specifically, ignoring NaNs, infinities and denormalized numbers), we shall write a float of the form {sign:1;exponent:8;mantissa:23} as (s,e,m), with m a real in the range [0,1), and this represents 2e-127(1+m), negated if the sign bit is 1.

To warm up, it’s helpful to look at a simpler example:

float fsqrt(float f)
{
  unsigned n = floattoint(f);
  n >>=1; 
  n += 0x1f800000;
  n += 0x400000;
  return inttofloat(n);
}

Here we are computing an approximation to a normal square root.

Taking it a step at a time: first the shift right n >>= 1, there are two cases for odd and even exponent:

(0,2e,m)   => (0,e,m/2)        // No carry from exponent
(0,2e+1,m) => (0,e,1/2 + m/2)  // Carry into mantissa

For n += 0x1f800000: we are adding 63 (0x1f800000 is 63 << 23) onto the exponent:

(s,e,m) => (s,e+63,m)

And finally, n += 0x400000: Generally, if we add p onto the mantissa, where 0 <= p < 223, and writing m’ = m + p/223, we have:

(s,e,m) => (s,e,m') if m' < 1
(s,e,m) => (s,e+1,m'-1) if m' >= 1

For p = 0x400000 = 222, we have m' = m+1/2. ie:

(s,e,m) => (s,e+1,m'-1) if m' >= 1 
        => (s,e,m') otherwise 

Putting this together, for even powers of two:

22k(1+m) => 2k(1+m/2):

(0,2k+127,m)   => (0,k+63,0.5+m/2) 
               => (0,k+126,0.5+m/2)
               => (0,k+127,m/2)

And for odd powers:

2(2k+1)(1+m) => 2k(1+0.5+m/2):

(0,2k+1+127,m) => (0,k+64,m/2)
               => (0,k+127,m/2)
               => (0,k+127,0.5+m/2)

Let’s check this is sensible by setting m = 0:

22k     => 2k
2(2k+1) => 2k(1+0.5)

and putting m = 1 we get:

22k+1   => 2k(1+1/2)
2(2k+2) => 2k+1

Our approximation is linear in between powers of two, and continuous at those points too. Also at even powers the graph is tangent to sqrt(x).

This is all nicely illustrated in a picture:

Returning to the original inverse function, we have an additional negation step, n = -n: to negate a twos-complement number, we flip the bits and add one. There are two main cases, depending on whether the mantissa is zero. If it is zero, there is a carry into the exponent, otherwise we just flip the bits of the exponent. The sign bit will always end up set (I’m ignoring the case when the exponent is zero). The general effect is:

(0,e,0) => (1,256-e,0)
(0,e,m) => (1,255-e,1-m)

This time, we are adding 190 onto the exponent (0x5f000000 = 190<<23) – this has the dual effect of resetting the sign bit to 0 and subtracting 66 from the exponent (190 = -66 mod 256).

Let’s see what happens with odd and even powers of two; writing the magic offset added onto the mantissa as c:

(0,2k+127,m) => (0,k+63,0.5+m/2)     // n >>= 1
             => (1,255-k-63,0.5-m/2) // n = -n
             => (0,126-k,0.5-m/2)    // n += 0x5f000000
             => (0,126-k,0.5-m/2+c)  // if 0.5-m/2+c < 1
             => (0,127-k,-0.5-m/2+c) // if 0.5-m/2+c >= 1
(0,2k+128,m) => (0,k+63,m/2)       // n >>= 1
             => (1,255-k-63,1-m/2) // n = -n
             => (0,126-k,1-m/2)    // n += 0x5f000000
             => (0,126-k,1-m/2+c)  // if 1-m/2+c < 1
             => (0,127-k,-m/2+c)   // if 1-m/2+c >= 1

If we use 0x400000 as the offset, ie. c above is 0.5, and put m=0 in the two cases, we have:

22k => 1/2k
22k+1 => 1.5/2k+1

Once again, our approximation coincides exactly at even powers of two, and as before it’s useful to have a picture:

We don’t have nice tangents this time, but the end result isn’t too bad.

We probably could have saved ourselves some work here by noting that the mapping between 32-bit integers (as signed magnitude numbers) and the corresponding floats is monotonic and continuous (for some sense of continuous), so composing with other (anti)monotonic operations gives an (anti)monotonic result, so having checked our function is correct at powers of two, it can’t go too badly adrift in between.

We can improve our approximation by using a smaller mantissa offset, 0x3759df, and we end up with the function we came in with:

Not especially pretty, but a good starting point for some Newton-Raphson. Notice that as well as kinks at exact powers of two, this approximation has kinks in between as well (when adding the constant to the mantissa overflows into the exponent).


Embedded Python Interpreter

And now for something completely different…

Often, I’d like to embed a reasonably capable command interpreter in a C++ application. Python seems a likely candidate, so here’s some investigative code using separate processes (the next step will be to use threads, if that’s possible, so the interpreter can live in the same memory space as our application, that can wait for part II though). As well as the mechanics of embedding Python, we have a pleasant excursion through the sometimes murky worlds of signal handling and pseudo-terminals.

The server structure is conventional (though not necessarily suitable for a serious production server), on each incoming connection we fork a handler process, this in turn splits into two processes, which form their own process group under the control of a pseudo-terminal (pty). One forwarding process copies data between the socket and the master side of the pty, the other process runs the interpreter itself on the slave side. Simple enough, with a few subtleties. To get signal handling right, we have to ignore SIGINT in the forwarding process (otherwise it will terminate on interrupt, taking the interpreter with it), but leave the default handler in the interpreter process – Python sets up its own signal handler, but it only seems to do this if the handler hasn’t been redefined already. Also, Python seems to insist that it uses fds 0,1 and 2 so we need to rebind them, and, finally, to get Python to do line editing, we need to import readline in the interpreter.

My main interest here is in getting external access to the interpreter, rather than the mechanics of calling between C and Python, so we just have a couple of simple functions init() and func() defined in the embedded interpreter as examples. At this simple level I don’t think we need to worry about reference counts etc.

#include <Python.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <errno.h>
#include <netinet/ip.h>
#include <sys/epoll.h>

// Some handy macros to help with error checking
// When prototyping, it's a good idea to check every
// system call for errors, these macros help to keep
// the code uncluttered.

#define CHECK(e) \
 ((e)? \
  (void)0: \
  (fprintf(stderr, "'%s' failed at %s:%d\n - %s\n", \
           #e, __FILE__, __LINE__,strerror(errno)), \
   exit(0)))

#define CHECKSYS(e) (CHECK((e)==0))
#define CHECKFD(e) (CHECK((e)>=0))

// We are told not to use signal, due to portability problems
// so we will define a similar function ourselves with sigaction
void setsignal(int signal, sighandler_t handler)
{
  struct sigaction sa;
  memset(&sa,0,sizeof(sa));
  sa.sa_handler = handler;
  CHECKSYS(sigaction(signal,&sa,NULL));
}

// Make a suitable server socket, as a small concession to
// security, we will hardwire the loopback address as the
// bind address. People elsewhere can come in through an SSH
// tunnel.
int makeserversock(int port)
{
  int serversock = socket(AF_INET,SOCK_STREAM,0);
  CHECKFD(serversock);
  sockaddr_in saddr;
  saddr.sin_family = PF_INET;
  saddr.sin_port = htons(port);
  saddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

  int optval = 1;
  CHECKSYS(setsockopt(serversock, SOL_SOCKET, SO_REUSEADDR, 
                      &optval, sizeof optval));
  CHECKSYS(bind(serversock,(sockaddr*)&saddr,sizeof(saddr)));
  CHECKSYS(listen(serversock,10));
  return serversock;
}

// Copy data between our socket fd and the master
// side of the pty. A simple epoll loop.
int runforwarder(int mpty, int sockfd)
{
  static const int MAX_EVENTS = 10;
  int epollfd = epoll_create(MAX_EVENTS);
  CHECKFD(epollfd);
  epoll_event event;
  memset (&event, 0, sizeof(event));
  event.events = EPOLLIN;
  event.data.fd = sockfd;
  CHECKSYS(epoll_ctl(epollfd, EPOLL_CTL_ADD, sockfd, &event));
  event.data.fd = mpty;
  CHECKSYS(epoll_ctl(epollfd, EPOLL_CTL_ADD, mpty, &event));
  char ibuff[256];
  while (true) {
    struct epoll_event events[MAX_EVENTS];
    int nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
    // Maybe treat EINTR specially here.
    CHECK(nfds >= 0);
    for (int i = 0; i < nfds; ++i) {
      int fd = events[i].data.fd;
      if (events[i].events & EPOLLIN) {
        ssize_t nread = read(fd,ibuff,sizeof(ibuff));
        CHECK(nread >= 0);
        if (nread == 0) {
          goto finish;
        } else {
          write(mpty+sockfd-fd,ibuff,nread);
        }
      } else if (events[i].events & (EPOLLERR|EPOLLHUP)) {
        goto finish;
      } else {
        fprintf(stderr, "Unexpected event for %d: 0x%x\n", 
                fd, events[i].events);
        goto finish;
      }
    }
  }
 finish:
  CHECKSYS(close(mpty));
  CHECKSYS(close(sockfd));
  CHECKSYS(close(epollfd));
  return 0;
}

// The "application" functions to be accessible from
// the embedded interpreter
int myinit()
{
  srand(time(NULL));
  return 0;
}

int myfunc()
{
  return rand();
}

// Python wrappers around our application functions
static PyObject*
emb_init(PyObject *self, PyObject *args)
{
    if (!PyArg_ParseTuple(args, ":init")) return NULL;
    return Py_BuildValue("i", myinit());
}

static PyObject*
emb_func(PyObject *self, PyObject *args)
{
    if (!PyArg_ParseTuple(args, ":func")) return NULL;
    return Py_BuildValue("i", myfunc());
}

static PyMethodDef EmbMethods[] = {
    {"init", emb_init, METH_VARARGS,
     "(Re)initialize the application."},
    {"func", emb_func, METH_VARARGS,
     "Run the application"},
    {NULL, NULL, 0, NULL}
};

int runinterpreter(char *argname, int fd)
{
  CHECKFD(dup2(fd,0));
  CHECKFD(dup2(fd,1));
  CHECKFD(dup2(fd,2));
  CHECKSYS(close(fd)); 

  Py_SetProgramName(argname);
  Py_Initialize();
  Py_InitModule("emb", EmbMethods);
  PyRun_SimpleString("from time import time,ctime\n");
  PyRun_SimpleString("from emb import init,func\n");
  PyRun_SimpleString("print('Today is',ctime(time()))\n");
  PyRun_SimpleString("import readline\n");
  PyRun_InteractiveLoop(stdin, "-");
  Py_Finalize();

  return 0;
}

int main(int argc, char *argv[])
{
  int port = -1;
  if (argc > 1) {
    port = atoi(argv[1]);
  } else {
    fprintf(stderr, "Usage: %s <port>\n", argv[0]);
    exit(0);
  }
  setsignal(SIGCHLD, SIG_IGN);
  int serversock = makeserversock(port);
  while (true) {
    int sockfd = accept(serversock,NULL,NULL);
    CHECKFD(sockfd);
    if (fork() != 0) {
      // Server side, close new connection and continue
      CHECKSYS(close(sockfd));
    } else {
      // Client side, close server socket
      CHECKSYS(close(serversock)); serversock = -1;
       // Create a pseudo-terminal
      int mpty = posix_openpt(O_RDWR);
      CHECKFD(mpty);
      CHECKSYS(grantpt(mpty)); // pty magic
      CHECKSYS(unlockpt(mpty));
      // Start our own session
      CHECK(setsid()>0); 
      int spty = open(ptsname(mpty),O_RDWR);
      // spty is now our controlling terminal
      CHECKFD(spty);
      // Now split into two processes, one copying data
      // between socket and pty; the other running the
      // actual interpreter.
      if (fork() != 0) {
        CHECKSYS(close(spty));
        // Ignore sigint here
        setsignal(SIGINT, SIG_IGN);
        return runforwarder(sockfd,mpty);
      } else {
        CHECKSYS(close(sockfd));
        CHECKSYS(close(mpty)); 
        // Default sigint here - will be replace by interpreter
        setsignal(SIGINT, SIG_DFL);
        return runinterpreter(argv[0],spty);
      }
    }
  }
}

Compilation needs something like:

g++ -g -L/usr/lib/python2.6/config -lpython2.6 -I/usr/include/python2.6 -Wall embed.cpp -o embed

Suitable flags can be obtained by doing:

	/usr/bin/python2.6-config --cflags
	/usr/bin/python2.6-config --ldflags

Of course, all this will depend on your exact Python version and where it is installed. Embedding has changed somewhat in Python 3, but most of this will still apply.

To connect to the interpreter, we can use our good friend netcat, with some extra tty mangling (we want eg. control-C to be handled by the pty defined above in the server code, not the user terminal, so we put that into raw mode).

#!/bin/sh
ttystate=`stty --save`
stty raw -echo
netcat $*
stty $ttystate

We set up the server socket to only listen on the loopback interface, so in order to have secure remote access, we can set up an SSH tunnel by running something like:

$ ssh -N -L 9998:localhost:9999 <serverhost>

on the client host.

Finally, we can run some Python:

$ connect localhost 9998
('Today is', 'Sun Nov  4 21:09:09 2012')
>>> print 1
1
>>> init()
0
>>> func()
191482566
>>> ^C
KeyboardInterrupt
>>> ^D
$