# My First Shadertoy Shader

**Posted:** December 5, 2017

**Filed under:** Uncategorized

My first Shadertoy shader:

https://www.shadertoy.com/embed/4lfBzj?gui=true&t=10&paused=false&muted=true

# Drawing the Clebsch Surface as Particles

**Posted:** April 6, 2016

**Filed under:** Geometry, Graphics, Javascript

http://matthewarcus.github.io/polyjs/clebsch.html

https://github.com/matthewarcus/polyjs/blob/master/js/clebsch.js

Instead of using a triangulated mesh, we can display a surface in 3d by simply generating a set of random points on the surface and displaying them as a sort of particle system. Let’s do this with the famous Clebsch cubic surface. The points are stored as homogeneous coordinates; to display them, we multiply by a quaternion to rotate in projective space before doing the usual perspective projection to Euclidean space.

The Clebsch surface is the set of points $(x_0, x_1, x_2, x_3, x_4)$ (in projective 4-space) that satisfy the equations:

$$x_0 + x_1 + x_2 + x_3 + x_4 = 0$$

$$x_0^3 + x_1^3 + x_2^3 + x_3^3 + x_4^3 = 0$$

To simplify things, we can eliminate $x_0 = -(x_1 + x_2 + x_3 + x_4)$ and rename a little, to get the single cubic equation:

$$(x + y + z + w)^3 = x^3 + y^3 + z^3 + w^3 \qquad [***]$$

defining a surface in projective 3-space, with the familiar 4-element homogeneous coordinates.

Since coordinates are homogeneous, we only need to consider the cases `w = 1` and `w = 0` (the plane at infinity). For `w = 0` the equation becomes $(x + y + z)^3 = x^3 + y^3 + z^3$, and since $(x + y + z)^3 - x^3 - y^3 - z^3 = 3(x + y)(y + z)(z + x)$, the solutions are three of the 27 lines, which we shall draw separately later. So for now we just consider the case `w = 1`, for which we have:

$$(x + y + z + 1)^3 = x^3 + y^3 + z^3 + 1$$

Given values for `x` and `y`, we can solve for `z` easily. The cubes in `z` drop out and we are left with a quadratic equation that can be solved in the usual way:

$$3Az^2 + 3A^2z + A^3 - B = 0, \quad \text{where } A = x + y + 1,\ B = x^3 + y^3 + 1$$
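Here is a Python sketch of that step. It is our own illustration, not code from `clebsch.js`, and `solve_z` and `residual` are hypothetical names. The function builds the quadratic's coefficients from `A` and `B` and returns its real roots. It returns nothing when `A = 0`, because the quadratic then degenerates, and nothing when the discriminant is negative, because no real point of the surface lies above that `(x, y)`.

```python
import math

def solve_z(x, y):
    """Real z with (x+y+z+1)^3 = x^3+y^3+z^3+1, for given x and y.

    The z^3 terms cancel, leaving 3A z^2 + 3A^2 z + (A^3 - B) = 0
    with A = x+y+1 and B = x^3+y^3+1.
    """
    A = x + y + 1
    B = x**3 + y**3 + 1
    if A == 0:
        # Degenerate case: the equation reduces to B = 0, no quadratic in z
        return []
    disc = 9*A**4 - 12*A*(A**3 - B)   # b^2 - 4ac for a=3A, b=3A^2, c=A^3-B
    if disc < 0:
        return []                      # no real points above this (x, y)
    r = math.sqrt(disc)
    return [(-3*A**2 + r) / (6*A), (-3*A**2 - r) / (6*A)]

def residual(x, y, z, w=1.0):
    """How far (x, y, z, w) is from satisfying [***]; zero on the surface."""
    return (x + y + z + w)**3 - (x**3 + y**3 + z**3 + w**3)

print(solve_z(0.0, 0.0))   # [0.0, -1.0]
```

Permuting the coordinates of each resulting `(x, y, z, 1)` gives further points, since `[***]` is symmetric in all four coordinates.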

We can now generate points on the surface by randomly choosing `x` and `y` and solving for `z`. This gives a set of homogeneous points `(x,y,z,w)` satisfying `[***]`, and we can get further solutions by permuting the coordinates. We don’t need all permutations, since some of the coordinates are arbitrary and points that are multiples of each other are equivalent. The random values themselves are generated by this Javascript function, which produces values between `-Infinity` and `+Infinity` that are clustered around the origin:

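```javascript
function randpoint() {
  // 2*Math.random()-1 is uniform on [-1,1), so its reciprocal has |x| >= 1
  var x = 1/(2*Math.random()-1);
  // Shift each half towards zero, so the results cover the whole real line
  // but are concentrated near the origin
  if (x < 0) x += 1;
  if (x > 0) x -= 1;
  return x;
}
```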

The Clebsch surface is of course famous for its 27 lines, so we draw these in as well, again as a random selection of points rather than a solid line. 15 of the lines are made up of points of the form `(a,-a,b,-b,0)` and permutations. Since we are working in 4-space, this becomes 12 lines of the form `(a,-a,b,0)` and three of the form `(a,-a,b,-b)`. These 15 lines are drawn in white. They can be seen to intersect in 10 Eckardt points where 3 lines meet, though it’s hard to find a projection where all 10 are simultaneously visible. The other 12 lines are of the form `(a,b,-(φa+b),-(a+φb),φ(a+b))`, where φ is the golden ratio, 1.618…. They can be seen to form Schläfli’s “Double Six” configuration: each magenta or cyan line intersects exactly 5 other lines, all of the opposite color.
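Here is a quick numerical sanity check of the two families of lines. It is a Python sketch of our own, and the helper names are hypothetical. For the double-six family we write out all five coordinates, with the fifth forced to φ(a+b) by the linear equation. Both defining equations are symmetric under permuting coordinates, so checking one representative of each family is enough. `white_lines` counts the first family by choosing the position of the 0 and a pairing of the remaining four coordinates.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2   # the golden ratio

def on_clebsch(p, tol=1e-9):
    """True if both the coordinate sum and the sum of cubes of p vanish."""
    return abs(sum(p)) < tol and abs(sum(t**3 for t in p)) < tol

def white_line_point(a, b):
    """A point on one of the 15 lines (a, -a, b, -b, 0)."""
    return (a, -a, b, -b, 0.0)

def double_six_point(a, b):
    """A point on one of the other 12 lines; the fifth coordinate makes the sum zero."""
    return (a, b, -(PHI*a + b), -(a + PHI*b), PHI*(a + b))

def white_lines():
    """Enumerate the 15 lines as (zero position, pairing of the other four positions)."""
    lines = set()
    for perm in itertools.permutations(range(5)):
        pairs = frozenset([frozenset(perm[0:2]), frozenset(perm[2:4])])
        lines.add((perm[4], pairs))
    return lines

print(len(white_lines()))   # 15
```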

All that remains is to project into 3-space. As usual we divide by the w-coordinate, but to get different projections we first rotate in projective space by multiplying by a quaternion; varying the quaternion then varies the projection. (Quaternion `(d,-a,-b,-c)` puts the plane `(a,b,c,d)` at infinity or, alternatively, rotates `(a,b,c,d)` to `(0,0,0,1)`. It is well known that quaternions can be used to represent rotations in 3-space, but they also work for 4-space, with 3-space as a special case: a 4-space rotation is uniquely represented (up to sign) by `x -> pxq`, where `p` and `q` are unit quaternions.) Here we multiply each point by a single quaternion to give an isoclinic or Clifford rotation, in which every point is rotated by the same angle.
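Here is a small Python sketch that makes the rotation step concrete. It is our own code, not taken from `clebsch.js`. We assume a quaternion is stored as `(x, y, z, w)` with `w` the real part, so that a homogeneous point can be used directly as a quaternion. The quaternion written `(d,-a,-b,-c)` above, which we read as listing the real part first, therefore appears here as `(-a,-b,-c,d)`.

`rotation_cos` illustrates the isoclinic property. For right multiplication by a unit quaternion `q`, the cosine of the angle between any point and its image is the real part of `q`, because `<x, xq> = Re(conj(x) x q) = |x|^2 Re(q)`.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (x, y, z, w), w being the real part."""
    px, py, pz, pw = p
    qx, qy, qz, qw = q
    return (pw*qx + px*qw + py*qz - pz*qy,
            pw*qy - px*qz + py*qw + pz*qx,
            pw*qz + px*qy - py*qx + pz*qw,
            pw*qw - px*qx - py*qy - pz*qz)

def normalize(q):
    n = math.sqrt(sum(t*t for t in q))
    return tuple(t / n for t in q)

def rotate(point, q):
    """Isoclinic rotation of a homogeneous point: right-multiply by one unit quaternion."""
    return qmul(point, q)

def plane_to_infinity(a, b, c, d):
    """Unit quaternion taking (a, b, c, d) to a multiple of (0, 0, 0, 1).

    It is the normalized conjugate of (a, b, c, d), since p * conj(p) = |p|^2.
    """
    return normalize((-a, -b, -c, d))

def project(point):
    """Perspective projection to 3-space: divide through by the w-coordinate."""
    x, y, z, w = point
    return (x / w, y / w, z / w)

def rotation_cos(point, q):
    """Cosine of the angle between a point and its image under rotate(., q)."""
    image = rotate(point, q)
    dot = sum(s*t for s, t in zip(point, image))
    return dot / sum(s*s for s in point)
```

For example, `rotate((1, 2, 2, 4), plane_to_infinity(1, 2, 2, 4))` is (up to rounding) `(0, 0, 0, 5)`, a point that `project` would send to infinity.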

We are using Three.js, which doesn’t seem to accept 4d points in geometries – we could write our own vertex shader to do the rotation and projection on the GPU, but for now, we do it on the CPU; updating the point positions is reasonably fast with the Three.js BufferGeometry. Actually displaying the points is simple with the THREE.Points object – we use a simple disc texture to make things a little more interesting, and attempt to color the points according to the permutations used to generate them.

The mouse and arrow keys control the camera position, square brackets move through different display modes, space toggles the rotation.

An excellent reference giving details of the construction of the surface (and good French practice) is:

http://www.mathcurve.com/surfaces/clebsch/clebsch.shtml

# Excellent Numbers

**Posted:** January 16, 2016

**Filed under:** Number Theory, Python

A number `n` with an even number of digits is excellent if it can be split into two halves, `a` and `b`, such that $b^2 - a^2 = n$. Let `2k` be the number of digits. Then we want $n = aA + b = b^2 - a^2$, where $A = 10^k$.

Let’s do some algebra:

$$
\begin{aligned}
aA + b &= b^2 - a^2 \\
a^2 + aA &= b^2 - b && \text{rearrange} \\
4a^2 + 4aA &= 4b^2 - 4b && \text{multiply by 4} \\
(2a + A)^2 - A^2 &= (2b - 1)^2 - 1 && \text{complete the square}
\end{aligned}
$$

Now we can substitute $X = 2a + A$, $Y = 2b - 1$ and $N = A^2 - 1$. Rearranging a little, we have:

$$X^2 - Y^2 = N$$

$$(X - Y)(X + Y) = N$$

So, every `2k`-digit excellent number gives rise to divisors `i = X - Y` and `j = X + Y` of `N`, where `ij = N` and `i <= j`.

This process can be reversed. If `i` is a divisor of `N`, with `j = N/i` and `i <= j`, we have `X = (j+i)/2` and `Y = (j-i)/2`, and then `a = (X-A)/2` and `b = (Y+1)/2`. If all the divisions by 2 are exact, we have a potentially excellent number.

In this case they are exact. `N` is odd, so `i` and `j` are odd too. Writing `i = 2i'+1` and `j = 2j'+1` gives $ij = 4i'j' + 2(i'+j') + 1$, and since $N = 10^{2k} - 1 \equiv 3 \pmod 4$, `i'` and `j'` must have different parities. So `X = i'+j'+1` is even and `Y = j'-i'` is odd, which makes `X-A` and `Y+1` both even.

All we then need to check is that `a` has exactly `k` digits and that `b` has at most `k`. Otherwise `a` and `b` are not the upper and lower halves of a `2k`-digit number.

Now we have a nice algorithm: find all divisors `i` of $N = 10^{2k} - 1$ with $i \le \sqrt{N}$, find `a` and `b` as above, and check whether they are in the appropriate range. If so, we have an excellent number, and it should be clear that all excellent numbers can be generated in this way.
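For small `k` the whole procedure fits in a few lines of Python if we find the divisors by plain trial division. This is a sketch under that assumption: `excellent_small` is our own illustrative name, and the scan is only practical while √N is small.

```python
from math import isqrt

def excellent_small(k):
    """Yield the 2k-digit excellent numbers, trying every i <= sqrt(N)."""
    A = 10**k
    N = A*A - 1                          # N = 10^(2k) - 1
    for i in range(1, isqrt(N) + 1):
        if N % i != 0:
            continue
        j = N // i                       # i*j = N with i <= j
        X, Y = (j + i)//2, (j - i)//2    # undo X - Y = i, X + Y = j
        a, b = (X - A)//2, (Y + 1)//2    # undo X = 2a + A, Y = 2b - 1
        # a must have exactly k digits, b at most k
        if A//10 <= a < A and 0 <= b < A:
            n = a*A + b
            assert n == b*b - a*a        # check the algebra
            yield n

print(list(excellent_small(2)))   # [3468]
```

For `k = 2`, the divisor `i = 33` gives `j = 303`, `X = 168` and `Y = 135`, hence `a = 34` and `b = 68`, and indeed $68^2 - 34^2 = 3468$.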

For small `N` we can find all divisors just by a linear scan, but for larger `N` something better is needed. Given a prime factorization, we can generate all possible combinations of the factors to get the divisors, so now all we need to do is factorize $10^{2k} - 1$. This of course is a hard problem, but we can use, for example, Python’s `primefac` library, and give it some help by observing that $10^{2k} - 1 = (10^k - 1)(10^k + 1)$. The factorization is harder for some values of `k`, particularly if `k` is prime, but we can always have a look at:

http://stdkmd.com/nrr/repunit/

if we run into trouble. My Pi 2 gets stuck at `k = 71`, where 11, 290249, 241573142393627673576957439049, 45994811347886846310221728895223034301839 and 31321069464181068355415209323405389541706979493156189716729115659 are the factors needed, so it’s not surprising it is struggling. Also, the number of divisors to check is approximately $2^{n-1}$, where `n` is the number of prime factors. For example, $10^{90} - 1$ has 35, so just generating all potential 180-digit numbers will take a while.
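Generating divisors from a factorization amounts to choosing, for each distinct prime, an exponent between 0 and its multiplicity. The sketch below does this with `itertools.product`. It is a swapped-in alternative to the hand-written counter in the code that follows, not the author’s method, and `divisors_from_factors` is a hypothetical name. It also makes the count explicit. A number whose prime powers are $p^e$ has $\prod (e + 1)$ divisors, which is $2^n$ when all `n` primes are distinct, and only the half with $i \le \sqrt{N}$ are needed.

```python
from collections import Counter
from itertools import product
from math import prod

def divisors_from_factors(factors):
    """Yield every divisor of the number whose prime factors (with repeats) are given."""
    counts = Counter(factors)            # prime -> multiplicity
    primes = list(counts)
    # One exponent choice 0..e for each prime gives one divisor
    for exps in product(*(range(counts[p] + 1) for p in primes)):
        yield prod(p**e for p, e in zip(primes, exps))

print(sorted(divisors_from_factors([3, 3, 11, 101])))
# [1, 3, 9, 11, 33, 99, 101, 303, 909, 1111, 3333, 9999]
```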

So, after all that, here’s some code. Using Python generators keeps the memory usage down, since we can process each divisor as it is constructed (though it does mean that results for a particular size don’t come out in order). After running for around 24 hours on a Pi 2, we are up to 180 digits and around 2000000 numbers, but `top` reports less than 1% of memory in use.

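```python
import primefac

def excellent(k):
    """Generate all excellent numbers of size 2k"""
    A = 10**k; N = A*A-1            # N = 10^(2k) - 1 = (A-1)(A+1)
    # Factorize the two halves of N separately, which is much easier
    factors1 = list(primefac.primefac(A-1))
    factors2 = list(primefac.primefac(A+1))
    d = divisors(sorted(factors1+factors2))
    for i in d:
        # Divisors arrive in no particular order, so skip rather than stop
        if i*i > N: continue
        j = N//i
        x,y = (j+i)//2, (j-i)//2     # X and Y from (X-Y)(X+Y) = N
        a,b = (x-A)//2, (y+1)//2
        # a must have exactly k digits, b at most k
        if A//10 <= a < A and 0 <= b < A:
            n = a*A+b
            assert(n == b*b-a*a) # Check our logic
            yield n

def divisors(factorlist):
    """Generate all divisors of number from list of prime factors"""
    factors = multiset(factorlist)   # [(prime, multiplicity), ...]
    nfactors = len(factors)
    # a[i] is the current exponent of the i-th prime (a mixed-radix counter);
    # b[i] is the product of the prime powers at positions i and above
    a = [0]*nfactors; b = [1]*nfactors
    yield 1
    while True:
        # Find the lowest position whose exponent can still be increased,
        # resetting the positions below it
        i = 0
        while i < nfactors:
            if a[i] < factors[i][1]: break
            a[i] = 0; b[i] = 1; i += 1
        if i == nfactors: break      # counter overflowed: all divisors done
        a[i] += 1; b[i] *= factors[i][0]
        for j in range(0,i): b[j] = b[i]
        yield b[0]

def multiset(s):
    """Create a multiset from a (sorted) list of items"""
    m = []; n = s[0]; count = 1
    for i in range(1,len(s)):
        if s[i] != n:
            m.append((n,count))
            n = s[i]
            count = 1
        else:
            count += 1
    m.append((n,count))
    return m

for n in range(2,1000,2):
    for m in excellent(n//2):
        print(m)
```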

The counts for numbers up to $10^{100}$. Each line gives the number of digits `2k`, how many excellent numbers have that many digits, and the prime factors of $N = 10^{2k} - 1$:

```
2: count = 1 factors = [3, 3, 11]
4: count = 1 factors = [3, 3, 11, 101]
6: count = 8 factors = [3, 3, 3, 7, 11, 13, 37]
8: count = 3 factors = [3, 3, 11, 73, 101, 137]
10: count = 3 factors = [3, 3, 11, 41, 271, 9091]
12: count = 13 factors = [3, 3, 3, 7, 11, 13, 37, 101, 9901]
14: count = 2 factors = [3, 3, 11, 239, 4649, 909091]
16: count = 3 factors = [3, 3, 11, 17, 73, 101, 137, 5882353]
18: count = 28 factors = [3, 3, 3, 3, 7, 11, 13, 19, 37, 52579, 333667]
20: count = 15 factors = [3, 3, 11, 41, 101, 271, 3541, 9091, 27961]
22: count = 9 factors = [3, 3, 11, 11, 23, 4093, 8779, 21649L, 513239L]
24: count = 51 factors = [3, 3, 3, 7, 11, 13, 37, 73, 101, 137, 9901, 99990001]
26: count = 5 factors = [3, 3, 11, 53, 79, 859, 265371653, 1058313049]
28: count = 17 factors = [3, 3, 11, 29, 101, 239, 281, 4649L, 909091L, 121499449]
30: count = 435 factors = [3, 3, 3, 7, 11, 13, 31, 37, 41, 211, 241, 271, 2161, 9091, 2906161]
32: count = 157 factors = [3, 3, 11, 17, 73, 101, 137, 353, 449, 641, 1409, 69857, 5882353]
34: count = 4 factors = [3, 3, 11, 103, 4013L, 2071723L, 5363222357L, 21993833369L]
36: count = 66 factors = [3, 3, 3, 3, 7, 11, 13, 19, 37, 101, 9901L, 52579L, 333667L, 999999000001L]
38: count = 2 factors = [3, 3, 11, 909090909090909091L, 1111111111111111111L]
40: count = 103 factors = [3, 3, 11, 41, 73, 101, 137, 271, 3541L, 9091L, 27961L, 1676321L, 5964848081L]
42: count = 999 factors = [3, 3, 3, 7, 7, 11, 13, 37, 43, 127, 239, 1933L, 2689L, 4649L, 459691L, 909091L, 10838689L]
44: count = 89 factors = [3, 3, 11, 11, 23, 89, 101, 4093L, 8779L, 21649L, 513239L, 1052788969L, 1056689261L]
46: count = 2 factors = [3, 3, 11, 47, 139, 2531L, 549797184491917L, 11111111111111111111111L]
48: count = 188 factors = [3, 3, 3, 7, 11, 13, 17, 37, 73, 101, 137, 9901L, 5882353L, 99990001L, 9999999900000001L]
50: count = 45 factors = [3, 3, 11, 41, 251, 271, 5051L, 9091L, 21401L, 25601L, 182521213001L, 78875943472201L]
52: count = 11 factors = [3, 3, 11, 53, 79, 101, 521, 859, 265371653L, 1058313049L, 1900381976777332243781L]
54: count = 150 factors = [3, 3, 3, 3, 3, 7, 11, 13, 19, 37, 757, 52579L, 333667L, 70541929L, 14175966169L, 440334654777631L]
56: count = 99 factors = [3, 3, 11, 29, 73, 101, 137, 239, 281, 4649L, 7841L, 909091L, 121499449L, 127522001020150503761L]
58: count = 2 factors = [3, 3, 11, 59, 3191L, 16763L, 43037L, 62003L, 77843839397L, 154083204930662557781201849L]
60: count = 35929 factors = [3, 3, 3, 7, 11, 13, 31, 37, 41, 61, 101, 211, 241, 271, 2161L, 3541L, 9091L, 9901L, 27961L, 2906161L, 4188901L, 39526741L]
62: count = 2 factors = [3, 3, 11, 2791L, 6943319L, 57336415063790604359L, 909090909090909090909090909091L]
64: count = 1162 factors = [3, 3, 11, 17, 73, 101, 137, 353, 449, 641, 1409L, 19841L, 69857L, 976193L, 5882353L, 6187457L, 834427406578561L]
66: count = 478 factors = [3, 3, 3, 7, 11, 11, 13, 23, 37, 67, 4093L, 8779L, 21649L, 513239L, 599144041L, 183411838171L, 1344628210313298373L]
68: count = 28 factors = [3, 3, 11, 101, 103, 4013L, 2071723L, 28559389L, 1491383821L, 5363222357L, 21993833369L, 2324557465671829L]
70: count = 146 factors = [3, 3, 11, 41, 71, 239, 271, 4649L, 9091L, 123551L, 909091L, 4147571L, 102598800232111471L, 265212793249617641L]
72: count = 3627 factors = [3, 3, 3, 3, 7, 11, 13, 19, 37, 73, 101, 137, 3169L, 9901L, 52579L, 98641L, 333667L, 99990001L, 999999000001L, 3199044596370769L]
74: count = 4 factors = [3, 3, 11, 7253L, 2028119L, 247629013L, 422650073734453L, 296557347313446299L, 2212394296770203368013L]
76: count = 5 factors = [3, 3, 11, 101, 722817036322379041L, 909090909090909091L, 1111111111111111111L, 1369778187490592461L]
78: count = 700 factors = [3, 3, 3, 7, 11, 13, 13, 37, 53, 79, 157, 859, 6397L, 216451L, 265371653L, 1058313049L, 388847808493L, 900900900900990990990991L]
80: count = 605 factors = [3, 3, 11, 17, 41, 73, 101, 137, 271, 3541L, 9091L, 27961L, 1676321L, 5070721L, 5882353L, 5964848081L, 19721061166646717498359681L]
82: count = 2 factors = [3, 3, 11, 83, 1231L, 538987L, 2670502781396266997L, 3404193829806058997303L, 201763709900322803748657942361L]
84: count = 59490 factors = [3, 3, 3, 7, 7, 11, 13, 29, 37, 43, 101, 127, 239, 281, 1933L, 2689L, 4649L, 9901L, 226549L, 459691L, 909091L, 10838689L, 121499449L, 4458192223320340849L]
86: count = 9 factors = [3, 3, 11, 173, 1527791L, 57009401L, 2182600451L, 1963506722254397L, 2140992015395526641L, 7306116556571817748755241L]
88: count = 105 factors = [3, 3, 11, 11, 23, 73, 89, 101, 137, 617, 4093L, 8779L, 21649L, 513239L, 1052788969L, 1056689261L, 16205834846012967584927082656402106953L]
90: count = 50344 factors = [3, 3, 3, 3, 7, 11, 13, 19, 31, 37, 41, 211, 241, 271, 2161L, 9091L, 29611L, 52579L, 238681L, 333667L, 2906161L, 3762091L, 8985695684401L, 4185502830133110721L]
92: count = 26 factors = [3, 3, 11, 47, 101, 139, 1289L, 2531L, 18371524594609L, 549797184491917L, 11111111111111111111111L, 4181003300071669867932658901L]
94: count = 2 factors = [3, 3, 11, 6299L, 35121409L, 4855067598095567L, 297262705009139006771611927L, 316362908763458525001406154038726382279L]
96: count = 80002 factors = [3, 3, 3, 7, 11, 13, 17, 37, 73, 97, 101, 137, 353, 449, 641, 1409L, 9901L, 69857L, 206209L, 5882353L, 99990001L, 66554101249L, 75118313082913L, 9999999900000001L]
98: count = 10 factors = [3, 3, 11, 197, 239, 4649L, 909091L, 505885997L, 1976730144598190963568023014679333L, 5076141624365532994918781726395939035533L]
100: count = 3573 factors = [3, 3, 11, 41, 101, 251, 271, 3541L, 5051L, 9091L, 21401L, 25601L, 27961L, 60101L, 7019801L, 182521213001L, 14103673319201L, 78875943472201L, 1680588011350901L]
```

Time to generate that lot: about 3m30s on my Core i3 laptop, and about 30m on my Raspberry Pi 2.

Finally, if we want really big numbers, we don’t need a full factorization. For example, http://stdkmd.com/nrr/repunit/ has enough factors of $10^{2016} - 1$ to find, among others:

```
467203616037752709753640875404905278610286278867781588976105221346970432
395736683257666616868811083043941463314513845761368180251295614288295272
712974920189045619317506423133721608367014923830041699210760183164585217
599174938750569917513095900320978144876083087591215818424979192187341459
756509047543186958692244904217361382312220589759551138455399864423950556
877618463844146927376784311673205223822619959320039184981861810037019868
259785305305318574127400789513920685551635257636719954377249042716215901
383844447790548649266546486400561808622649593166435681150190744136685886
335659851446510906275097594875425811830427470257238967118169107518206073
615596540306297513679739458887733205329861379807932402155440131595447258
905459620119681006553819769296088325096562295835757909167184772591564873
889571115660015460170065915133627063917081464432904541564462471704065596
995864597351005042459541065531684772723537478273137238536882143270837967
355784692820692700257668090096174851711901872379544776234666991080481050
827938907695794044434449897483410036041789283630321915114061710879582491
425436499562245749681317500307257068229179611956588865266983050266693593
782449462829670744802988126608915433835744545694648340583599198978087691
600147477867595689158751559170609174089828925445708502246431435332615649
355141085750085715940292846105899585176839916452603275591677348739914677
778216128911941208439459006047576543241292648992222090990416741035195043
278539082418243317317374583211686841192215307443355453487377377290350196
371354628663013301973808912217716856093563005467214390853254407353722627
416963492751125708925240306094459245276161787679224919637918913006752210
513662791886758732646236407566089976349268846643228836678505302621204788
560791609138040737880910308235667105956827669559476643173829425259196119
155907391268256981917731226756506952400793923896521065760681749285568778
344719401424538913519492510286853888028737195080140956569785690813785590
037948199410250551049984142378021668001361835139075663440830359075663125
```

Which should be enough excellence for anybody. If not, https://www.dropbox.com/s/9xdnxd0ifla0zhf/excellent100.txt.gz has a list of all numbers up to 100 digits.

# Drawing Uniform Polyhedra with Javascript, WebGL and Three.js

**Posted:** June 16, 2015

**Filed under:** Geometry, Graphics, Javascript

I’ve been meaning to write this up properly for a long time, but for now, this link will have to do:

http://matthewarcus.github.io/polyjs/

The idea is to use some fairly straightforward vector geometry to generate uniform polyhedra and their derivatives, using the kaleidoscopic construction to generate Schwarz triangles that tile the sphere. We use spherical trilinear coordinates within each Schwarz triangle to determine polyhedron vertices (with the trilinear coordinates being converted to barycentric for actually generating the points). Vertices for snub polyhedra are found by iterative approximation.

We also can use Schwarz triangles to apply other symmetry operations to the basic polyhedra to generate compound polyhedra, including all of the uniform compounds enumerated by John Skilling (as well as many others).

There are some other features, including construction of dual figures, final stellations, inversions, subdividing polyhedra faces using a Sierpinski construction, as well as various colouring effects, exploding faces etc.

Much use was made of the work of others, notably George Hart and Zvi Har’El as well as the wonderful Three.js library by Mr.doob.

# Javascript Quines

**Posted:** October 25, 2014

**Filed under:** Javascript

A Quine is a program that, when run, prints itself. For example, in Javascript we can write:

`> (function $(){console.log('('+$+')()');})()`

*(function $(){console.log('('+$+')()');})()*

This is nice, but we can see that it depends on the fact that in Javascript a function automatically converts to a string that is its own source code; also, it would be nice to get rid of the explicit function binding.

Using an idea from Lisp (and this is surely inspired by the (λx.xx)(λx.xx) of the lambda calculus), we can use an anonymous function:

`> var f = function(s) { console.log("("+s+")"+"("+s+")");}`

> f(f)

*(function (s) { console.log("("+s+")"+"("+s+")");})(function (s) { console.log("("+s+")"+"("+s+")");})*

> (function (s) { console.log("("+s+")"+"("+s+")");})(function (s) { console.log("("+s+")"+"("+s+")");})

*(function (s) { console.log("("+s+")"+"("+s+")");})(function (s) { console.log("("+s+")"+"("+s+")");})*

This works nicely, with no function binding needed (we are just using the definition of f to get going here), but we are still using implicit conversion from functions to strings. Let’s try explicitly quoting the second occurrence of s:

`> var f = function(s){console.log("("+s+")"+"("+"\""+s+"\""+")");}`

> f(f)

*(function (s){console.log("("+s+")"+"("+"\""+s+"\""+")");})("function (s){console.log("("+s+")"+"("+"\""+s+"\""+")");}")*

That isn’t well-formed Javascript, though: the double quotes in the stringified version of the function haven’t been escaped, so this won’t compile. We need to somehow insert the quotes into the string, but without getting into an endless regress with an extra layer of quotes. This problem really only exists because opening and closing quotes are the same; if quotes nested in the same way that brackets do, it would all be much easier.

(Note also that while we are using implicit function to string conversion to construct our Quine, the Quine itself doesn’t use that feature).

One simple solution is to insert the quotes as character codes:

`> var f = function(s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");}`

> f(f)

*(function (s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");})('function (s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");}')*

> (function (s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");})('function (s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");}')

*(function (s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");})('function (s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");}')*

but that seems rather inelegant. Also, while EBCDIC computers are rare these days, it would be good to be character-set independent.

We can also use a library function to handle quoting:

`> f = function(s){console.log("("+s+")"+"("+JSON.stringify(s)+")");}`

> f(String(f))

*(function (s){console.log("("+s+")"+"("+JSON.stringify(s)+")");})("function (s){console.log(\"(\"+s+\")\"+\"(\"+JSON.stringify(s)+\")\");}")*

> (function (s){console.log("("+s+")"+"("+JSON.stringify(s)+")");})("function (s){console.log(\"(\"+s+\")\"+\"(\"+JSON.stringify(s)+\")\");}")

*(function (s){console.log("("+s+")"+"("+JSON.stringify(s)+")");})("function (s){console.log(\"(\"+s+\")\"+\"(\"+JSON.stringify(s)+\")\");}")*

but we might object to such a heavyweight solution.

Another way forward is suggested by this C Quine:

`main(){char q='"';char*f="main(){char q='%c';char*f=%c%s%c;printf(f,q,q,f,q);}";printf(f,q,q,f,q);}`

where we avoid quoting in the quoted body by passing in the required characters from outside.

Here’s something similar in Javascript:

`> f = function(s,q,b,k){console.log(b+s+k+b+q+s+q+k);}`

> f(f,"\"","(",")")

*(function (s,q,b,k){console.log(b+s+k+b+q+s+q+k);})("function (s,q,b,k){console.log(b+s+k+b+q+s+q+k);}")*

Now there is no quotation at all in the function body, but this doesn’t quite work as we need to pass in the character parameters in the main function call, and for this we need to pass in the comma and escape characters as well:

`> f = function(s,q,b,k,c,e) { console.log(b+s+k+b+q+s+q+k+c+q+e+q+q+c+q+b+q+c+q+k+q+k); }`

> f(f,"\"","(",")",",","\\")

*(function (s,q,b,k,c,e) { console.log(b+s+k+b+q+s+q+k+c+q+e+q+q+c+q+b+q+c+q+k+q+k); })("function (s,q,b,k,c,e) { console.log(b+s+k+b+q+s+q+k+c+q+e+q+q+c+q+b+q+c+q+k+q+k); }"),"\"","(",")")*

Almost there; we are, again, just missing some parameters in the main call, but this time we can close the loop completely:

`> f = function(s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);}`

> f(f,'"',"(",")",",","\\");

*(function (s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);})("function (s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);}","\"","(",")",",","\\")*

> (function (s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);})("function (s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);}","\"","(",")",",","\\")

*(function (s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);})("function (s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);}","\"","(",")",",","\\")*

Not minimal in terms of length, but fairly minimal in terms of features required – no library functions, no fancy string formatting directives, no name binding apart from function application, no language specific tricks, no character set dependencies; we haven’t even made use of there being two ways to quote strings in Javascript.

Since we aren’t being very specific to Javascript, it’s easy to adapt the solution to other languages:

Haskell:

`(\s q b k e->putStrLn(b++s++k++q++e++s++q++q++e++q++q++q++b++q++q++k++q++q++e++e++q))"\\s q b k e->putStrLn(b++s++k++q++e++s++q++q++e++q++q++q++b++q++q++k++q++q++e++e++q)""\"""("")""\\"`

Python 3 (this doesn’t work in Python 2 as print can’t be used in lambda functions there):

`(lambda s,q,b,k,c,e:print(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k))("lambda s,q,b,k,c,e:print(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k)","\"","(",")",",","\\")`

Again, there are shorter Quines in both languages, but the ones I have seen all require some extra feature, e.g. escaping quotes using “show” in Haskell or “%r” in Python.
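One way to check a quine mechanically is to run it and compare what it prints with its own source. The Python sketch below, whose harness is our own, does this for the Python 3 version above. It executes the quine with `exec` while `contextlib.redirect_stdout` captures the output. `print` adds a trailing newline, so we expect the source plus `"\n"`.

```python
import contextlib
import io

# The Python 3 quine from above, as a raw string so its backslashes survive
QUINE = r'''(lambda s,q,b,k,c,e:print(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k))("lambda s,q,b,k,c,e:print(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k)","\"","(",")",",","\\")'''

def run_and_capture(source):
    """Execute source in a fresh namespace and return everything it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(source, {})
    return buf.getvalue()

print(run_and_capture(QUINE) == QUINE + "\n")   # True
```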

Finally, since Javascript has an eval function, we can construct a string that evaluates to itself:

`> f = function(s,q,b,k,c,e){return b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k;}`

> f(f,"\"","(",")",",","\\")

*'(function (s,q,b,k,c,e){return b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k;})("function (s,q,b,k,c,e){return b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k;}","\\"","(",")",",","\\\\")'*

> eval(_)

*'(function (s,q,b,k,c,e){return b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k;})("function (s,q,b,k,c,e){return b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k;}","\\"","(",")",",","\\\\")'*

# Monad Tutorial

**Posted:** September 22, 2014

**Filed under:** Lambda Calculus

Every programming blog should have a monad tutorial, so here we go: we can define a simple denotational semantics for a typed functional language as:

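$$
\begin{aligned}
\mathcal{E}[\![v:A]\!]_\rho &= \rho(v) : A \\
\mathcal{E}[\![\lambda x{:}A.\,e]\!]_\rho &= \lambda a{:}A.\ \mathcal{E}[\![e]\!]_{\rho[x \to a]} : A \to B \\
\mathcal{E}[\![e_1\,e_2]\!]_\rho &= \mathbf{let}\ f : A \to B = \mathcal{E}[\![e_1]\!]_\rho,\ a : A = \mathcal{E}[\![e_2]\!]_\rho\ \mathbf{in}\ f\,a : B
\end{aligned}
$$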

Here ρ is an environment, mapping variables to values, and note that in the rule for a λ expression, the λ in the right hand side is defining a function in the domain of values, whereas the left hand side λ is just a linguistic construct. We could decorate every expression with a type, but that would get untidy. There will be other rules for specific operations on whatever actual datatypes are around, but this gives the underlying functional basis on which everything else depends.

We can see that $\mathcal{E}[\![e]\!]_\rho$ is just a value in some semantic domain, which presumably contains some basic types and functions between values. The type of ℰ is something like:

ℰ: Exp[A] → Env → A

where Exp[A] is the set of expressions of type A and Env is the type of environments. I’m not going to be rigorous about any of this: I’m assuming we have some type system where this sort of thing makes sense, and I’m also not going to worry about the difference between a syntactic type and a semantic type.

Just for fun, let’s make a distinction (not that there really is one here) between “ordinary” values and “semantic” values, with M[A] being the semantic value with underlying value type A (imagine an ML or Haskell style type constructor M, with a value constructor, also called M, though often we’ll ignore the distinction between the underlying type and the constructed type).

Now ℰ has type:

ℰ: Exp[A] → Env → M[A]

and the underlying value of a function of type A → B is now A → M[B].

We can also rewrite our semantic equations and take a little time persuading ourselves this is equivalent to the definitions above:

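$$
\begin{aligned}
\mathcal{E}[\![v:A]\!]_\rho &= \mathit{inj}(\rho(v)) : M[A] \\
\mathcal{E}[\![\lambda x{:}A.\,e]\!]_\rho &= \mathit{inj}(\lambda a{:}A.\ \mathcal{E}[\![e]\!]_{\rho[x \to a]}) : M[A \to M[B]] \\
\mathcal{E}[\![e_1\,e_2]\!]_\rho &= \mathbf{let}\ a_1 : M[A \to M[B]] = \mathcal{E}[\![e_1]\!]_\rho,\ a_2 : M[A] = \mathcal{E}[\![e_2]\!]_\rho \\
&\qquad \mathbf{in}\ \mathit{apply}\ (\lambda f.\ \mathit{apply}\ f\ a_2)\ a_1 : M[B]
\end{aligned}
$$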

inj and apply are:

$$
\begin{aligned}
&\mathit{inj} : A \to M[A] && \mathit{inj}(a{:}A) = M(a) : M[A] \\
&\mathit{apply} : (A \to M[B]) \to M[A] \to M[B] && \mathit{apply}\ f\ (M\ a) = f\ a
\end{aligned}
$$

These functions should look familiar: they are the standard monad operations (Haskell’s `return` and `=<<`). Using a different monad will give us a different semantics for our basic functional operations.
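The equations can be made executable. In this Python sketch (our own encoding, with hypothetical names), `M` is a trivial wrapper class playing the role of the value constructor. `inj` and `apply` follow the definitions above, and `app` is the generic application rule `apply (λf. apply f a₂) a₁`. With this trivial `M`, applying the meaning of a function to the meaning of an argument just gives ordinary function application.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class M:
    """Trivial semantic domain: M(a) simply wraps an ordinary value a."""
    value: Any

def inj(a):
    """inj : A -> M[A];  inj(a) = M(a)"""
    return M(a)

def apply(f, m):
    """apply : (A -> M[B]) -> M[A] -> M[B];  apply f (M a) = f a"""
    return f(m.value)

def app(a1, a2):
    """Meaning of e1 e2, given a1 = E[e1] : M[A -> M[B]] and a2 = E[e2] : M[A]."""
    return apply(lambda f: apply(f, a2), a1)

# Meaning of a successor function: inj of a function whose results are in M
succ = inj(lambda x: inj(x + 1))

print(app(succ, inj(41)))   # M(value=42)
```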

Let’s introduce state. The basic denotational semantics is something like:

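$$
\begin{aligned}
\mathcal{E} &: \mathrm{Exp}[A] \to \mathrm{Env} \to \mathrm{State} \to (A, \mathrm{State}) \\
\mathcal{E}[\![v:A]\!]_{\rho\,\sigma} &= (\rho(v), \sigma) \\
\mathcal{E}[\![\lambda x.\,e]\!]_{\rho\,\sigma} &= (\lambda a.\ \mathcal{E}[\![e]\!]_{\rho[x \to a]}, \sigma) \\
\mathcal{E}[\![e_1\,e_2]\!]_{\rho\,\sigma} &= \mathbf{let}\ (f, \sigma') = \mathcal{E}[\![e_1]\!]_{\rho\,\sigma},\ (a, \sigma'') = \mathcal{E}[\![e_2]\!]_{\rho\,\sigma'}\ \mathbf{in}\ f\ a\ \sigma''
\end{aligned}
$$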

(I’ve omitted type decorations here for clarity).

Let’s do the same trick with a special semantic domain (though this time we’ll leave the type constructors implicit) and we have:

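$$
\begin{aligned}
M[A] &= \mathrm{State} \to (A, \mathrm{State}) \\
\mathit{inj}\ a\ \sigma &= (a, \sigma) \\
\mathit{apply}\ f\ g\ \sigma &= \mathbf{let}\ (a, \sigma') = g\ \sigma\ \mathbf{in}\ f\ a\ \sigma'
\end{aligned}
$$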

and we can see that we can just plug these definitions into our generic semantic equations above and get something equivalent to the specific state semantics.
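Here is the same idea with the state domain swapped in, again as a Python sketch under our own encoding: a computation is a function from a state to a `(value, new_state)` pair. `inj` and `apply` transcribe the definitions above, and `app` is the generic application rule from the earlier equations. `tick` is a hypothetical primitive, not part of the semantics above. It returns a counter and increments it, so we can watch the state being threaded through the function, then the argument, then the body.

```python
def inj(a):
    """inj a s = (a, s): produce a value, leaving the state alone."""
    return lambda s: (a, s)

def apply(f, g):
    """apply f g s = let (a, s') = g s in f a s'."""
    def run(s):
        a, s1 = g(s)      # run the computation g for a value and a new state
        return f(a)(s1)   # pass both on to f
    return run

def app(a1, a2):
    """Generic application rule: apply (λf. apply f a2) a1."""
    return apply(lambda f: apply(f, a2), a1)

def tick():
    """Hypothetical primitive: return the current counter and increment it."""
    return lambda s: (s, s + 1)

times10 = inj(lambda x: inj(x * 10))   # a pure function, lifted into M

print(app(times10, tick())(5))   # (50, 6)
```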

So, a monad is just a typed semantic domain together with the operations necessary to specify the standard functional constructions over that domain. Which sort of makes sense, but it’s nice to see it just drop out of the equations (and of course it’s nice to see that a standard denotational semantics for something like state does actually correspond quite closely with the monadic semantics).

None of this is new, in fact the use of monads to provide a uniform framework for program semantics goes right back to Eugenio Moggi’s original work in the 1980s (which was then taken up in functional programming where elements of the semantic domain itself are modelled as normal data objects).

# At The Bakery

**Posted:** June 10, 2014

**Filed under:** C

After the topical excitement of the last couple of posts, let’s look at an all-time great – Leslie Lamport’s Bakery Algorithm (and of course this is still topical; Lamport is the most recent winner of the Turing Award).

The problem is mutual exclusion without mutual exclusion primitives. Usually, it’s described in the context of a shared memory system (and that is what we will implement here), but it works equally well in a message-passing system with only local state (each thread or process only needs to write to its own part of the store).

For further details, and Lamport’s later thoughts see http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#bakery: “For a couple of years after my discovery of the bakery algorithm, everything I learned about concurrency came from studying it.” – and since Lamport understands more about concurrency than just about anyone on the planet, it’s maybe worth spending some time looking at it ourselves.

I’m not going to attempt to prove the algorithm correct (I’ll leave that to Lamport), but the crucial idea seems to me to be that a thread reading a particular value from another thread is a synchronization signal from that thread. Here, reading a false value for the `entering` variable is a signal that the other thread isn’t in the process of deciding on its own number, so it is safe for the reading thread to proceed.

Implementing on a real multiprocessor system, we find that use of memory barriers or synchronization primitives is essential – the algorithm requires that reads and writes are serialized in the sense that once a value is written, other processes won’t see an earlier value (or earlier values of other variables). This doesn’t conflict with what Lamport says about not requiring low-level atomicity – we can allow reads and writes to happen simultaneously, with the possibility of a read returning a bogus value – and in fact we can simulate this in the program by writing a random value just before a process selects its real ticket number, but once a write has completed, all processes should see the new value.

Another essential feature is the `volatile` qualifier. As many have pointed out, this isn’t enough by itself for correct thread synchronization, but on shared memory systems it prevents the compiler from making invalid assumptions about the consistency of reads from shared variables (for example, reading `entering[j]` once and then spinning forever on a cached copy).

A final point: correctness requires that ticket numbers can increase without bound. This is hard to arrange in practice, so we just `assert` if they grow too large (which rarely happens in reality, unless we get carried away with our randomization).

```cpp
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>

// Compile with: g++ -Wall -O3 bakery.cpp -pthread -o bakery

static const int NTHREADS = 4;

// Some features to play with
//#define NCHECK     // Disable crucial check
//#define NSYNC      // Disable memory barrier
//#define NVOLATILE  // Disable volatile

#if defined NVOLATILE
#define VOLATILE
#else
#define VOLATILE volatile
#endif

VOLATILE bool entering[NTHREADS];
VOLATILE unsigned number[NTHREADS];
VOLATILE int count = 0;
VOLATILE int total = 0;

unsigned getmax(int n)
{
  unsigned max = 0;
  for (int i = 0; i < n; i++) {
    if (number[i] > max) max = number[i];
  }
  return max;
}

bool check(int i, int j)
{
  return number[j] < number[i] ||
         (number[j] == number[i] && j < i);
}

inline void synchronize()
{
#if !defined NSYNC
  // gcc builtin full memory barrier
  __sync_synchronize();
#endif
}

void lock(int i)
{
  entering[i] = true;
  synchronize();
  // Simulate non-atomic write
  number[i] = rand();
  synchronize();
  number[i] = 1 + getmax(NTHREADS);
  assert(number[i] > 0);
  entering[i] = false;
  synchronize();
  for (int j = 0; j < NTHREADS; j++) {
    // Wait until thread j receives its number:
#if !defined NCHECK
    while (entering[j]) { /* nothing */ }
#endif
    // At this point, we have read a false value for
    // "entering[j]", therefore any number picked by j
    // later will take our choice into account, any value
    // chosen earlier (and so might be less than ours)
    // will be visible to us in the following test.

    // Wait until all threads with smaller numbers or with
    // the same number, but with higher priority, finish
    // their work:
    while ((number[j] != 0) && check(i,j)) { /* nothing */ }
  }
}

void unlock(int i)
{
  number[i] = 0;
}

void *threadfun(void *arg)
{
  int i = *(int*)(arg);
  while (true) {
    lock(i);
    total++;
    if (total % 1000000 == 0) fprintf(stderr,"%c", 'a'+i);
    assert(count==0); // Check we have exclusive access
    count++;
    // It's not clear that these synchs are unnecessary,
    // but nothing seems to break if I remove them.
    //synchronize();
    count--;
    //synchronize();
    unlock(i);
    // non-critical section...
  }
  return NULL;
}

int main()
{
  pthread_t t[NTHREADS];
  int n[NTHREADS];
  for (int i = 0; i < NTHREADS; i++) {
    n[i] = i;
    pthread_create(&t[i], NULL, threadfun, (void*)&n[i]);
  }
  for (int i = 0; i < NTHREADS; i++) {
    pthread_join(t[i], NULL);
  }
}
```

# How to use SRP in OpenSSL

**Posted:**May 10, 2014

**Filed under:**C++, Crypto

[Update 10/8/14: As Vakharia points out in the comments, there have been a couple of DoS-type problems found with the OpenSSL SRP code, fixed in OpenSSL 1.0.1i. There seems to be a problem with that version with uncertificated SRP connections, see: http://marc.info/?t=140745609300002&r=1&w=2, so you might need to patch for that to work]

SRP seems to be a much neglected protocol & there have even been noises about removing TLS-SRP from some versions of OpenSSL. This is a shame as SRP has many nice properties that are only more attractive after the Heartbleed scare, including forward secrecy and mutual authentication without PKI.

There are various reasons for that neglect including possible concerns over patents, though it’s not very clear what the issue here is, or even if there is one. One issue with using SRP in OpenSSL in particular is that the C API isn’t very well documented, so this is an attempt to improve that situation. This is what I have been able to glean from experimentation, reading various bits of source as well as miscellaneous bits and pieces from the web. It all works for me & seems sensible, but I’m not a crypto expert so caveat emptor applies even more than usual.

For further details on SRP and TLS-SRP see:

http://en.wikipedia.org/wiki/Secure_Remote_Password_protocol

http://en.wikipedia.org/wiki/TLS-SRP

The relevant RFCs are:

http://tools.ietf.org/html/rfc2945

http://tools.ietf.org/html/rfc5054

Also some useful discussions at:

http://bert-hubert.blogspot.co.uk/2012/02/on-srp-some-implementation-notes-and.html

http://crypto.stackexchange.com/questions/8245/why-is-srp-not-widely-used

On to the implementation details. All code fragments are taken from the full program at https://github.com/matthewarcus/ssl-demo.

We will start with the server side. The crucial piece of server data for SRP is the password verifier. We can make a verifier file using the openssl srp command line tool:

```
$ openssl srp
Exactly one of the options -add, -delete, -modify -list must be specified.
usage: srp [args] [user]

 -verbose        Talk alot while doing things
 -config file    A config file
 -name arg       The particular srp definition to use
 -srpvfile arg   The srp verifier file name
 -add            add an user and srp verifier
 -modify         modify the srp verifier of an existing user
 -delete         delete user from verifier file
 -list           list user
 -gn arg         g and N values to be used for new verifier
 -userinfo arg   additional info to be set for user
 -passin arg     input file pass phrase source
 -passout arg    output file pass phrase source
 -engine e       - use engine e, possibly a hardware device.
 -rand file:file:...
                 load the file (or the files in the directory) into
                 the random number generator
```

The `-gn` parameter requires explanation. The security of SRP (like Diffie-Hellman key exchange) is based on the hardness of the discrete logarithm problem: given a large prime N, a generator g and the value g^a mod N, it’s hard to determine a. Since g and N are fixed in advance and don’t have to be secret, it’s normal to use standard values that do not allow any of the known shortcuts for discrete logarithms, as defined for example in the appendix to RFC 5054 for particular bit sizes (1024 and 1536 are mandatory for TLS-SRP), and it’s the bit size that is given as the `-gn` argument:

```
$ touch password.srpv
$ openssl srp -srpvfile password.srpv -add -gn 1536 user
Enter pass phrase for user:
Verifying - Enter pass phrase for user:
$ cat password.srpv
V 2qYg0YhL6s9OsjpPt4eD4iIDB/SF.7pEGPLHVIsLvVD9wUU5tqngvzQUA7Uf6nAQtP.K
U4G.9yra1Ia4fkOrUbx2QjGVizyc.QcaCr83nIewI/ry57Vrgg6QQv2U6Z7ClC0Wig5yKH
BDu2Lfny1aEZy3i7oi3dTywvIxDeFCcS0UPIhUgpRnINZ5K2HJiz6TuofvIfYC2EMpD5Q8
PuZ8/fB62TvfFK7TN67cOCCJSroOukrrr/KScmoDZ/odfKUM FRzonhUh8ApuhBf45xMzsX1Olm1 user 1536
```

Note that the verifier file has to exist beforehand, even if empty.

Now that we have a verifier file, we can load it in our server code. OpenSSL defines a handy SRP_VBASE type that can be used to store verifiers, and we can use SRP_VBASE_init to load in the verifier file we made earlier:

```cpp
#include <openssl/srp.h>

static SRP_VBASE *srpData = NULL;
static const char *srpvfile = "password.srpv";

void setupSRPData(SSL_CTX *ctx)
{
  // Allocate the verifier store once (allocating twice would leak the first one)
  srpData = SRP_VBASE_new(NULL);
  CHECK(srpData != NULL);
  if (SRP_VBASE_init(srpData, (char *)srpvfile) != 0) {
    // File failed to load...
  }
}
```

We can also make a verifier directly and store it ourselves in an SRP_VBASE structure (or elsewhere):

```cpp
static const char *srpgroup = "1536";
static const char *username = "user";
static const char *password = "password";

void setupSRPData(SSL_CTX *ctx)
{
  ...
  // The structure to put the verifier data
  SRP_user_pwd *p =
    (SRP_user_pwd *)OPENSSL_malloc(sizeof(SRP_user_pwd));
  SRP_gN *gN = SRP_get_default_gN(srpgroup);
  CHECK(gN != NULL);
  // This check seems a bit pointless, but doesn't do harm.
  char *srpCheck = SRP_check_known_gN_param(gN->g, gN->N);
  CHECK(srpCheck != NULL);

  // Now create the verifier for the password.
  // We could get the password from the user at this point.
  BIGNUM *salt = NULL, *verifier = NULL;
  CHECK(SRP_create_verifier_BN(username, password, &salt,
                               &verifier, gN->N, gN->g));

  // Copy into the SRP_user_pwd structure
  p->id = OPENSSL_strdup(username);
  p->g = gN->g;
  p->N = gN->N;
  p->s = salt;
  p->v = verifier;
  p->info = NULL;

  // And add in to VBASE stack of user data
  sk_SRP_user_pwd_push(srpData->users_pwd, p);
}
```

Once we are done with the srpData structure, we should free it:

```cpp
if (srpData != NULL) SRP_VBASE_free(srpData);
```

(I’ve checked this code with valgrind and all seems well. When compiled with `-DPURIFY`, OpenSSL is quite well-behaved under valgrind, though there always seem to be a few hundred bytes of still-reachable data left at the end.)

Now we have our verifier data loaded, we can define a suitable callback function:

```cpp
int srpServerCallback(SSL *s, int *ad, void *arg)
{
  (void)arg;
  char *srpusername = SSL_get_srp_username(s);
  CHECK(srpusername != NULL);
  // Get data for user
  SRP_user_pwd *p = SRP_VBASE_get_by_user(srpData,srpusername);
  if (p == NULL) {
    fprintf(stderr, "User %s doesn't exist\n", srpusername);
    return SSL3_AL_FATAL;
  }
  // Set verifier data
  CHECK(SSL_set_srp_server_param(s, p->N, p->g, p->s, p->v, NULL) == SSL_OK);
  return SSL_ERROR_NONE;
}
```

And tell OpenSSL to use it for SRP connections:

```cpp
SSL_CTX_set_srp_username_callback(ctx, srpServerCallback);
...
```

For a simple demo, using a global variable for the srpData is adequate, but we could instead pass it in as the callback argument (or another context object):

```cpp
CHECK(SSL_CTX_set_srp_cb_arg(ctx, srpData) == SSL_OK);
...
```

You’ll have to read the code to find out what the ad parameter is for.

Note that for a non-existent user we return a fatal error, which results in the connection terminating with an "unknown PSK identity" alert. More secure behaviour (and the behaviour suggested in the RFC) is probably to simulate authentication with a dummy user, so that failure happens in the same way as if the password was wrong (there are some extra fields in the SRP_VBASE structure to help with this, but I haven’t tried to do that yet).

In fact, in the full program, the first time the SRP callback function is called, we don’t have the srpData loaded, and we return -1 from the callback to indicate this:

```cpp
int srpServerCallback(SSL *s, int *ad, void *arg)
{
  ...
  // Simulate asynchronous loading of SRP data
  if (srpData == NULL) {
    doSRPData = true;
    return -1; // Not ready yet
  }
  ...
}
```

When the callback fails in this way, the handshake fails with `SSL_get_error` reporting `SSL_ERROR_WANT_X509_LOOKUP`, and we handle that by loading the srpData at this point (the idea, presumably, is that we may want to load the SRP data asynchronously, and we can do that before calling SSL_accept again):

```cpp
int res = sslAccept(ssl);
if (res == SSL_OK) {
  break;
} else if (SSL_get_error(ssl,res) == SSL_ERROR_WANT_X509_LOOKUP &&
           doSRPData) {
  setupSRPData(ctx);
  doSRPData = false;
  ...
```

A couple of small details: we don’t want to do client certificate verification for SRP connections, so turn it off:

```cpp
bool isSRP = SSL_get_srp_username(ssl) != NULL;
bool verify = doVerify && !isSRP && !peerVerified(ssl);
...
```

Testing the SRP username seemed to be the neatest way of finding out if the connection is actually using SRP (there doesn’t seem to be a standard interface for this).

Also, the SRP ciphersuites without certificate-based authentication aren’t included in the default list of ciphers (since they officially offer no authentication) [Update 10/8/15: this should be fixed in OpenSSL 1.0.1i]:

$ openssl ciphers -v | grep SRP SRP-DSS-AES-256-CBC-SHA SSLv3 Kx=SRP Au=DSS Enc=AES(256) Mac=SHA1 SRP-RSA-AES-256-CBC-SHA SSLv3 Kx=SRP Au=RSA Enc=AES(256) Mac=SHA1 SRP-DSS-3DES-EDE-CBC-SHA SSLv3 Kx=SRP Au=DSS Enc=3DES(168) Mac=SHA1 SRP-RSA-3DES-EDE-CBC-SHA SSLv3 Kx=SRP Au=RSA Enc=3DES(168) Mac=SHA1 SRP-DSS-AES-128-CBC-SHA SSLv3 Kx=SRP Au=DSS Enc=AES(128) Mac=SHA1 SRP-RSA-AES-128-CBC-SHA SSLv3 Kx=SRP Au=RSA Enc=AES(128) Mac=SHA1 $ openssl ciphers -v ALL | grep SRP SRP-DSS-AES-256-CBC-SHA SSLv3 Kx=SRP Au=DSS Enc=AES(256) Mac=SHA1 SRP-RSA-AES-256-CBC-SHA SSLv3 Kx=SRP Au=RSA Enc=AES(256) Mac=SHA1 SRP-AES-256-CBC-SHA SSLv3 Kx=SRP Au=None Enc=AES(256) Mac=SHA1 SRP-DSS-3DES-EDE-CBC-SHA SSLv3 Kx=SRP Au=DSS Enc=3DES(168) Mac=SHA1 SRP-RSA-3DES-EDE-CBC-SHA SSLv3 Kx=SRP Au=RSA Enc=3DES(168) Mac=SHA1 SRP-3DES-EDE-CBC-SHA SSLv3 Kx=SRP Au=None Enc=3DES(168) Mac=SHA1 SRP-DSS-AES-128-CBC-SHA SSLv3 Kx=SRP Au=DSS Enc=AES(128) Mac=SHA1 SRP-RSA-AES-128-CBC-SHA SSLv3 Kx=SRP Au=RSA Enc=AES(128) Mac=SHA1 SRP-AES-128-CBC-SHA SSLv3 Kx=SRP Au=None Enc=AES(128) Mac=SHA1

so we set the available ciphers appropriately (NULL is also useful for testing, though I don’t recommend it for actual use):

```cpp
const char *cipherlist = "ALL:NULL";
...
CHECK(SSL_CTX_set_cipher_list(ctx,cipherlist) == SSL_OK);
...
```

That’s about it for the server, now on to the client, which fortunately is a good deal simpler.

Once again, we need to define a callback function:

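```cpp
#include <stdio.h>
#include <string.h>
#include <unistd.h>  // getpass
#include <openssl/ssl.h>

const char *password = NULL;

char *srpCallback(SSL *ssl, void *arg)
{
  (void)ssl;
  char *user = (char*)arg;
  if (password != NULL) {
    // Can be passed in on command line
    return OPENSSL_strdup(password);
  } else {
    // A constant size avoids a variable-length array (not standard C++)
    const int promptsize = 256;
    char prompt[promptsize];
    CHECK(snprintf(prompt, promptsize, "Password for %s: ", user) < promptsize);
    char *pass = getpass(prompt);
    if (pass == NULL) return NULL; // no terminal, or the read failed
    char *result = OPENSSL_strdup(pass);
    // getpass uses a static buffer, so clear it out after use.
    memset(pass,0,strlen(pass));
    return result;
  }
}
```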

`getpass` is officially obsolete, but does the job here. As a concession to real security, we zero out the password buffer after it has been copied (hopefully OpenSSL doesn’t hold on to the duplicated password any longer than needed).

Now finish the setup:

```cpp
if (doSRP) {
  CHECK(SSL_CTX_set_srp_username(ctx, (char*)username));
  SSL_CTX_set_srp_cb_arg(ctx,(void*)username);
  SSL_CTX_set_srp_client_pwd_callback(ctx, srpCallback);
  ...
```

We need to set the username before the handshake as it’s included in the initial hello message. The password is only required if an SRP ciphersuite is actually negotiated.

Finally, we need to get the right ciphersuites. If we include SRP ciphers in the client hello but no username, we will get a fatal alert if the server chooses to use SRP (which it may well do if it hasn’t been configured with ECDHE). The default list does include SRP suites:

```
$ openssl ciphers -v
ECDHE-RSA-AES256-GCM-SHA384 TLSv1.2 Kx=ECDH Au=RSA Enc=AESGCM(256) Mac=AEAD
ECDHE-ECDSA-AES256-GCM-SHA384 TLSv1.2 Kx=ECDH Au=ECDSA Enc=AESGCM(256) Mac=AEAD
ECDHE-RSA-AES256-SHA384 TLSv1.2 Kx=ECDH Au=RSA Enc=AES(256) Mac=SHA384
ECDHE-ECDSA-AES256-SHA384 TLSv1.2 Kx=ECDH Au=ECDSA Enc=AES(256) Mac=SHA384
ECDHE-RSA-AES256-SHA SSLv3 Kx=ECDH Au=RSA Enc=AES(256) Mac=SHA1
ECDHE-ECDSA-AES256-SHA SSLv3 Kx=ECDH Au=ECDSA Enc=AES(256) Mac=SHA1
SRP-DSS-AES-256-CBC-SHA SSLv3 Kx=SRP Au=DSS Enc=AES(256) Mac=SHA1
SRP-RSA-AES-256-CBC-SHA SSLv3 Kx=SRP Au=RSA Enc=AES(256) Mac=SHA1
...
```

so it’s best to not request SRP unless we really want it:

```cpp
if (doSRP) {
  cipherlist = "SRP";
} else {
  cipherlist = "DEFAULT:!SRP";
}
```

Let’s see all this in action:

On the server side:

```
$ ./ssl_server -v --srp 5999
renegotiation: allowed
TLSv1.2: SRP-RSA-AES-256-CBC-SHA SSLv3 Kx=SRP Au=RSA Enc=AES(256) Mac=SHA1
Session ID: (null)
Session ID CTX: 49:4E:49:54
No peer certificates.
...
```

Normal client connection:

```
$ ./ssl_client -v --srp localhost:5999
Cipher suites:
SRP-DSS-AES-256-CBC-SHA
SRP-RSA-AES-256-CBC-SHA
SRP-AES-256-CBC-SHA
SRP-DSS-3DES-EDE-CBC-SHA
SRP-RSA-3DES-EDE-CBC-SHA
SRP-3DES-EDE-CBC-SHA
SRP-DSS-AES-128-CBC-SHA
SRP-RSA-AES-128-CBC-SHA
SRP-AES-128-CBC-SHA
Password for user:
TLSv1.2: SRP-RSA-AES-256-CBC-SHA SSLv3 Kx=SRP Au=RSA Enc=AES(256) Mac=SHA1
Session ID: 19:E6:71:49:6A:3B:B4:0F:AE:AD:66:86:0A:87:55:37:5C:6B:DC:51:D9:89:12:CF:45:5E:A5:12:D8:91:42:CC
Session ID CTX: (null)
Peer certificates:
0: Subject: C = GB, O = TEST SERVER
   Issuer:  C = GB, O = TEST CA
1: Subject: C = GB, O = TEST CA
   Issuer:  C = GB, O = TEST ROOT
Certificate OK
...
```

Invalid user:

```
$ ./ssl_client -v --srp --user invalid localhost:5999
Cipher suites:
SRP-DSS-AES-256-CBC-SHA
SRP-RSA-AES-256-CBC-SHA
SRP-AES-256-CBC-SHA
SRP-DSS-3DES-EDE-CBC-SHA
SRP-RSA-3DES-EDE-CBC-SHA
SRP-3DES-EDE-CBC-SHA
SRP-DSS-AES-128-CBC-SHA
SRP-RSA-AES-128-CBC-SHA
SRP-AES-128-CBC-SHA
SSL3 alert read: fatal: unknown PSK identity
'sslConnect(ssl) == SSL_OK' failed: ssl_client.cpp:288
140613521352384:error:1407745B:SSL routines:SSL23_GET_SERVER_HELLO:reason(1115):s23_clnt.c:779:
```

Finally, as mentioned above, the standard SRP ciphersuites also do certificate-based server authentication. This seems sensible: if the verifier hasn’t been compromised, SRP guarantees the authenticity of the server, but a server has much greater incentive to protect its private key than to protect the verifier for a particular user. If we trust the server not to leak the verifier (perhaps we manage it ourselves and are accessing it remotely), we can use the non-PKI SRP ciphersuites by explicitly requesting them from the client:

```
$ ./ssl_client -v --srp --user user --cipherlist SRP-AES-256-CBC-SHA localhost:5999
Cipher suites:
SRP-AES-256-CBC-SHA
Password for user:
TLSv1.2: SRP-AES-256-CBC-SHA SSLv3 Kx=SRP Au=None Enc=AES(256) Mac=SHA1
Session ID: E9:F3:09:F9:7E:F8:43:60:0D:43:43:93:11:63:AE:F2:51:9F:4C:A9:4F:19:F8:89:DF:4F:07:02:66:42:F2:E6
Session ID CTX: (null)
No peer certificates.
...
```

# How to Leak a Key

**Posted:**April 16, 2014

**Filed under:**Crypto

It’s been interesting following the great Heartbleed crisis over the last week. The “Heartbleed Challenge” set up by Cloudflare established that you could get the server private key: specifically, the primes used to generate the key would occasionally show up in the Heartbleed data. Doing a memory dump of a simple OpenSSL server application (https://github.com/matthewarcus/ssl-demo) showed that, after reading the keys at startup, it later copies the prime data to higher locations in the heap, where they become more likely to be included in a Heartbeat response.

An extract from a hex dump of the heap after making a single request shows:

```
000273f0  00 00 00 00 01 00 00 00 51 00 00 00 00 00 00 00  # Prime1
00027400  85 52 42 a0 71 54 76 bd 98 49 4f 25 fc d3 e1 89
...
00027460  00 00 00 00 01 00 00 00 51 00 00 00 00 00 00 00  # Prime2
00027470  3b c8 8b 7d 52 74 db e0 af 9e 13 d7 a2 0a fc ea
...
0002d020  60 55 00 00 00 00 00 00 50 45 00 00 00 00 00 00  # Input buffer
0002d030  16 03 01 16 03 03 00 28 e8 d6 b3 68 87 08 d4 7f
...
00031570  00 00 00 00 00 00 00 00 c1 44 00 00 00 00 00 00  # Output buffer
00031580  00 00 00 16 03 03 00 28 82 c9 a8 9a 87 b5 cf e7
...
00038990  90 00 00 00 00 00 00 00 50 00 00 00 00 00 00 00  # Prime 2 again
000389a0  3b c8 8b 7d 52 74 db e0 af 9e 13 d7 a2 0a fc ea
...
00038b50  30 00 00 00 00 00 00 00 50 00 00 00 00 00 00 00  # Prime 1 again
00038b60  85 52 42 a0 71 54 76 bd 98 49 4f 25 fc d3 e1 89
```

The first line in each section is the malloc header and the second line is part of the data. All of these malloced blocks are in use (otherwise some of the data would be overwritten with free-list data).

The first two primes are from the initial processing of the private key, the next two items are the input and output buffers for the connection, then we have two more copies of the primes, which only appear after a connection has been made.

So where are these persistent copies of the primes coming from? Some investigation shows that the first time OpenSSL actually uses an RSA key for a private operation, it calculates various auxiliary values for use in Montgomery arithmetic and stores them away in the key structure. Unfortunately, one of the values that gets stored away is the modulus used, which in this case is one of the primes:

rsa_eay.c:

```c
static int RSA_eay_mod_exp(BIGNUM *r0, const BIGNUM *I, RSA *rsa, BN_CTX *ctx)
{
  ...
  if (rsa->flags & RSA_FLAG_CACHE_PRIVATE) {
    if (!BN_MONT_CTX_set_locked(&rsa->_method_mod_p, CRYPTO_LOCK_RSA, p, ctx))
      goto err;
    if (!BN_MONT_CTX_set_locked(&rsa->_method_mod_q, CRYPTO_LOCK_RSA, q, ctx))
      goto err;
    ...
  }
}
```

bn_mont.c:

```c
BN_MONT_CTX *BN_MONT_CTX_set_locked(BN_MONT_CTX **pmont, int lock,
                                    const BIGNUM *mod, BN_CTX *ctx)
{
  ...
  if (!*pmont) {
    ret = BN_MONT_CTX_new();
    if (ret && !BN_MONT_CTX_set(ret, mod, ctx))
      ...
  }
}
...
int BN_MONT_CTX_set(BN_MONT_CTX *mont, const BIGNUM *mod, BN_CTX *ctx)
{
  int ret = 0;
  BIGNUM *Ri,*R;

  BN_CTX_start(ctx);
  if((Ri = BN_CTX_get(ctx)) == NULL) goto err;
  R= &(mont->RR);                          /* grab RR as a temp */
  if (!BN_copy(&(mont->N),mod)) goto err;  /* Set N */
  ...
}
```

That BN_copy in the last line is the culprit. Once allocated, these values stay in memory until the RSA key is deleted, so even if the original key data is stored in protected or hard-to-reach memory, a Heartbleed attack may still be able to get at the primes.

This also explains why, as far as I know, no one has managed to capture the CRT parameters or the private exponent itself – they stay safely tucked away in the low part of the heap. Also, depending on the application startup behaviour, it might be that the primes end up below the input buffers (by default, OpenSSL caches up to 32 previously allocated buffers of each type) in which case they won’t be visible to Heartbleed – this might explain why in the Cloudflare challenge, only one of the primes seemed to be turning up.

Update 20/04/14: actually, it’s worse than this – bignum resizing can also leak private data onto the heap in unpredictable ways, more details to follow.

# Loopless Ruler

**Posted:**April 13, 2014

**Filed under:**C

This function (based on Algorithm L for loopless Gray code generation in TAOCP 7.2.1.1):

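```c
#include <stdbool.h>

// visit(f, n) is called once per step with the n focus pointers;
// it is supplied elsewhere (for example, to print them).
void visit(int *f, int n);

void ruler(int n)
{
  int f[n+1];
  for (int j = 0; j <= n; j++) {
    f[j] = j;
  }
  while(true) {
    visit(f,n+1);
    int j = f[0];
    f[0] = 0;
    if (j == n) break;
    f[j] = f[j+1];
    f[j+1] = j+1;
  }
}
```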

enables us to compute the “ruler function”, ρ for 1, 2, 3…:

0 1 0 2 0 1 0 3 0 1 0 2 0 1 0...

It’s a “loopless” algorithm and uses f, an array of “focus pointers” to keep track of what’s going on.

The ruler function ρ(n) is essentially the number of trailing 0’s in the binary representation of n.

This is what’s printed by ruler(4):

```
0 1 2 3 4
1 1 2 3 4
0 2 2 3 4
2 1 2 3 4
0 1 3 3 4
1 1 3 3 4
0 3 2 3 4
3 1 2 3 4
0 1 2 4 4
1 1 2 4 4
0 2 2 4 4
2 1 2 4 4
0 1 4 3 4
1 1 4 3 4
0 4 2 3 4
4 1 2 3 4
```

The left hand side is the ruler function of course, what are the other values?

It’s clearer if we print f in reverse and match up with binary representations:

```
00000 43210
00001 43211
00010 43220
00011 43212
00100 43310
00101 43311
00110 43230
00111 43213
01000 44210
01001 44211
01010 44220
01011 44212
01100 43410
01101 43411
01110 43240
01111 43214
```

If n[j] is the jth binary digit of n (counting from j = 0 for the least significant digit), then f[j] is just j, unless n[j] is the first 1 of a block of 1’s, in which case f[j] is the index of the first 0 after that block.

For example, 01101 has a block of 1’s at position 0, followed by a 0 at position 1, so f[0] = 1; and another block at positions 2 and 3, followed by a 0 at position 4, so f[2] = 4. For every other j, f[j] = j, so the focus pointers (printed in reverse) are 43411.

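If n is of the form:

```
j   k
01110111
```

(here k = 3 is the position of the lowest 0, and j = 7 is the position of the first 0 after the next block of 1’s), then f[0] = k, f[k] = k and f[k+1] = j.

On incrementing, we get:

```
j   k
01111000
```

Now f[0] = 0, f[k] = j and f[k+1] = k+1; the other values of f are unchanged.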

If k is 0, then we go from:

```
j   k
01110
```

with f[0] = 0 and f[1] = j, to:

```
j   k
01111
```

with f[0] = j and f[1] = 1.

Either way, we can see that the inner loop of `ruler` is correct.