# Quaternions and Reflections

**Posted:**February 15, 2019

**Filed under:**Uncategorized Leave a comment

It’s well known that quaternions can be used to represent rotations in 3- and 4-dimensional space. It seems less well known that they can also be used to represent reflections and, in fact, we can derive the representation of rotations from this in a straightforward way (since all rotations can be constructed as a sequence of rotations).

We shall write quaternions `p`

and `q`

as:

`p = x+X`

`q = y+Y`

where `x`

and `y`

are scalars and `X`

and `Y`

are 3-vectors. Now quaternion multiplication can be defined as:

`pq = (x+X)(y+Y) = (xy - X.Y) + (xY + yX + XxY)`

with brackets in the result indicating the scalar and vector parts (and where `X.Y`

and `XxY`

are the usual dot and cross product). Quaternion multiplication is associative: `p(qr) = (pq)r = pqr`

but not commutative (in fact, `pq = qp`

just when `XxY`

is zero).

We also define conjugation, the conjugate of `p`

, which we write as `p'`

(overbar `p̅`

is conventional, but I think is harder to read onscreen), is `p`

with the vector part negated:

`p' = x-X`

Combining multiplication and conjugation we have:

`pq' = (x+X)(y-Y) = (xy + X.Y) + (-xY + yX - XxY)`

`qp' = (y+Y)(x-X) = (xy + X.Y) + (+xY - yX + XxY)`

(since `YxX = -XxY`

)

and adding both equations, the vector parts cancel and we have:

`pq' + qp' = 2(xy + X.Y)`

But `xy + X.Y`

is just the dot product of `p`

and `q`

, considered as normal 4-dimensional vectors, so we can write:

`pq' + qp' = 2p.q`

This nicely relates quaternion algebra directly to the geometry of 4-space and we can now rewrite expressions using dot product purely in terms of quaternion arithmetic (quaternion addition and subtraction and operations with scalars are the same as for ordinary vectors, of course).

So, the usual definition, valid in any dimension, for a reflection in a (hyper)plane with normal `n`

(assumed to be a unit vector) is:

`p → p - 2(p.n)n`

we can rewrite as:

`p → p - (pn' + np')n = p - pn'n - np'n = p - p - np'n = -np'n`

ie. reflection of `p`

in `n`

is just `-np'n`

. Note that a 4-dimensional reflection is *in* a 3-dimensional hyperplane (or perhaps it is clearer to think of the reflection as being *along* the normal vector), so vectors parallel to the normal are reversed, vectors orthogonal to the normal (ie. in the normal hyperplane) are unchanged.

Interestingly, the relation `pq' + qp' = 2p.q`

is also true for complex numbers (with the usual complex multiplication and conjugation) and we have exactly the same represention for a 2-dimensional reflection, but we can further simplify using the commutativity of complex multiplication:

`z → z - 2(z.n)n = -nz'n = -n²z'`

Returning to quaternions, we can combine two reflections, with normals n and m, say, to get a (simple) rotation:

`p → -m(-np'n)'m = mn'pn'm`

and geometry tells us that the rotation is about an angle twice that between the the planes of reflection.

In 4-space, this rotation can be thought of as *in* the plane spanned by `n`

and `m`

(ie. vectors in that plane rotate to other vectors in the plane), or *around* an axis of another plane, orthogonal to the first, in fact the intersection of the two hyperplanes normal to `n`

and `m`

.

A general 4-dimensional rotation needs 4 reflections:

`p → j(l(mn'pn'm)'l)'j = jl'mn'pn'ml'j`

and we lose any simple relation between the left and right quaternion – in fact `p → qpr`

is a rotation for any unit quaternions `q`

and `r`

(an interesting case arises if `q`

or `r`

is the identity – a Clifford translation).

We can derive expressions for reflections and rotations in 3-space from this, with 3-space points represented as pure quaternions, ie. with zero scalar part, so if `n`

and `m`

are pure, then `n' = -n`

and `nm = (mn)'`

A reflection now is:

`p → -np'n = npn`

and a rotation is:

`p → mn'pn'm = mnp(mn)'`

and we have the usual definition of a rotation in 3-space:

`p → qpq'`

Representing rotations as compositions of reflections also has practical uses. Consider the problem of finding a rotation to take a vector `p`

to a vector `q`

(assumed to both be of unit length) – we can do this as two reflections, first reflect `p`

in the (hyper)plane normal to `p+q`

, this aligns `p`

with `q`

, but in the opposite direction, so reflect again in the plane orthogonal to `q`

(it may help to sketch a picture – the construction makes sense in 2 or more dimensions). Using the 3-space formulation above, and the fact that `qq = -1`

for pure `q`

, we have:

`p → qpq'`

where `q = -q(p+q)/|p+q| = (-qp-qq)/|p+q| = (1-qp)/|p+q|`

and since we know that the result is of unit length, we can just use:

`p → normalize(1-qp)`

(with a special case when `qp = 1`

, ie when `q = -p`

, when the rotation is not uniquely defined).

Also, if we have two vectors `x0`

,`y0`

and wish to rotate them to new vectors `x1`

,`y1`

(assume `|x0| = |x1|, |y0| = |y1|`

, and `|x0-y0| = |x1-y1|`

, first reflect `x0`

to `x1`

along `x1-x0`

(ie. in the plane orthogonal to `x1-x0`

), this also reflects `y0`

to a new point `y2`

, but we can then reflect `y2`

to `y1`

along `y2-y1`

(this leaves `x1`

unchanged as it is equidistant from from `y1`

and `y2`

):

`n = x1-x0`

`y2 = y1 - 2(y1.n/n.n)n`

`m = y2-y1`

`r = normalize(mn)`

There is always a unique rotation, but two special cases are: if `x0 = x1`

, then swap `x0`

with `y0`

and `x1`

with `y1`

(if `y0 = y1`

as well, there is nothing to do); and if `y2 = y1`

, then reflect along `cross(x1,y1)`

rather than y2-y1.

A similar construction is also possible in 4-dimensional space: 3 point pairs define a general rotation (the first three reflections map the 3 points, the last reflection, in the hyperplane containing those points, ensures we have a proper rotation).

Much of this is taken from Coxeter’s 1946 paper, “Quaternions and Reflections”, https://www.jstor.org/stable/2304897, which of course goes into much more depth and with much greater rigour.

# Veth Devices, Network Namespaces and Open vSwitch

**Posted:**February 4, 2018

**Filed under:**Networking, Unix 1 Comment

It’s useful to be able to set up miniature networks on a Linux machine, for development, testing or just for fun. Here we use `veth`

devices and network namespaces to create a small virtual network, connected together with an Open vSwitch instance. I’m using a Raspberry Pi 3 for this, it’s less inconvenient when it goes wrong, but I don’t think anything is Pi specific (and I certainly wouldn’t recommend a Pi for serious routing applications).

A `veth`

device pair is a virtual ethernet cable, packets sent on one end come out the other (and vice versa of course):

$ sudo ip link add veth0 type veth peer name veth1 $ ip link show type veth 4: veth1@veth0: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 92:e3:6f:51:b7:96 brd ff:ff:ff:ff:ff:ff 5: veth0@veth1: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether aa:4c:fd:e3:cc:a1 brd ff:ff:ff:ff:ff:ff

I could assign an IP to both ends and try to send traffic through the link, but since the system knows about both ends, the traffic would get sent directly to the destination interface. Instead, I need to hide one end in a network namespace, each namespace has a set of interfaces, routing tables etc. that are private to that namespace. Initially everything is in the global namespace and we can create a new namespace, which are often named after colours, with the `ip`

command:

$ sudo ip netns add blue

Now put the “lower” end of the `veth`

device into the new namespace:

$ sudo ip link set veth1 netns blue

`veth1`

is no longer visible in the global namespace:

$ ip link show type veth 5: veth0@if4: mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 link/ether aa:4c:fd:e3:cc:a1 brd ff:ff:ff:ff:ff:ff link-netnsid 0

but we can see it in the blue namespace:

$ sudo ip netns exec blue ip link show 1: lo: mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 4: veth1@if5: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 92:e3:6f:51:b7:96 brd ff:ff:ff:ff:ff:ff link-netnsid 0

Note that `veth0`

is in state `LOWERLAYERDOWN`

because `veth1`

is now `DOWN`

(as is the local interface in the namespace). We can now assign addresses to `veth0`

and `veth1`

and make sure all the interfaces are up:

$ sudo ip addr add 10.0.0.10/24 dev veth0 $ sudo ip netns exec blue ip addr add 10.0.0.1/24 dev veth1 $ sudo ip link set veth0 up $ sudo ip netns exec blue ip link set veth1 up $ sudo ip netns exec blue ip link set lo up

Now we can ping the other end:

$ ping -c1 10.0.0.1 PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data. 64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.197 ms --- 10.0.0.1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.197/0.197/0.197/0.000 ms

and `tcpdump`

confirms that traffic really is being sent over the `veth`

link:

$ sudo tcpdump -i veth0 icmp tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on veth0, link-type EN10MB (Ethernet), capture size 262144 bytes 16:36:50.390395 IP 10.0.0.10 > 10.0.0.1: ICMP echo request, id 20113, seq 1, length 64 16:36:50.390714 IP 10.0.0.1 > 10.0.0.10: ICMP echo reply, id 20113, seq 1, length 64

That’s all we need for the most basic setup. Now we’ll add a second namespace and connect everything together with a switch – we could use a normal Linux bridge for this, but it’s more fun to use Open vSwitch and later use some very basic Openflow commands to set up a learning switch.

For a more complicated setup it’s usually a good idea to enable IP forwarding, so while we remember:

$ sudo bash -c "echo 1 > /proc/sys/net/ipv4/ip_forward"

And we now want another `veth`

pair and another namespace:

$ sudo ip netns add red $ sudo ip link add veth2 type veth peer name veth3 $ sudo ip link set veth3 netns red $ sudo ip netns exec red ip addr add 10.0.0.2/24 dev veth3 $ sudo ip netns exec red ip link set lo up $ sudo ip netns exec red ip link set veth3 up

Let’s remove the address assigned above to `veth0`

(we are going to put `veth0`

in the bridge anyway, but explicitly removing the address is tidier and prevents confusion later):

$ sudo ip addr del 10.0.0.10/24 dev veth0

Check we have openvswitch installed, on Ubuntu:

$ sudo apt-get install openvswitch-switch

and see what is already running:

$ sudo ovs-vsctl show b494c304-46b7-4ff8-9fa4-581952fae2f1 ovs_version: "2.3.0"

Add a new bridge:

$ sudo ovs-vsctl add-br ovsbr0 $ sudo ovs-vsctl show b494c304-46b7-4ff8-9fa4-581952fae2f1 Bridge "ovsbr0" Port "ovsbr0" Interface "ovsbr0" type: internal ovs_version: "2.3.0"

If you’ve been experimenting, remove the upper `veth`

s from any bridge they might be in:

$ sudo ip link set veth0 nomaster $ sudo ip link set veth2 nomaster

and add to the OVS bridge:

$ sudo ovs-vsctl add-port ovsbr0 veth0 $ sudo ovs-vsctl add-port ovsbr0 veth2 $ sudo ovs-vsctl show b494c304-46b7-4ff8-9fa4-581952fae2f1 Bridge "ovsbr0" Port "veth0" Interface "veth0" Port "veth2" Interface "veth2" Port "ovsbr0" Interface "ovsbr0" type: internal ovs_version: "2.3.0" $ ip link show type veth 5: veth0@if4: mtu 1500 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000 link/ether 4a:b7:05:b1:29:d6 brd ff:ff:ff:ff:ff:ff link-netnsid 0 7: veth2@if6: mtu 1500 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000 link/ether 5e:a7:f8:99:d4:ba brd ff:ff:ff:ff:ff:ff link-netnsid 1

Master for `veth0`

and `veth2`

is now the `ovs-system`

device. Note that both links are `UP`

.

Now we are ready to go (we set up everything within the namespaces earlier):

$ sudo ip netns exec blue ping -c1 10.0.0.2 PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data. 64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.922 ms --- 10.0.0.2 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.922/0.922/0.922/0.000 ms

Let’s try external connectivity:

$ sudo ip netns exec blue ping -c1 8.8.8.8 connect: Network is unreachable

Looks like a routing problem:

$ sudo ip netns exec blue ip route 10.0.0.0/24 dev veth1 proto kernel scope link src 10.0.0.1

There is no default route, let’s add one:

$ sudo ip netns exec blue ip route add default via 10.0.0.254

and this will need an address on the bridge itself:

$ sudo ip addr add 10.0.0.254/24 dev ovsbr0

Try again:

$ sudo ip netns exec blue ping -c1 8.8.8.8 PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data. --- 8.8.8.8 ping statistics --- 1 packets transmitted, 0 received, 100% packet loss, time 0ms

Better, we seem to be sending packets out of the namespace and this is confirmed by tcpdump on the bridge interface:

$ sudo tcpdump -i ovsbr0 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on ovsbr0, link-type EN10MB (Ethernet), capture size 262144 bytes 13:20:19.777324 IP 10.0.0.1 > google-public-dns-a.google.com: ICMP echo request, id 5667, seq 1, length 64 13:20:24.831303 ARP, Request who-has 10.0.0.254 tell 10.0.0.1, length 28 13:20:24.831565 ARP, Reply 10.0.0.254 is-at 06:d4:34:9b:26:42 (oui Unknown), length 28

And we can see the packet exiting on `wlan0`

:

$ sudo tcpdump -i wlan0 icmp tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on wlan0, link-type EN10MB (Ethernet), capture size 262144 bytes 13:21:54.727306 IP 10.0.0.1 > google-public-dns-a.google.com: ICMP echo request, id 5697, seq 1, length 64

but sadly the source address is still in the 10.0.0.0 subnet and it’s not surprising that the Google DNS server isn’t responding.

Now, part of this exercise is to find out about Open vSwitch and its capabilities and I would hope that they would include setting up simple NAT translation, but I have no idea how to do that right now, so we’ll just use IP tables, so set up NAT and make sure forwarding is enabled:

$ sudo iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o wlan0 -j MASQUERADE $ sudo iptables -F $ sudo iptables -P FORWARD ACCEPT

Now all is well:

$ sudo ip netns exec blue ping -n -c1 8.8.8.8 PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data. 64 bytes from 8.8.8.8: icmp_seq=1 ttl=57 time=19.4 ms --- 8.8.8.8 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 19.434/19.434/19.434/0.000 ms

This assumes we have a default forwarding policy of ACCEPT. It may be prudent to be more selective:

$ sudo iptables -P FORWARD DROP $ sudo iptables -A FORWARD -s 10.0.0.0/24 -o wlan0 -j ACCEPT $ sudo iptables -A FORWARD -d 10.0.0.0/24 -i wlan0 -j ACCEPT

so we are just prepared to forward to and from the switch network.

We can add an external IP to the switch. The main connection to my Pi 3 is through `wlan0`

and so I’d like to leave that alone, so let’s put `eth0`

into the switch:

$ sudo ovs-vsctl add-port ovsbr0 eth0

Now attach a network cable eg. directly to another laptop (crossover cables are largely a thing of the past), configure an ip address in our 10.0.0.0/24 subnet:

$ sudo ip addr add 10.0.0.100/24 eth0

and we have connectivity out of our box – the external IP is now the switch address.

Finally, since we have been doing so well, let’s program our OVS bridge to be a learning switch. See, for example, http://openvswitch.org/support/dist-docs-2.5/tutorial/Tutorial.md.html for further information.

First, turn off the default flow rules:

$ sudo ovs-vsctl set Bridge ovsbr0 fail-mode=secure

Before when we created the OVS bridge, it started in “Normal” mode, with a single flow rule that sends every incoming packet out of every interface (except the one that it came in on), so the bridge is acting like a hub. Setting “fail-mode=secure” means there are no default rules so all packets are dropped.

First, if we have been playing, it’s a good idea to clear the rule table:

ovs-ofctl del-flows ovsbr0

Now set up the learning rules. The idea is that when a packet comes in from a particular MAC address, the switch remembers which interface the packet arrived on, so when it wants to send a packet to that address, it can just send it on the interface recorded earlier. We can do a similar thing with the local interface so we don’t need to configure the rules to handle whatever the local MAC address is (maybe there is a better way to handle the local interface – comments welcome).

ovs-ofctl add-flow ovsbr0 "table=0, priority=60, in_port=LOCAL, actions=learn(table=10, NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], load:0xffff->NXM_NX_REG0[0..15]), resubmit(,1)" ovs-ofctl add-flow ovsbr0 "table=0, priority=50, actions=learn(table=10, NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], load:NXM_OF_IN_PORT[]->NXM_NX_REG0[0..15]), resubmit(,1)"

The first rule says that when a packet originating locally is received, ie. that is being sent from a local process, add a rule (to table 10) that says that when an incoming packet is received, addressed to same MAC address, put the value 0xFFFF in the lower 16 bits of register 0. The second is the same but for packets received from the other interfaces in the switch, add a rule that puts the interface number in register 0. Having added a rule, processing continues with table 1.

In table 1, we have:

ovs-ofctl add-flow ovsbr0 "table=1 priority=99 dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,2)" ovs-ofctl add-flow ovsbr0 "table=1 priority=50 actions=resubmit(,10), resubmit(,2)"

The first rule sends packets with a broadcast ethernet address directly through to table 2, the second rule goes through table 10 first – the idea being that if the packet is being sent to a known MAC address, table 10 will put the number of the interface in register 0, or 0xffff if it’s the local MAC address, or 0 if the interface hasn’t been learned yet.

Finally table 2 just sends packets off to the right place using the register 0 values:

ovs-ofctl add-flow ovsbr0 "table=2 reg0=0 actions=LOCAL,1,2,3" ovs-ofctl add-flow ovsbr0 "table=2 reg0=1 actions=1" ovs-ofctl add-flow ovsbr0 "table=2 reg0=2 actions=2" ovs-ofctl add-flow ovsbr0 "table=2 reg0=3 actions=3" ovs-ofctl add-flow ovsbr0 "table=2 reg0=0xffff actions=LOCAL"

To inspect the all rules table (including any that have been added by the table 1 rules):

$ sudo ovs-ofctl dump-flows ovsbr0

Now thing should work much as they did before. If not, see the link above for further information on OVS testing and debugging.

# Using the Cycle Counter Registers on the Raspberry Pi 3

**Posted:**January 27, 2018

**Filed under:**C, Raspberry Pi 10 Comments

ARM processors support various performance monitoring registers, the most basic being a cycle count register. This is how to make use of it on the Raspberry Pi 3 with its ARM Cortex-A53 processor. The A53 implements the ARMv8 architecture which can operate in both 64- and 32-bit modes, the Pi 3 uses the 32-bit AArch32 mode, which is more or less backwards compatible with the ARMv7-A architecture, as implemented for example by the Cortex-A7 (used in the early Pi 2’s) and Cortex-A8. I hope I’ve got that right, all these names are confusing

The performance counters are made available through coprocessor registers and the `mrc`

and `mcr`

instructions, the precise registers used depending on the particular architecture.

By default, use of these instructions is only possible in “privileged” mode, ie. from the kernel, so the first thing we need to do is to enable register access from userspace. This can be done through a simple kernel module that can also set up the cycle counter parameters needed (we could do this from userspace after the kernel module has enabled access, but it’s simpler to do everything at once).

To compile a kernel module, you need a set of header files compatible with the kernel you are running. Fortunately, if you have installed a kernel with the `raspberrypi-kernel`

package, the corresponding headers should be in `raspberrypi-kernel-headers`

– if you have used `rpi-update`

, you may need to do something else to get the right headers, and of course if you have built your own kernel, you should use the headers from there. So:

`$ sudo apt-get install raspberrypi-kernel`

$ sudo apt-get install raspberrypi-kernel-headers

Our Makefile is just:

obj-m += enable_ccr.o all: make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules clean: make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

and the kernel module source is:

#include <linux/module.h> #include <linux/kernel.h> void enable_ccr(void *info) { // Set the User Enable register, bit 0 asm volatile ("mcr p15, 0, %0, c9, c14, 0" :: "r" (1)); // Enable all counters in the PNMC control-register asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(1)); // Enable cycle counter specifically // bit 31: enable cycle counter // bits 0-3: enable performance counters 0-3 asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x80000000)); } int init_module(void) { // Each cpu has its own set of registers on_each_cpu(enable_ccr,NULL,0); printk (KERN_INFO "Userspace access to CCR enabled\n"); return 0; } void cleanup_module(void) { }

To build the module, just use `make`

:

`$ make`

and if all goes well, the module itself should be built as `enable_ccr.ko`

Install it:

`$ sudo insmod enable_ccr.ko`

`$ dmesg | tail`

should show something like:

`...`

[ 430.244803] enable_ccr: loading out-of-tree module taints kernel.

[ 430.244820] enable_ccr: module license 'unspecified' taints kernel.

[ 430.244824] Disabling lock debugging due to kernel taint

[ 430.245300] User-level access to CCR has been turned on

...

It should go without saying that making your own kernel modules & allowing normally forbidden access from userspace may result in all sorts of potential vulnerabilities that you should be wary of).

Now we can use the cycle counters in user code:

#include <stdio.h> #include <stdint.h> static inline uint32_t ccnt_read (void) { uint32_t cc = 0; __asm__ volatile ("mrc p15, 0, %0, c9, c13, 0":"=r" (cc)); return cc; } int main() { uint32_t t0 = ccnt_read(); uint32_t t1 = ccnt_read(); printf("%u\n", t1-t0); volatile uint64_t n = 100000000; while(n > 0) n--; t1 = ccnt_read(); printf("%u\n", t1-t0); }

We use a volatile loop counter so the loop isn’t optimized away completely.

Using taskset to keep the process on one CPU:

`$ gcc -Wall -O3 cycles.c -o cycles`

pi@pi3:~/raspbian-ccr$ time taskset 0x1 ./cycles

1

805314304

```
```

`real 0m0.712s`

user 0m0.700s

sys 0m0.010s

Looks like we can count a single cycle and since the Pi 3 has a 1.2GHz clock the loop time looks about right (the clock seems to be scaled if the processor is idle so we don’t necessarily get a full 1.2 billion cycles per second – for example, if we replace the loop above with a sleep).

References:

ARMv8 coprocessor registers:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344k/Bgbjjhaj.html

http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344k/Bgbdeggf.html

A useful forum discussion (which includes some details on accessing other performance counters):

https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=181490

Cycle counters on the Pi 1:

https://blog.regehr.org/archives/794

# Now on Shadertoy

**Posted:**December 5, 2017

**Filed under:**Uncategorized Leave a comment

My recent Shadertoy activity:

https://www.shadertoy.com/user/mla/sort=newest

# Drawing the Clebsch Surface as Particles

**Posted:**April 6, 2016

**Filed under:**Geometry, Graphics, Javascript 2 Comments

http://matthewarcus.github.io/polyjs/clebsch.html

https://github.com/matthewarcus/polyjs/blob/master/js/clebsch.js

Instead of using a triangulated mesh, we can display a surface in 3d by simply generating a set of random points on the surface and displaying them as a sort of particle system. Let’s do this with the famous Clebsch cubic surface: the points are stored as homogeneous coordinates, to display we multiply by a quaternion to rotate in projective space before doing the usual perspective projection to Euclidean space.

The Clebsch surface is the set of points `(x0,x1,x2,x3,x4)`

(in projective 4-space) that satisfy the equations:

`x0 + x1 + x2 + x3 + x4 = 0`

x0^{3} + x1^{3} + x2^{3} + x3^{3} + x4^{3} = 0

To simplify things, we can eliminate `x0 (= x1 + x2 + x3 + x4)`

and rename a little, to get the single cubic equation:

`(x + y + z + w)`

^{3} = x^{3} + y^{3} + z^{3} + w^{3} [***]

defining a surface in projective 3-space, with the familiar 4-element homogeneous coordinates.

Since coordinates are homogeneous, we can just consider the cases of `w = 1`

and `w = 0`

(plane at infinity), but for `w = 0`

, it turns out the solutions are some of the 27 lines which we shall later draw separately, so for now just consider the case `w = 1 `

for which we have:

`(x + y + z + 1)`

^{3} = x^{3} + y^{3} + z^{3} + 1

and given values for `x`

and `y`

, we can solve for `z`

easily – the cubes drop out and we just have a quadratic equation that can be solved in the usual way:

`3Az`

^{2} + 3A^{2}z + A^{3} - B = 0 where A = x+y+1, B = x^{3} + y^{3} + 1

We can now generate points on the surface by randomly choosing `x`

and `y`

and solving for `z`

to give a set of homogeneous points `(x,y,z,w)`

satisfying `[***]`

and we can get further solutions by permuting the coordinates. We don’t need all permutations since some of the coordinates are arbitrary, and points that are multiples of each other are equivalent. The random points themselves are generated by this Javascript function, that generates points between `-Infinity`

and `+Infinity`

, but clustered around the origin.

function randpoint() { var x = 1/(2*Math.random()-1); if (x < 0) x -= 1; if (x > 0) x += 1; return x; }

The Clebsch surface of course is famous for its 27 lines, so we draw these in as well, also as random selection of points rather than a solid line. 15 lines are points of the form `(a,-a,b,-b,0)`

and permutations – since we are working in 4-space, this becomes 12 lines of form `(a,-a,b,0)`

and three of form `(a,-a,b,-b)`

. These 15 lines are drawn in white and can be seen to intersect in 10 Eckardt points where 3 lines meet (though it’s hard to find a projection where all 10 are simultaneously visible). The other 12 lines are of the form `(a,b,-(φa+b),-(a+φb),1)`

where φ is the golden ratio, 1.618.. and can be seen to form Schläfli’s “Double Six” configuration – each magenta or cyan line intersects with exactly 5 other lines, all of the opposite color.

All that remains is to project into 3-space – as usual we divide by the w-coordinate, but to get different projections, before doing this we rotate in projective space by multiplying by a quaternion & then varying the quaternion varies the projection. (Quaternion `(d,-a,-b,-c)`

puts plane `(a,b,c,d)`

at infinity – or alternatively, rotates `(a,b,c,d)`

to `(0,0,0,1)`

– it is well known that quaternions can be used to represent rotations in 3-space, but they also work for 4-space (with 3-space as a special case) – a 4-space rotation is uniquely represented (up to sign) by `x -> pxq`

where p and q are unit quaternions). Here we multiply each point by a single quaternion to give an isoclinic or Clifford rotation – every point is rotated by the same angle.

We are using Three.js, which doesn’t seem to accept 4d points in geometries – we could write our own vertex shader to do the rotation and projection on the GPU, but for now, we do it on the CPU; updating the point positions is reasonably fast with the Three.js BufferGeometry. Actually displaying the points is simple with the THREE.Points object – we use a simple disc texture to make things a little more interesting, and attempt to color the points according to the permutations used to generate them.

The mouse and arrow keys control the camera position, square brackets move through different display modes, space toggles the rotation.

An excellent reference giving details of the construction of the surface (and good French practise) is:

http://www.mathcurve.com/surfaces/clebsch/clebsch.shtml

# Excellent Numbers

**Posted:**January 16, 2016

**Filed under:**Number Theory, Python Leave a comment

A number `n`

, with an even number of digits, is excellent if it can be split into two halves, `a`

and `b`

, such that `b`

. Let ^{2} - a^{2} = n`2k`

be the number of digits, then we want `n = aA + b = b`

.^{2} - a^{2} where A = 10^{k}

Let’s do some algebra:

`aA + b = b`

^{2} - a^{2}

a^{2} + aA = b^{2} - b # Rearrange

4a^{2} + 4aA = 4b^{2} - 4b # Multiply by 4

(2a + A)^{2} - A^{2} = (2b - 1)^{2} - 1 # Complete the square

Now we can substitute `X = 2a + A`

, `Y = 2b - 1`

, `N = A`

and rearranging a little, we have:^{2} - 1

`X`

^{2} - Y^{2} = N

(X - Y)(X + Y) = N

So, every `2k`

digit excellent number gives rise to divisors `i,j`

of `N`

where `ij = N`

and `i <= j`

This process can be reversed: if `i`

is a divisor of `N`

, with `j = N/i`

and `i <= j`

, we have `X = (j+i)/2`

, `Y = (j-i)/2`

, then `a = (X-A)/2`

and `b = (Y+1)/2`

. If all the divisions by 2 are exact (and in this case they are – `N`

is odd, so `i`

and `j`

are too, also writing `i = 2i'+1`

and `j = 2j'+1`

, we can show that `i'`

and `j'`

must have different parities) then we have a potentially excellent number – all we need to check is that `a`

has exactly `k`

digits and that `b`

has at most `k`

(otherwise `a`

and `b`

are not the upper and lower halves of a `2k`

digit number).

Now we have a nice algorithm: find all divisors `i`

of `N = 10`

, with ^{k}-1`i <= sqrt(N)`

, find `a`

and `b`

as above and check if they are in the appropriate range, if so, we have an excellent number and it should be clear that all excellent numbers can be generated in this way.

For small `N`

, we can find all divisors just by a linear scan, but for larger `N`

something better is needed: given a prime factorization we can generate all possible combinations of the factors to get the divisors, so now all we need to do is factorize `10`

. This of course is a hard problem but we can use, for example, Python’s ^{2k}-1`primefac`

library, and give it some help by observing that `10`

. The factorization is harder for some values of ^{2k}-1 = (10^{k}-1)(10^{k}+1)`k`

, particularly if `k`

is prime, but we can always have a look at:

http://stdkmd.com/nrr/repunit/

if we run in to trouble. My Pi 2 gets stuck at `k = 71`

where 11, 290249, 241573142393627673576957439049, 45994811347886846310221728895223034301839 and 31321069464181068355415209323405389541706979493156189716729115659 are the factors needed, so it’s not surprising it is struggling. Also, the number of divisors to check is approximately `2`

where ^{n-1}`n`

is the number of prime factors, of which, for example `10`

has 35 so just generating all potential 180 digit numbers will take a while.^{90}-1

So, after all that, here’s some code. Using Python generators keeps the memory usage down – we can process each divisor as it is constructed, (though it does mean that results for a particular size don’t come out in order) – after running for around 24 hours on a Pi 2, we are up to 180 digits and around 2000000 numbers but `top`

reports less than 1% of memory in use.

import primefac def excellent(k): """Generate all excellent numbers of size 2k""" A = 10**k; N = A*A-1 factors1 = list(primefac.primefac(A-1)) factors2 = list(primefac.primefac(A+1)) d = divisors(sorted(factors1+factors2)) for i in d: if i*i > N: continue j = N//i x,y = (j+i)//2, (j-i)//2 a,b = (x-A)//2, (y+1)//2 if A//10 <= a < A and 0 <= b < A: n = a*A+b assert(n == b*b-a*a) # Check our logic yield n def divisors(factorlist): """Generate all divisors of number from list of prime factors""" factors = multiset(factorlist) nfactors = len(factors) a = [0]*nfactors; b = [1]*nfactors yield 1 while True: i = 0 while i < nfactors: if a[i] < factors[i][1]: break a[i] = 0; b[i] = 1; i += 1 if i == nfactors: break a[i] += 1; b[i] *= factors[i][0] for j in range(0,i): b[j] = b[i] yield b[0] def multiset(s): """Create a multiset from a (sorted) list of items""" m = []; n = s[0]; count = 1 for i in range(1,len(s)): if s[i] != n: m.append((n,count)) n = s[i] count = 1 else: count += 1 m.append((n,count)) return m for n in range(2,1000,2): for m in excellent(n//2): print m

The counts for numbers up to 10^{100}:

2: count = 1 factors = [3, 3, 11] 4: count = 1 factors = [3, 3, 11, 101] 6: count = 8 factors = [3, 3, 3, 7, 11, 13, 37] 8: count = 3 factors = [3, 3, 11, 73, 101, 137] 10: count = 3 factors = [3, 3, 11, 41, 271, 9091] 12: count = 13 factors = [3, 3, 3, 7, 11, 13, 37, 101, 9901] 14: count = 2 factors = [3, 3, 11, 239, 4649, 909091] 16: count = 3 factors = [3, 3, 11, 17, 73, 101, 137, 5882353] 18: count = 28 factors = [3, 3, 3, 3, 7, 11, 13, 19, 37, 52579, 333667] 20: count = 15 factors = [3, 3, 11, 41, 101, 271, 3541, 9091, 27961] 22: count = 9 factors = [3, 3, 11, 11, 23, 4093, 8779, 21649L, 513239L] 24: count = 51 factors = [3, 3, 3, 7, 11, 13, 37, 73, 101, 137, 9901, 99990001] 26: count = 5 factors = [3, 3, 11, 53, 79, 859, 265371653, 1058313049] 28: count = 17 factors = [3, 3, 11, 29, 101, 239, 281, 4649L, 909091L, 121499449] 30: count = 435 factors = [3, 3, 3, 7, 11, 13, 31, 37, 41, 211, 241, 271, 2161, 9091, 2906161] 32: count = 157 factors = [3, 3, 11, 17, 73, 101, 137, 353, 449, 641, 1409, 69857, 5882353] 34: count = 4 factors = [3, 3, 11, 103, 4013L, 2071723L, 5363222357L, 21993833369L] 36: count = 66 factors = [3, 3, 3, 3, 7, 11, 13, 19, 37, 101, 9901L, 52579L, 333667L, 999999000001L] 38: count = 2 factors = [3, 3, 11, 909090909090909091L, 1111111111111111111L] 40: count = 103 factors = [3, 3, 11, 41, 73, 101, 137, 271, 3541L, 9091L, 27961L, 1676321L, 5964848081L] 42: count = 999 factors = [3, 3, 3, 7, 7, 11, 13, 37, 43, 127, 239, 1933L, 2689L, 4649L, 459691L, 909091L, 10838689L] 44: count = 89 factors = [3, 3, 11, 11, 23, 89, 101, 4093L, 8779L, 21649L, 513239L, 1052788969L, 1056689261L] 46: count = 2 factors = [3, 3, 11, 47, 139, 2531L, 549797184491917L, 11111111111111111111111L] 48: count = 188 factors = [3, 3, 3, 7, 11, 13, 17, 37, 73, 101, 137, 9901L, 5882353L, 99990001L, 9999999900000001L] 50: count = 45 factors = [3, 3, 11, 41, 251, 271, 5051L, 9091L, 21401L, 25601L, 182521213001L, 78875943472201L] 52: count = 11 factors = [3, 3, 11, 53, 79, 101, 521, 859, 265371653L, 1058313049L, 1900381976777332243781L] 54: count = 150 factors = [3, 3, 3, 3, 3, 7, 11, 13, 19, 37, 757, 52579L, 333667L, 70541929L, 14175966169L, 440334654777631L] 56: count = 99 factors = [3, 3, 11, 29, 73, 101, 137, 239, 281, 4649L, 7841L, 909091L, 121499449L, 127522001020150503761L] 58: count = 2 factors = [3, 3, 11, 59, 3191L, 16763L, 43037L, 62003L, 77843839397L, 154083204930662557781201849L] 60: count = 35929 factors = [3, 3, 3, 7, 11, 13, 31, 37, 41, 61, 101, 211, 241, 271, 2161L, 3541L, 9091L, 9901L, 27961L, 2906161L, 4188901L, 39526741L] 62: count = 2 factors = [3, 3, 11, 2791L, 6943319L, 57336415063790604359L, 909090909090909090909090909091L] 64: count = 1162 factors = [3, 3, 11, 17, 73, 101, 137, 353, 449, 641, 1409L, 19841L, 69857L, 976193L, 5882353L, 6187457L, 834427406578561L] 66: count = 478 factors = [3, 3, 3, 7, 11, 11, 13, 23, 37, 67, 4093L, 8779L, 21649L, 513239L, 599144041L, 183411838171L, 1344628210313298373L] 68: count = 28 factors = [3, 3, 11, 101, 103, 4013L, 2071723L, 28559389L, 1491383821L, 5363222357L, 21993833369L, 2324557465671829L] 70: count = 146 factors = [3, 3, 11, 41, 71, 239, 271, 4649L, 9091L, 123551L, 909091L, 4147571L, 102598800232111471L, 265212793249617641L] 72: count = 3627 factors = [3, 3, 3, 3, 7, 11, 13, 19, 37, 73, 101, 137, 3169L, 9901L, 52579L, 98641L, 333667L, 99990001L, 999999000001L, 3199044596370769L] 74: count = 4 factors = [3, 3, 11, 7253L, 2028119L, 247629013L, 422650073734453L, 296557347313446299L, 2212394296770203368013L] 76: count = 5 factors = [3, 3, 11, 101, 722817036322379041L, 909090909090909091L, 1111111111111111111L, 1369778187490592461L] 78: count = 700 factors = [3, 3, 3, 7, 11, 13, 13, 37, 53, 79, 157, 859, 6397L, 216451L, 265371653L, 1058313049L, 388847808493L, 900900900900990990990991L] 80: count = 605 factors = [3, 3, 11, 17, 41, 73, 101, 137, 271, 3541L, 9091L, 27961L, 1676321L, 5070721L, 5882353L, 5964848081L, 19721061166646717498359681L] 82: count = 2 factors = [3, 3, 11, 83, 1231L, 538987L, 2670502781396266997L, 3404193829806058997303L, 201763709900322803748657942361L] 84: count = 59490 factors = [3, 3, 3, 7, 7, 11, 13, 29, 37, 43, 101, 127, 239, 281, 1933L, 2689L, 4649L, 9901L, 226549L, 459691L, 909091L, 10838689L, 121499449L, 4458192223320340849L] 86: count = 9 factors = [3, 3, 11, 173, 1527791L, 57009401L, 2182600451L, 1963506722254397L, 2140992015395526641L, 7306116556571817748755241L] 88: count = 105 factors = [3, 3, 11, 11, 23, 73, 89, 101, 137, 617, 4093L, 8779L, 21649L, 513239L, 1052788969L, 1056689261L, 16205834846012967584927082656402106953L] 90: count = 50344 factors = [3, 3, 3, 3, 7, 11, 13, 19, 31, 37, 41, 211, 241, 271, 2161L, 9091L, 29611L, 52579L, 238681L, 333667L, 2906161L, 3762091L, 8985695684401L, 4185502830133110721L] 92: count = 26 factors = [3, 3, 11, 47, 101, 139, 1289L, 2531L, 18371524594609L, 549797184491917L, 11111111111111111111111L, 4181003300071669867932658901L] 94: count = 2 factors = [3, 3, 11, 6299L, 35121409L, 4855067598095567L, 297262705009139006771611927L, 316362908763458525001406154038726382279L] 96: count = 80002 factors = [3, 3, 3, 7, 11, 13, 17, 37, 73, 97, 101, 137, 353, 449, 641, 1409L, 9901L, 69857L, 206209L, 5882353L, 99990001L, 66554101249L, 75118313082913L, 9999999900000001L] 98: count = 10 factors = [3, 3, 11, 197, 239, 4649L, 909091L, 505885997L, 1976730144598190963568023014679333L, 5076141624365532994918781726395939035533L] 100: count = 3573 factors = [3, 3, 11, 41, 101, 251, 271, 3541L, 5051L, 9091L, 21401L, 25601L, 27961L, 60101L, 7019801L, 182521213001L, 14103673319201L, 78875943472201L, 1680588011350901L]

Time to generate that lot, about 3m30s on my Core I3 laptop, about 30m on my Raspberry Pi 2.

Finally, if we want really big numbers, then we don’t need a full factorization – http://stdkmd.com/nrr/repunit/ has enough factors of 10^{2016}-1, for example, to find, among others:

467203616037752709753640875404905278610286278867781588976105221346970432 395736683257666616868811083043941463314513845761368180251295614288295272 712974920189045619317506423133721608367014923830041699210760183164585217 599174938750569917513095900320978144876083087591215818424979192187341459 756509047543186958692244904217361382312220589759551138455399864423950556 877618463844146927376784311673205223822619959320039184981861810037019868 259785305305318574127400789513920685551635257636719954377249042716215901 383844447790548649266546486400561808622649593166435681150190744136685886 335659851446510906275097594875425811830427470257238967118169107518206073 615596540306297513679739458887733205329861379807932402155440131595447258 905459620119681006553819769296088325096562295835757909167184772591564873 889571115660015460170065915133627063917081464432904541564462471704065596 995864597351005042459541065531684772723537478273137238536882143270837967 355784692820692700257668090096174851711901872379544776234666991080481050 827938907695794044434449897483410036041789283630321915114061710879582491 425436499562245749681317500307257068229179611956588865266983050266693593 782449462829670744802988126608915433835744545694648340583599198978087691 600147477867595689158751559170609174089828925445708502246431435332615649 355141085750085715940292846105899585176839916452603275591677348739914677 778216128911941208439459006047576543241292648992222090990416741035195043 278539082418243317317374583211686841192215307443355453487377377290350196 371354628663013301973808912217716856093563005467214390853254407353722627 416963492751125708925240306094459245276161787679224919637918913006752210 513662791886758732646236407566089976349268846643228836678505302621204788 560791609138040737880910308235667105956827669559476643173829425259196119 155907391268256981917731226756506952400793923896521065760681749285568778 344719401424538913519492510286853888028737195080140956569785690813785590 037948199410250551049984142378021668001361835139075663440830359075663125

Which should be enough excellence for anybody. If not, https://www.dropbox.com/s/9xdnxd0ifla0zhf/excellent100.txt.gz has a list of all numbers up to 100 digits.

# Drawing Uniform Polyhedra with Javascript, WebGL and Three.js

**Posted:**June 16, 2015

**Filed under:**Geometry, Graphics, Javascript Leave a comment

I’ve been meaning to write this up properly for a long time, but for now, this link will have to do:

http://matthewarcus.github.io/polyjs/

The idea is to use some fairly straightforward vector geometry to generate uniform polyhedra and their derivatives, using the kaleidoscopic construction to generate Schwarz triangles that tile the sphere. We use spherical trilinear coordinates within each Schwarz triangle to determine polyhedron vertices (with the trilinear coordinates being converted to barycentric for actually generating the points). Vertices for snub polyhedra are found by iterative approximation.

We also can use Schwarz triangles to apply other symmetry operations to the basic polyhedra to generate compound polyhedra, including all of the uniform compounds enumerated by John Skilling (as well as many others).

There are some other features including construction of dual figures, final stellations, inversions, subdividing polyhedra faces using a Sierpinksi construction, as well as various colouring effects, exploding faces etc..

Much use was made of the work of others, notably George Hart and Zvi Har’El as well as the wonderful Three.js library by Mr.doob.

# Javascript Quines

**Posted:**October 25, 2014

**Filed under:**Javascript 1 Comment

A Quine is a program that when run, prints itself. For example, in Javascript we can write:

`> (function $(){console.log('('+$+')()');})()`

*(function $(){console.log('('+$+')()');})()*

This is nice, but we can see that it depends on the fact that in Javascript a function automatically converts to a string that is its own source code; also, it would be nice to get rid of the explicit function binding.

Using an idea from Lisp (and this is surely inspired by the (λx.xx)(λ.xx) of the lambda calculus), we can use an anonymous function:

`> var f = function(s) { console.log("("+s+")"+"("+s+")");}`

> f(f)

*(function (s) { console.log("("+s+")"+"("+s+")");})(function (s) { console.log("("+s+")"+"("+s+")");})*

> (function (s) { console.log("("+s+")"+"("+s+")");})(function (s) { console.log("("+s+")"+"("+s+")");})

*(function (s) { console.log("("+s+")"+"("+s+")");})(function (s) { console.log("("+s+")"+"("+s+")");})*

This works nicely, with no function binding needed (we are just using the definition of f to get going here), but we are still using implicit conversion from functions to strings. Let’s try explicitly quoting the second occurrence of s:

`> var f = function(s){console.log("("+s+")"+"("+"\""+s+"\""+")");}`

> f(f)

*(function (s){console.log("("+s+")"+"("+"\""+s+"\""+")");})("function (s){console.log("("+s+")"+"("+"\""+s+"\""+")");}")*

That isn’t well-formed Javascript though, the single quotes in the stringified version of the function haven’t been escaped, so this won’t compile. We need to somehow insert the quotes into the string, but without getting into an endless regression with extra layer of quotes (this problem really only exists because opening and closing quotes are the same, if quoting nested in the same way that bracketing does, it would all be much easier).

(Note also that while we are using implicit function to string conversion to construct our Quine, the Quine itself doesn’t use that feature).

One simple solution is to insert the quotes as character codes:

`> var f = function(s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");}`

> f(f)

*(function (s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");})('function (s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");}')*

> (function (s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");})('function (s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");}')

*(function (s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");})('function (s){console.log("("+s+")"+"("+String.fromCharCode(39)+s+String.fromCharCode(39)+")");}')*

but that seems rather inelegant and while EBCDIC computers are rare these days, it would be good to be character set independent.

We can also use a library function to handle quoting:

`> f = function(s){console.log("("+s+")"+"("+JSON.stringify(s)+")");}`

> f(String(f))

*(function (s){console.log("("+s+")"+"("+JSON.stringify(s)+")");})("function (s){console.log(\"(\"+s+\")\"+\"(\"+JSON.stringify(s)+\")\");}")*

> (function (s){console.log("("+s+")"+"("+JSON.stringify(s)+")");})("function (s){console.log(\"(\"+s+\")\"+\"(\"+JSON.stringify(s)+\")\");}")

*(function (s){console.log("("+s+")"+"("+JSON.stringify(s)+")");})("function (s){console.log(\"(\"+s+\")\"+\"(\"+JSON.stringify(s)+\")\");}")*

but we might object to such a heavyweight solution.

Another way forward is suggested by this C Quine:

`main(){char q='"';char*f="main(){char q='%c';char*f=%c%s%c;printf(f,q,q,f,q);}";printf(f,q,q,f,q);}`

where we avoid quoting in the quoted body by passing in the required characters from outside.

Here’s something similar in Javascript:

`> f = function(s,q,b,k){console.log(b+s+k+b+q+s+q+k);}`

> f(f,"\"","(",")")

*(function (s,q,b,k){console.log(b+s+k+b+q+s+q+k);})("function (s,q,b,k){console.log(b+s+k+b+q+s+q+k);}")*

Now there is no quotation at all in the function body, but this doesn’t quite work as we need to pass in the character parameters in the main function call, and for this we need to pass in the comma and escape characters as well:

`> f = function(s,q,b,k,c,e) { console.log(b+s+k+b+q+s+q+k+c+q+e+q+q+c+q+b+q+c+q+k+q+k); }`

> f(f,"\"","(",")",",","\\")

*(function (s,q,b,k,c,e) { console.log(b+s+k+b+q+s+q+k+c+q+e+q+q+c+q+b+q+c+q+k+q+k); })("function (s,q,b,k,c,e) { console.log(b+s+k+b+q+s+q+k+c+q+e+q+q+c+q+b+q+c+q+k+q+k); }"),"\"","(",")")*

Almost there; we are, again, just missing some parameters in the main call, but this time we can close the loop completely:

`> f = function(s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);}`

> f(f,'"',"(",")",",","\\");

*(function (s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);})("function (s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);}","\"","(",")",",","\\")*

> (function (s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);})("function (s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);}","\"","(",")",",","\\")

*(function (s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);})("function (s,q,b,k,c,e){console.log(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k);}","\"","(",")",",","\\")*

Not minimal in terms of length, but fairly minimal in terms of features required – no library functions, no fancy string formatting directives, no name binding apart from function application, no language specific tricks, no character set dependencies; we haven’t even made use of there being two ways to quote strings in Javascript.

Since we aren’t being very specific to Javascript, it’s easy to adapt the solution to other languages:

Haskell:

`(\s q b k e->putStrLn(b++s++k++q++e++s++q++q++e++q++q++q++b++q++q++k++q++q++e++e++q))"\\s q b k e->putStrLn(b++s++k++q++e++s++q++q++e++q++q++q++b++q++q++k++q++q++e++e++q)""\"""("")""\\"`

Python 3 (this doesn’t work in Python 2 as print can’t be used in lambda functions there):

`(lambda s,q,b,k,c,e:print(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k))("lambda s,q,b,k,c,e:print(b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k)","\"","(",")",",","\\")`

Again, there are shorter Quines in both languages, but the ones I have seen all require some extra feature, eg. escaping quotes using “show” in Haskell or “%r” in Python.

Finally, since Javascript has an eval function, we can construct a string that evaluates to itself:

`> f = function(s,q,b,k,c,e){return b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k;}`

> f(f,"\"","(",")",",","\\")

*'(function (s,q,b,k,c,e){return b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k;})("function (s,q,b,k,c,e){return b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k;}","\\"","(",")",",","\\\\")'*

> eval(_)

*'(function (s,q,b,k,c,e){return b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k;})("function (s,q,b,k,c,e){return b+s+k+b+q+s+q+c+q+e+q+q+c+q+b+q+c+q+k+q+c+q+c+q+c+q+e+e+q+k;}","\\"","(",")",",","\\\\")'*

# Monad Tutorial

**Posted:**September 22, 2014

**Filed under:**Lambda Calculus Leave a comment

Every programming blog should have a monad tutorial, so here we go: we can define a simple denotational semantics for a typed functional language as:

ℰ ⟦v:A⟧_{ρ}= ρ(v):A ℰ ⟦λx:A.e⟧_{ρ}= λa:A.ℰ ⟦e⟧_{ρ[x→a]}: A → B ℰ ⟦e_{1}e_{2}⟧_{ρ}= let f: A → B = ℰ ⟦e_{1}⟧_{ρ}a: A = ℰ ⟦e_{2}⟧_{ρ}in f a: B

Here ρ is an environment, mapping variables to values, and note that in the rule for a λ expression, the λ in the right hand side is defining a function in the domain of values, whereas the left hand side λ is just a linguistic construct. We could decorate every expression with a type, but that would get untidy. There will be other rules for specific operations on whatever actual datatypes are around, but this gives the underlying functional basis on which everything else depends.

We can see that ℰ⟦e⟧_{ρ} is just a value in some semantic domain, which contains, presumably, some basic types and functions between values and the type of ℰ is something like:

ℰ: Exp[A] → Env → A

where Exp[A] is set of expressions of type A (I’m not going to be rigorous about any of this, I’m assuming we have some type system where this sort of thing makes sense, and also I’m not going to worry about the difference between a syntactic type and a semantic type) and Env is the type of environments.

Just for fun, let’s make a distinction (not that there really is one here) between “ordinary” values and “semantic” values, with M[A] being the semantic value with underlying value type A (imagine an ML or Haskell style type constructor M, with a value constructor, also called M, though often we’ll ignore the distinction between the underlying type and the constructed type).

Now ℰ has type:

ℰ: Exp[A] → Env → M[A]

and the underlying value of a function of type A → B is now A → M[B].

We can also rewrite our semantic equations and take a little time persuading ourselves this is equivalent to the definitions above:

ℰ ⟦v:A⟧_{ρ}= inj(ρ(v)): M(A) ℰ ⟦λx:A.e⟧_{ρ}= inj(λa:A.ℰ ⟦e⟧_{ρ[x→a]}): M(A → M[B]) ℰ ⟦e_{1}e_{2}⟧_{ρ}= let a_{1}: M[A → M[B]] = ℰ ⟦e_{1}⟧_{ρ}a_{2}: M[A] = ℰ ⟦e_{2}⟧_{ρ}in apply (λf.apply f a_{2}) a_{1}: M[B]

inj and apply are:

inj: A → M[A] inj(a:A) = M(a) : M[A] apply: (A → M[B]) → M[A] → M[B] apply f (M a) = f a

These functions should look familiar; they are the standard monad operations & using a different monad will give us a different semantics for our basic functional operations.

Let’s introduce state, the basic denotational semantics is something like:

ℰ: Exp[A] → Env → State → (A, State) ℰ ⟦v:A⟧_{ρ σ}= (ρ(v),σ) ℰ ⟦λx.e⟧_{ρ σ}= (λa.ℰ ⟦e⟧_{ρ[x→a]}, σ) ℰ ⟦e_{1}e_{2}⟧_{ρ σ}= let (f,σ') = ℰ ⟦e_{1}⟧_{ρ σ}(a, σ'') = ℰ ⟦e_{2}⟧_{ρ σ'}in f a σ''

(I’ve omitted type decorations here for clarity).

Let’s do the same trick with a special semantic domain (though this time we’ll leave the type constructors implicit) and we have:

M[A] = State→(A, State) inj a σ = (a,σ) apply f g σ = let (a,σ')a = g σ in f a σ'

and we can see that we can just plug these definitions into our generic semantic equations above and get something equivalent to the specific state semantics.

So, a monad is just a typed semantic domain together with the operations necessary to specify the standard functional constructions over that domain. Which sort of makes sense, but it’s nice to see it just drop out of the equations (and of course it’s nice to see that a standard denotational semantics for something like state does actually correspond quite closely with the monadic semantics).

None of this is new, in fact the use of monads to provide a uniform framework for program semantics goes right back to Eugenio Moggi’s original work in the 1980s (which was then taken up in functional programming where elements of the semantic domain itself are modelled as normal data objects).

# At The Bakery

**Posted:**June 10, 2014

**Filed under:**C 1 Comment

After the topical excitement of the last couple of posts, let’s look at an all-time great – Leslie Lamport’s Bakery Algorithm (and of course this is still topical; Lamport is the most recent winner of the Turing Award).

The problem is mutual exclusion without mutual exclusion primitives. Usually, it’s described in the context of a shared memory system (and that is what we will implement here), but will work equally well in a message-passing system with only local state (each thread or process only needs to write to its own part of the store).

For further details, and Lamport’s later thoughts see http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#bakery: “For a couple of years after my discovery of the bakery algorithm, everything I learned about concurrency came from studying it.” – and since Lamport understands more about concurrency than just about anyone on the planet, it’s maybe worth spending some time looking at it ourselves.

I’m not going to attempt to prove the algorithm correct, I’ll leave that to Lamport, but the crucial idea seems to me to be that a thread reading a particular value from another thread is a synchronization signal from that thread – here, reading a false value for the `entering`

variable is a signal that the other thread isn’t in the process of deciding on it’s own number, therefore it is safe for the reading process to proceed.

Implementing on a real multiprocessor system, we find that use of memory barriers or synchronization primitives is essential – the algorithm requires that reads and writes are serialized in the sense that once a value is written, other processes won’t see an earlier value (or earlier values of other variables). This doesn’t conflict with what Lamport says about not requiring low-level atomicity – we can allow reads and writes to happen simultaneously, with the possibility of a read returning a bogus value – and in fact we can simulate this in the program by writing a random value just before a process selects its real ticket number, but once a write has completed, all processes should see the new value.

Another essential feature is the volatile flag – as many have pointed out, this isn’t enough by itself for correct thread synchronization, but for shared memory systems, prevents the compiler from making invalid assumptions about consistency of reads from shared variables.

A final point – correctness requires that ticket numbers can increase without bound, this is hard to arrange in practice, so we just `assert`

if they grow too large (this rarely happens in reality, unless we get carried away with our randomization).

#include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <assert.h> // Compile with: g++ -Wall -O3 bakery.cpp -pthread -o bakery static const int NTHREADS = 4; // Some features to play with //#define NCHECK // Disable crucial check //#define NSYNC // Disable memory barrier //#define NVOLATILE // Disable volatile #if defined NVOLATILE #define VOLATILE #else #define VOLATILE volatile #endif VOLATILE bool entering[NTHREADS]; VOLATILE unsigned number[NTHREADS]; VOLATILE int count = 0; VOLATILE int total = 0; unsigned getmax(int n) { unsigned max = 0; for (int i = 0; i < n; i++) { if (number[i] > max) max = number[i]; } return max; } bool check(int i, int j) { return number[j] < number[i] || (number[j] == number[i] && j < i); } inline void synchronize() { #if !defined NSYNC // gcc builtin full memory barrier __sync_synchronize(); #endif } void lock(int i) { entering[i] = true; synchronize(); // Simulate non-atomic write number[i] = rand(); synchronize(); number[i] = 1 + getmax(NTHREADS); assert(number[i] > 0); entering[i] = false; synchronize(); for (int j = 0; j < NTHREADS; j++) { // Wait until thread j receives its number: #if !defined NCHECK while (entering[j]) { /* nothing */ } #endif // At this point, we have read a false value for // "entering[j]", therefore any number picked by j // later will takes our choice into account, any value // chosen earlier (and so might be less than ours) // will be visible to us in the following test. // Wait until all threads with smaller numbers or with // the same number, but with higher priority, finish // their work: while ((number[j] != 0) && check(i,j)) { /* nothing */ } } } void unlock(int i) { number[i] = 0; } void *threadfun(void *arg) { int i = *(int*)(arg); while (true) { lock(i); total++; if (total % 1000000 == 0) fprintf(stderr,"%c", 'a'+i); assert(count==0); // Check we have exclusive access count++; // It's not clear that these synchs are unnecessary, // but nothing seems to break if I remove them. //synchronize(); count--; //synchronize(); unlock(i); // non-critical section... } return NULL; } int main() { pthread_t t[NTHREADS]; int n[NTHREADS]; for (int i = 0; i < NTHREADS; i++) { n[i] = i; pthread_create(&t[i], NULL, threadfun, (void*)&n[i]); } for (int i = 0; i < NTHREADS; i++) { pthread_join(t[i], NULL); } }