Monad Tutorial

Every programming blog should have a monad tutorial, so here we go: we can define a simple denotational semantics for a typed functional language as:

ℰ ⟦v:A⟧ρ = ρ(v): A
ℰ ⟦λx:A.e⟧ρ = λa:A.ℰ ⟦e⟧ρ[x→a]: A → B
ℰ ⟦e1e2⟧ρ =
    let f: A → B = ℰ ⟦e1⟧ρ
        a: A = ℰ ⟦e2⟧ρ
    in f a: B

Here ρ is an environment, mapping variables to values. Note that in the rule for a λ expression, the λ on the right hand side defines a function in the domain of values, whereas the λ on the left hand side is just a linguistic construct. We could decorate every expression with a type, but that would get untidy. There will be other rules for specific operations on whatever actual datatypes are around, but this gives the underlying functional basis on which everything else depends.

We can see that ℰ⟦e⟧ρ is just a value in some semantic domain, which presumably contains some basic types and functions between values, and the type of ℰ is something like:

ℰ: Exp[A] → Env → A

where Exp[A] is the set of expressions of type A (I’m not going to be rigorous about any of this: I’m assuming we have some type system where this sort of thing makes sense, and I’m not going to worry about the difference between a syntactic type and a semantic type) and Env is the type of environments.

Just for fun, let’s make a distinction (not that there really is one here) between “ordinary” values and “semantic” values, with M[A] being the semantic value with underlying value type A (imagine an ML or Haskell style type constructor M, with a value constructor, also called M, though often we’ll ignore the distinction between the underlying type and the constructed type).

Now ℰ has type:

ℰ: Exp[A] → Env → M[A]

and the underlying value of a function of type A → B is now A → M[B].

We can also rewrite our semantic equations and take a little time persuading ourselves this is equivalent to the definitions above:

ℰ ⟦v:A⟧ρ = inj(ρ(v)): M[A]
ℰ ⟦λx:A.e⟧ρ = inj(λa:A.ℰ ⟦e⟧ρ[x→a]): M[A → M[B]]
ℰ ⟦e1e2⟧ρ =
  let a1: M[A → M[B]] = ℰ ⟦e1⟧ρ
      a2: M[A] = ℰ ⟦e2⟧ρ
  in apply (λf.apply f a2) a1: M[B]

inj and apply are:

inj: A → M[A]
inj(a:A) = M(a) : M[A]
apply: (A → M[B]) → M[A] → M[B]
apply f (M a) = f a

These functions should look familiar; they are the standard monad operations, and using a different monad will give us a different semantics for our basic functional operations.
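
For concreteness, here is a minimal SML sketch of the trivial “wrapper” monad described above (the names M, inj and apply follow the text; nothing here is specific to any particular semantics):

datatype 'a M = M of 'a;

(* inj : 'a -> 'a M -- wrap a value *)
fun inj a = M a;

(* apply : ('a -> 'b M) -> 'a M -> 'b M -- unwrap, then apply *)
fun apply f (M a) = f a;

(* e.g. apply (fn x => inj (x + 1)) (inj 41) evaluates to M 42 *)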

Let’s introduce state; the basic denotational semantics is something like:

ℰ: Exp[A] → Env → State → (A, State)

ℰ ⟦v:A⟧ρ σ = (ρ(v),σ)
ℰ ⟦λx.e⟧ρ σ = (λa.ℰ ⟦e⟧ρ[x→a], σ)
ℰ ⟦e1e2⟧ρ σ =
  let (f,σ') = ℰ ⟦e1⟧ρ σ
      (a, σ'') = ℰ ⟦e2⟧ρ σ'
  in f a σ''

(I’ve omitted type decorations here for clarity).

Let’s do the same trick with a special semantic domain (though this time we’ll leave the type constructors implicit) and we have:

M[A] = State→(A, State)

inj a σ = (a,σ)
apply f g σ = 
  let (a,σ') = g σ
  in f a σ'

and we can see that we can just plug these definitions into our generic semantic equations above and get something equivalent to the specific state semantics.
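
As an SML sketch (taking, purely for concreteness, the state to be an int; nothing below depends on that choice):

type State = int;
type 'a M = State -> 'a * State;       (* M[A] = State → (A, State) *)

fun inj a s = (a, s);

fun apply f g s =
    let val (a, s') = g s
    in f a s'
    end;

(* For example, a computation that returns the current state and increments it: *)
val tick : int M = fn s => (s, s + 1);

(* apply (fn a => apply (fn b => inj (a + b)) tick) tick 0  evaluates to (1, 2) *)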

So, a monad is just a typed semantic domain together with the operations necessary to specify the standard functional constructions over that domain. Which sort of makes sense, but it’s nice to see it just drop out of the equations (and of course it’s nice to see that a standard denotational semantics for something like state does actually correspond quite closely with the monadic semantics).

None of this is new, in fact the use of monads to provide a uniform framework for program semantics goes right back to Eugenio Moggi’s original work in the 1980s (which was then taken up in functional programming where elements of the semantic domain itself are modelled as normal data objects).


Lambda Interpreter, Part II, Semantics

First thing: we need to be able to define identifiers, at least at the top level, for S, K, I, etc. A simple list of string-value pairs will do; we will fill in our default environment later:

fun getenv [] v = NONE
  | getenv ((v',e)::s) v =
    if (v = v') then SOME e else getenv s v;

(* Make an environment from a list of alternating name/definition strings *)
fun mkenv [] k = k
  | mkenv (v::e::s) k = mkenv s ((v,parse e)::k);

We will define two evaluation mechanisms: one will reduce one redex at a time, with the index of the redex passed in to the function; the other will just reduce until a normal form is reached, if there is one.

It’s convenient to use option types to indicate if a subexpression has been reduced, and it’s even more convenient to define some monadic-style helper functions:

fun lift f NONE = NONE
  | lift f (SOME x) = SOME (f x);

fun bind f NONE = NONE
  | bind f (SOME a) = f a;

(* Apply f to a, if no result, apply g to b *)
fun try f a g b = case f a of r as SOME _ => r | NONE => g b;

fun get (SOME x) = x;
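
For example, a few quick checks of these helpers (hypothetical REPL lines, expected results as comments):

lift (fn x => x + 1) (SOME 1);                            (* SOME 2 *)
bind (fn x => if x > 0 then SOME x else NONE) (SOME 0);   (* NONE *)
try (fn _ => NONE) 0 SOME 1;                              (* SOME 1: first attempt fails, fall back *)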

Here’s the single-redex reducer: try to reduce each subexpression, keeping track of redex indexes. Return SOME e' if a subevaluation succeeds, else return NONE, with some monadic trickery to handle the mechanics. We avoid expanding global identifiers unless they are in functional position.

fun reduce env n e =
  let
    fun aux n (Var _) = NONE
      | aux 0 (App(Lambda(v,e1),e2)) = SOME (subst v e2 e1)
      | aux n (App(Lambda(v,e1),e2)) =
	try (lift (fn e1' => App(Lambda(v,e1'),e2)) o aux (n-1)) e1
	    (lift (fn e2' => App(Lambda(v,e1),e2')) o aux (n-1)) e2
      | aux n (App(e1 as Var v,e2)) =
	try (bind (fn e1' => aux n (App(e1',e2))) o getenv env) v
	    (lift (fn e2' => App(e1,e2')) o aux n) e2
      | aux n (App(e1,e2)) =
	try (lift (fn e1' => App(e1',e2)) o aux n) e1
	    (lift (fn e2' => App(e1,e2')) o aux n) e2
      | aux n (Lambda(v,e1)) =
	(lift (fn e1' => Lambda(v,e1')) o aux n) e1
  in
      aux n e
  end;

That’s all very well for reducing individual redexes; let’s define something that will just let rip on an expression. We’d like to do a proper normal order evaluation, so we’ll use an evaluation stack: go down the left branch of the expression, reducing beta redexes as we go. When we have got to the end, there are no more top-level redexes, so recursively evaluate the items on the stack, and finally fold everything back up into a single expression:

fun eval env e = 
  let
    fun foldapp(e1::e2::s) = foldapp(App(e1,e2)::s)
      | foldapp ([e1]) = e1;
    fun aux (Lambda(v,e1)) (e2::s) = aux (subst v e2 e1) s
      | aux (Lambda(v,e1)) [] = Lambda(v, eval env e1)
      | aux (App(e1,e2)) s = aux e1 (e2::s)
      | aux (e as Var _) [] = e
      | aux (e as Var v) s = 
        (case getenv env v of
             SOME e' => aux e' s
           | _ => foldapp (map (eval env) (e::s)));
  in
    aux e []
  end;

All we need now is a suitable environment, and we can start reducing.

val stdenv =
    mkenv ["S", "λxyz.(xz)(yz)",
	   "K", "λxy.x",
	   "I", "λx.x",
	   "Y", "λf.(λx.f(xx))(λx.f(xx))",
	   "M", "λxy.y(xy)",
	   "T", "λxy.x",
	   "F", "λxy.y",
	   "Z", "λn.n(λx.F)T",
	   "0", "λf.λn.n",
	   "N", "λn.λf.λx.f(nfx)",
	   "P", "λnfx.n(λgh.h(gf))(λu.x)(λu.u)",
	   "*", "λmnfx.m(nf)x",
	   "+", "λmnfx.mf(nfx)",
	   "1", "N0",
	   "2", "N1",
	   "3", "N2",
	   "4", "N3",
	   "5", "N4",
	   "6", "N5",
	   "7", "N6",
	   "8", "N7",
	   "9", "N8",
	   "H", "Y(λgn.(Zn)1(*n(g(Pn))))"
	  ] [];

As well as the usual suspects, there is M, my favourite combinator. T and F help define conditionals (with Z as a test for zero), and there is the usual definition of Church numerals. H, obviously, is the factorial function.

Finally, we define some convenience functions:

fun run e =
  case reduce stdenv 0 (pp e) of
    SOME e' => run e'
  | NONE => e;

(* Reduce the first redex *)
fun e0 e = pp (get (reduce stdenv 0 e));
(* Reduce the second redex *)
fun e1 e = pp (get (reduce stdenv 1 e));
(* Print the definition of a symbol *)
fun esym s = pp (get (getenv stdenv s));
(* Evaluate to normal form *)
fun e s = pp (eval stdenv (parse s));

And now at last we can do some reductions:

First we probably should check that variable capture is handled properly:

- e "(\\abcd.abcd)xyzw";
xyzw
val it = App (App (App #,Var #),Var "w") : Exp
- e "(\\vxx'x''.vxx'x'')xyzw";
xyzw
val it = App (App (App #,Var #),Var "w") : Exp

It’s interesting that in the first reduction of the latter, there is a sort of cascade of primes being added:

- e0 (parse "(\\vxx'x''.vxx'x'')xyzw");
(λx'x''x'''.xx'x''x''')yzw
val it = App (App (App #,Var #),Var "w") : Exp

Anyway, that seems to check out so we can move on to something more interesting. Let’s derive the Turing fixpoint combinator from M and Y:

- e1(e1(e0(parse"YM")));
(λx.M(xx))(λx.M(xx))
(λxy.y(xxy))(λx.M(xx))
(λxy.y(xxy))(λxy.y(xxy))
val it = App (Lambda ("x",Lambda #),Lambda ("x",Lambda #)) : Exp

Nice. And we can do factorials:

- e "H4";
λfx.f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(fx)))))))))))))))))))))))
val it = Lambda ("f",Lambda ("x",App #)) : Exp

The output is a little big to put inline, but the enthusiastic can try:

run (parse "H2");

See [[here]] for output.


Lambda Interpreter, Part I, Syntax.

Here’s a simple lambda calculus interpreter I wrote a little while ago. It’s in ML, a wonderful language for its polymorphic type inference, pattern matching, and effortless higher order functions.

We start off nice and simple with some abstract syntax:

type Variable = string;

datatype Exp = Var of Variable 
             | App of Exp * Exp 
             | Lambda of Variable * Exp;

We could use de Bruijn indices for variables, making variable substitution easier, but I’m a bit of a traditionalist and it’s not too difficult to do it the hard way.
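
(For comparison, a de Bruijn style syntax would look something like the sketch below, where a variable is an index counting enclosing binders, so λx.λy.x becomes DBLam (DBLam (Idx 1)); the names are hypothetical and none of this is used in what follows.)

datatype DBExp = Idx of int                (* de Bruijn index *)
               | DBApp of DBExp * DBExp
               | DBLam of DBExp;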

The first job is printing expressions; a nice little recursive scanner will do, and we can sort out bracketing, eliding λx.λy.e to λxy.e and so on, with some nifty pattern matching:

fun pexp (Var x) = x
  | pexp (e as Lambda _) = "λ" ^ plambda e
  | pexp (App(e1 as App _,e2)) = pexp e1 ^ pexp1 e2
  | pexp (App(e1,e2)) = pexp1 e1 ^ pexp1 e2
and pexp1 (Var x) = x
  | pexp1 e = "(" ^ pexp e ^ ")"
and plambda(Lambda(v,e)) = v ^ plambda e
  | plambda e = "." ^ pexp e;

fun pp e = (print (pexp e ^ "\n"); e)

Next up, variable substitution: a little tedious, but it has to be done. First, we need to know whether a variable occurs free in an expression. If the bound variable of a λ we are substituting under occurs free in the expression being substituted in, we need a fresh variable that occurs in neither, which we find by decorating with primes. Having got ourselves a variable that isn’t free, we rename the bound variable to it and carry on, and that’s all there is to it. The code is probably clearer:

(* Does variable c occur free in an expression? *)
fun isfree c (Var c') = c = c'
  | isfree c (Lambda(c',e)) = 
    if (c = c') then false else isfree c e
  | isfree c (App(e1,e2)) = 
    if isfree c e1 then true else isfree c e2;

(* Does variable c occur at all, free or bound? *)
fun occurs c (Var c') = c = c'
  | occurs c (Lambda(c',e)) = 
    if (c = c') then true else occurs c e
  | occurs c (App(e1,e2)) = 
    if occurs c e1 then true else occurs c e2;

(* Add primes to variable until we find one not occurring in e *)
fun findnew v e =
    let val v' = v ^ "'"
    in
        if not (occurs v' e) then v'
        else findnew v' e
    end;

fun subst v e1 (e2 as (Var v')) = if v = v' then e1 else e2
  | subst v e1 (App(e2,e3)) = App(subst v e1 e2, subst v e1 e3)
  | subst v e1 (e2 as Lambda(v',e3)) = 
    if not (isfree v e2) then e2 (* Includes case v = v' *)
    else if isfree v' e1 then 
        (* Find a variable not in e1 or e3 to use *)
        let val v'' = findnew v' (App(e1,e3))
        in subst v e1 (Lambda(v'', subst v' (Var v'') e3))
        end
    else Lambda(v', subst v e1 e3);
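
As a quick hand check of capture avoidance (a hypothetical snippet, building the expression with the constructors directly since the parser only arrives below): substituting x for y under a binder that already binds x should rename the bound variable.

pp (subst "y" (Var "x") (Lambda ("x", App (Var "y", Var "x"))));
(* should print λx'.xx' *)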

Phew, glad that’s over. Next, we need to lex and parse expressions. Lexing is straightforward: variables are single letters, and we allow either \ or λ for lambda; it seems that we get the UTF-8 bytes for λ one at a time, so that’s just about our only multi-character token, apart from primed identifiers (since we use primes to avoid variable capture in substitutions, we want to be able to read them back in as well). Lex input is just a list of characters from an exploded string.

datatype LexItem = LAM | BRA | KET | DOT | VAR of string;

fun lex [] t = rev t
  | lex (#" "::s) t = lex s t
  | lex (#"\n"::s) t = lex s t
  | lex (#"\t"::s) t = lex s t
  | lex (#"\\"::s) t = lex s (LAM::t)
  | lex (#"("::s) t = lex s (BRA::t)
  | lex (#")"::s) t = lex s (KET::t)
  | lex (#"."::s) t = lex s (DOT::t)
  | lex (#"\206" :: #"\187" ::s) t = lex s (LAM::t)
  | lex (c::s) t = lexvar s [c] t
and lexvar (#"'"::s) v t = lexvar s (#"'"::v) t
  | lexvar s v t = lex s (VAR (implode(rev v))::t);

Parsing is even more fun. This is a hand-built LR table-driven parser; table-driven parsers are good for tricks like semi-intelligent error recovery or resolving parse conflicts on the fly (useful, e.g., for languages with redefinable operator precedences like ML). We don’t do anything like that though; we don’t even explicitly detect errors, and instead rely on ML’s pattern matching – if the input is ungrammatical, we get a match failure:

fun parse s = 
  let
    (* Items appearing on the parse stack *)
    datatype ParseItem = B | E of Exp | V of Variable;
    fun aux [E e] [] = e
      | aux (E e1::E e2::s) t = aux (E(App(e2,e1))::s) t
      | aux s (LAM::VAR c::DOT::t) = aux (V c::s) t
      | aux s (LAM::VAR c::VAR c'::t) = 
                            aux (V c::s) (LAM::VAR c'::t)
      | aux s (BRA::t) = aux (B::s) t
      | aux ((a as E _):: B :: s) (KET::t) = aux (a::s) t
      | aux s (VAR c::t) = aux(E(Var c)::s) t
      | aux (E e::V v::s) t = aux (E(Lambda(v,e))::s) t;
  in
    aux [] (lex (explode s) [])
  end

Well, that’s it for the moment. Next installment – evaluating expressions.


My Favourite Combinator

A fixpoint operator F is such that:

Ff = f(Ff)

ie. F = λf.f(Ff)

ie. F is a fixpoint of λF.λf.f(Ff)

ie. fixpoint operators are fixpoints of M = λx.λy.y(xy)

M is my favourite combinator.

[As an aside, let’s do that constructive type theory thing and prove it as a theorem:

x: (A->B)->A
y: A->B
xy: A
y(xy): B
M: ((A->B)->A)->((A->B)->B) (MP applied twice)

Phew, glad that checks out]
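
We can also let ML’s type inference do the work: define M as an ordinary function and its inferred type is exactly the proposition proved above (expected response shown as a comment):

fun m x y = y (x y);
(* val m = fn : (('a -> 'b) -> 'a) -> ('a -> 'b) -> 'b *)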

Curry’s Y operator is:

Y = λf.(λx.f(xx))(λx.f(xx))

[Yf = (λx.f(xx))(λx.f(xx)) [beta]
= f((λx.f(xx))(λx.f(xx))) [beta]
= f(Yf) [subst of equals]]

Apply Y to M:

YM = (λx.M(xx))(λx.M(xx))
= (λx.λy.y(xxy))(λx.λy.y(xxy)) [λx.M(xx) = λx.λy.y(xxy)]
= BB = T, writing B for λx.λy.y(xxy)

T is the Turing fixpoint combinator. Now isn’t that weird? No doubt there is some deep reason behind this, but what it might be I have no idea. Perhaps it’s down to the well-known fact that Yf needs substitution of equals to show equivalence with f(Yf), whereas Tf directly reduces to f(Tf):

Tf = (λx.λy.y(xxy))Bf [def]
= f(BBf) [beta]
= f(Tf) [def]

One up for Turing there!