Lambda Interpreter, Part II, Semantics

First thing, we need to be able to define identifiers, at least at the top level, for S, K, I, etc. A simple list of string-value pairs will do, we will fill in our default environment later:

fun getenv [] v = NONE
  | getenv ((v',e)::s) v =
    if (v = v') then SOME e else getenv s v;

(* Make an environment from a list of strings *)
fun mkenv [] k = k
  | mkenv (v::e::s) k = mkenv s ((v,parse e)::k)

We will define two evaluation mechanisms, one will reduce one redex at a time, with the index of the redex passed in to the function, the other will just reduce until a normal form is reached, if there is one.

It’s convenient to use option types to indicate if a subexpression has been reduced, and it’s even more convenient to define some monadic-style helper functions:

fun lift f NONE = NONE
  | lift f (SOME x) = SOME (f x);

fun bind f NONE = NONE
  | bind f (SOME a) = f a;

(* Apply f to a, if no result, apply g to b *)
fun try f a g b = case f a of r as SOME _ => r | NONE => g b;

fun get (SOME x) = x;

Here’s the single redex reducer: try to reduce each subexpression, keeping track of redex indexes. Return SOME(e') if a sub evaluation succeeds, else return NONE with some monadic trickery to handle the mechanics. We avoid expanding global identifiers unless they are in functional position.

fun reduce env n e =
    fun aux n (Var _) = NONE
      | aux 0 (App(Lambda(v,e1),e2)) = SOME (subst v e2 e1)
      | aux n (App(Lambda(v,e1),e2)) =
	try (lift (fn e1' => App(Lambda(v,e1'),e2)) o aux (n-1)) e1
	    (lift (fn e2' => App(Lambda(v,e1),e2')) o aux (n-1)) e2
      | aux n (App(e1 as Var v,e2)) =
	try (bind (fn e1' => aux n (App(e1',e2))) o getenv env) v
	    (lift (fn e2' => App(e1,e2')) o aux n) e2
      | aux n (App(e1,e2)) =
	try (lift (fn e1' => App(e1',e2)) o aux n) e1
	    (lift (fn e2' => App(e1,e2')) o aux n) e2
      | aux n (Lambda(v,e1)) =
	(lift (fn e1' => Lambda(v,e1')) o aux n) e1
      aux n e

That’s all very well for reducing individual redexes, let’s define something that will just let rip on an expression. We’d like to do a proper normal order evaluation, so we’ll use an evaluation stack: go down left branch of expression, reducing beta redexes as we go. When we have got to the end, there are no more top-level redexes, so recursively evaluate the items on the stack, and finally fold back up in into a single expression:

fun eval env e = 
    fun foldapp(e1::e2::s) = foldapp(App(e1,e2)::s)
      | foldapp ([e1]) = e1;
    fun aux (Lambda(v,e1)) (e2::s) = aux (subst v e2 e1) s
      | aux (Lambda(v,e1)) [] = Lambda(v, eval env e1)
      | aux (App(e1,e2)) s = aux e1 (e2::s)
      | aux (e as Var _) [] = e
      | aux (e as Var v) s = 
        (case getenv env v of
             SOME e' => aux e' s
           | _ => foldapp (map (eval env) (e::s)));
    aux e []

All we need now is a suitable environment, and we can start reducing.

val stdenv =
    mkenv ["S", "λxyz.(xz)(yz)",
	   "K", "λxy.x",
	   "I", "λx.x",
	   "Y", "λf.(λx.f(xx))(λx.f(xx))",
	   "M", "λxy.y(xy)",
	   "T", "λxy.x",
	   "F", "λxy.y",
	   "Z", "λn.n(λx.F)T",
	   "0", "λf.λn.n",
	   "N", "λn.λf.λx.f(nfx)",
	   "P", "λnfx.n(λgh.h(gf))(λu.x)(λu.u)",
	   "*", "λmnfx.m(nf)x",
	   "+", "λ",
	   "1", "N0",
	   "2", "N1",
	   "3", "N2",
	   "4", "N3",
	   "5", "N4",
	   "6", "N5",
	   "7", "N6",
	   "8", "N7",
	   "9", "N8",
	   "H", "Y(λgn.(Zn)1(*n(g(Pn))))"
	  ] [];

As well as the usual suspects, M is my favourite combinator. T and F help define conditionals, and there is the usual definition of Church numerals. H, obviously, is the factorial function.

Finally, we define some convenience functions:

fun run e =
  case reduce stdenv 0 (pp e) of
    SOME e' => run e'
  | NONE => e;

(* Evaluate first redex *)
fun e0 e = pp (get (reduce stdenv 0 e));
(* Evaluate second redex *)
fun e1 e = pp (get (reduce stdenv 1 e));
(* Evaluate a symbol *)
fun esym s = pp (get (getenv stdenv s));
(* Evaluate to normal form *)
fun e s = pp (eval stdenv (parse s));

And now at last we can do some reductions:

First we probably should check that variable capture is handled properly:

- e "(\\abcd.abcd)xyzw";
val it = App (App (App #,Var #),Var "w") : Exp
- e "(\\vxx'x''.vxx'x'')xyzw";
val it = App (App (App #,Var #),Var "w") : Exp

It’s interesting that in the first reduction of the latter, there is a sort of cascade of primes being added:

- e0 (parse "(\\vxx'x''.vxx'x'')xyzw");
val it = App (App (App #,Var #),Var "w") : Exp

Anyway, that seems to check out so we can move on to something more interesting. Let’s derive the Turing fixpoint combinator from M and Y:

- e1(e1(e0(parse"YM")));
val it = App (Lambda ("x",Lambda #),Lambda ("x",Lambda #)) : Exp

Nice. And we can do factorials:

- e "H4";
val it = Lambda ("f",Lambda ("x",App #)) : Exp

The output is a little big to put inline, but the enthusiastic can try:

run (parse "H2");

See [[here]] for output.

Lambda Interpreter, Part I, Syntax.

Here’s a simple lambda calculus interpreter I wrote a little while ago. It’s in ML, a wonderful language for its polymorphic type inference, pattern matching, and effortless higher order functions.

We start off nice and simple with some abstract syntax:

type Variable = string;

datatype Exp = Var of Variable 
             | App of Exp * Exp 
             | Lambda of Variable * Exp;

We could use de Bruijn indices for variables, making variable substitution easier, but I’m a bit of a traditionalist and it’s not too difficult to do it the hard way.

First job is printing expressions, a nice little recursive scanner will do, and we can sort out bracketing, eliding λx.λy.e to λxy.e and so on with some nifty pattern matching:

fun pexp (Var x) = x
  | pexp (e as Lambda _) = "λ" ^ plambda e
  | pexp (App(e1 as App _,e2)) = pexp e1 ^ pexp1 e2
  | pexp (App(e1,e2)) = pexp1 e1 ^ pexp1 e2
and pexp1 (Var x) = x
  | pexp1 e = "(" ^ pexp e ^ ")"
and plambda(Lambda(v,e)) = v ^ plambda e
  | plambda e = "." ^ pexp e;

fun pp e = (print (pexp e ^ "\n"); e)

Next up, variable substitution, a little tedious, but has to be done. First, we need to know if a variable occurs free in an expression. If it does, we need to find a variable that doesn’t, which we do by decorating with primes. Having got ourselves a variable that isn’t free, we can use it to substitute for the one that is, and that’s all there is to it. The code is probably clearer:

fun isfree c (Var c') = c = c'
  | isfree c (Lambda(c',e)) = 
    if (c = c') then false else isfree c e
  | isfree c (App(e1,e2)) = 
    if isfree c e1 then true else isfree c e2;

fun occurs c (Var c') = c = c'
  | occurs c (Lambda(c',e)) = 
    if (c = c') then true else occurs c e
  | occurs c (App(e1,e2)) = 
    if occurs c e1 then true else occurs c e2;

(* Add primes to variable until we find one not occurring in e *)
fun findnew v e =
    let val v' = v ^ "'"
        if not (occurs v' e) then v'
        else findnew v' e

fun subst v e1 (e2 as (Var v')) = if v = v' then e1 else e2
  | subst v e1 (App(e2,e3)) = App(subst v e1 e2, subst v e1 e3)
  | subst v e1 (e2 as Lambda(v',e3)) = 
    if not (isfree v e2) then e2 (* Includes case v = v' *)
    else if isfree v' e1 then 
        (* Find a variable not in e1 or e3 to use *)
        let val v'' = findnew v' (App(e1,e3))
        in subst v e1 (Lambda(v'', subst v' (Var v'') e3))
    else Lambda(v', subst v e1 e3);

Phew, glad that’s over. Next, we need to lex and parse expressions. Lexing is straightforward, variables are single letters, we allow either \ or λ for lambda; it seems that we get the UTF-8 bytes for λ one at a time, so that’s just about our only multi character token, apart from primed identifiers (since we use primes to avoid variable capture in substitutions, we want to be able to read them in as well). Lex input is just a list of characters from an exploded string.

datatype LexItem = LAM | BRA | KET | DOT | VAR of string;

fun lex [] t = rev t
  | lex (#" "::s) t = lex s t
  | lex (#"\n"::s) t = lex s t
  | lex (#"\t"::s) t = lex s t
  | lex (#"\\"::s) t = lex s (LAM::t)
  | lex (#"("::s) t = lex s (BRA::t)
  | lex (#")"::s) t = lex s (KET::t)
  | lex (#"."::s) t = lex s (DOT::t)
  | lex (#"\206" :: #"\187" ::s) t = lex s (LAM::t)
  | lex (c::s) t = lexvar s [c] t
and lexvar (#"'"::s) v t = lexvar s (#"'"::v) t
  | lexvar s v t = lex s (VAR (implode(rev v))::t);

Parsing is even more fun. This is a hand-built LR table-driven parser; table driven parsers are good for tricks like semi-intelligent error recovery or doing parse conflict resolution on the fly (useful eg. for languages with redefinable operation precedences like ML). We don’t do anything like that though, we don’t even explicitly detect errors, and instead rely on ML’s pattern matching – if the input is ungrammatical, we get an inexhaustive match error:

fun parse s = 
    (* Items appearing on the parse stack *)
    datatype ParseItem = B | E of Exp | V of Variable;
    fun aux [E e] [] = e
      | aux (E e1::E e2::s) t = aux (E(App(e2,e1))::s) t
      | aux s (LAM::VAR c::DOT::t) = aux (V c::s) t
      | aux s (LAM::VAR c::VAR c'::t) = 
                            aux (V c::s) (LAM::VAR c'::t)
      | aux s (BRA::t) = aux (B::s) t
      | aux ((a as E _):: B :: s) (KET::t) = aux (a::s) t
      | aux s (VAR c::t) = aux(E(Var c)::s) t
      | aux (E e::V v::s) t = aux (E(Lambda(v,e))::s) t;
    aux [] (lex (explode s) [])

Well, that’s it for the moment. Next installment – evaluating expressions.