matthew arcus

Lambda Interpreter, Part I, Syntax.

Here’s a simple lambda calculus interpreter I wrote a little while ago. It’s in ML, a wonderful language for its polymorphic type inference, pattern matching, and effortless higher order functions.

We start off nice and simple with some abstract syntax:

type Variable = string;

datatype Exp = Var of Variable 
             | App of Exp * Exp 
             | Lambda of Variable * Exp;

We could use de Bruijn indices for variables, making variable substitution easier, but I’m a bit of a traditionalist and it’s not too difficult to do it the hard way.

First job is printing expressions, a nice little recursive scanner will do, and we can sort out bracketing, eliding λx.λy.e to λxy.e and so on with some nifty pattern matching:

fun pexp (Var x) = x
  | pexp (e as Lambda _) = "λ" ^ plambda e
  | pexp (App(e1 as App _,e2)) = pexp e1 ^ pexp1 e2
  | pexp (App(e1,e2)) = pexp1 e1 ^ pexp1 e2
and pexp1 (Var x) = x
  | pexp1 e = "(" ^ pexp e ^ ")"
and plambda(Lambda(v,e)) = v ^ plambda e
  | plambda e = "." ^ pexp e;

fun pp e = (print (pexp e ^ "\n"); e)

Next up, variable substitution, a little tedious, but has to be done. First, we need to know if a variable occurs free in an expression. If it does, we need to find a variable that doesn’t, which we do by decorating with primes. Having got ourselves a variable that isn’t free, we can use it to substitute for the one that is, and that’s all there is to it. The code is probably clearer:

fun isfree c (Var c') = c = c'
  | isfree c (Lambda(c',e)) = 
    if (c = c') then false else isfree c e
  | isfree c (App(e1,e2)) = 
    if isfree c e1 then true else isfree c e2;

fun occurs c (Var c') = c = c'
  | occurs c (Lambda(c',e)) = 
    if (c = c') then true else occurs c e
  | occurs c (App(e1,e2)) = 
    if occurs c e1 then true else occurs c e2;

(* Add primes to variable until we find one not occurring in e *)
fun findnew v e =
    let val v' = v ^ "'"
        if not (occurs v' e) then v'
        else findnew v' e

fun subst v e1 (e2 as (Var v')) = if v = v' then e1 else e2
  | subst v e1 (App(e2,e3)) = App(subst v e1 e2, subst v e1 e3)
  | subst v e1 (e2 as Lambda(v',e3)) = 
    if not (isfree v e2) then e2 (* Includes case v = v' *)
    else if isfree v' e1 then 
        (* Find a variable not in e1 or e3 to use *)
        let val v'' = findnew v' (App(e1,e3))
        in subst v e1 (Lambda(v'', subst v' (Var v'') e3))
    else Lambda(v', subst v e1 e3);

Phew, glad that’s over. Next, we need to lex and parse expressions. Lexing is straightforward, variables are single letters, we allow either \ or λ for lambda; it seems that we get the UTF-8 bytes for λ one at a time, so that’s just about our only multi character token, apart from primed identifiers (since we use primes to avoid variable capture in substitutions, we want to be able to read them in as well). Lex input is just a list of characters from an exploded string.

datatype LexItem = LAM | BRA | KET | DOT | VAR of string;

fun lex [] t = rev t
  | lex (#" "::s) t = lex s t
  | lex (#"\n"::s) t = lex s t
  | lex (#"\t"::s) t = lex s t
  | lex (#"\\"::s) t = lex s (LAM::t)
  | lex (#"("::s) t = lex s (BRA::t)
  | lex (#")"::s) t = lex s (KET::t)
  | lex (#"."::s) t = lex s (DOT::t)
  | lex (#"\206" :: #"\187" ::s) t = lex s (LAM::t)
  | lex (c::s) t = lexvar s [c] t
and lexvar (#"'"::s) v t = lexvar s (#"'"::v) t
  | lexvar s v t = lex s (VAR (implode(rev v))::t);

Parsing is even more fun. This is a hand-built LR table-driven parser; table driven parsers are good for tricks like semi-intelligent error recovery or doing parse conflict resolution on the fly (useful eg. for languages with redefinable operation precedences like ML). We don’t do anything like that though, we don’t even explicitly detect errors, and instead rely on ML’s pattern matching – if the input is ungrammatical, we get an inexhaustive match error:

fun parse s = 
    (* Items appearing on the parse stack *)
    datatype ParseItem = B | E of Exp | V of Variable;
    fun aux [E e] [] = e
      | aux (E e1::E e2::s) t = aux (E(App(e2,e1))::s) t
      | aux s (LAM::VAR c::DOT::t) = aux (V c::s) t
      | aux s (LAM::VAR c::VAR c'::t) = 
                            aux (V c::s) (LAM::VAR c'::t)
      | aux s (BRA::t) = aux (B::s) t
      | aux ((a as E _):: B :: s) (KET::t) = aux (a::s) t
      | aux s (VAR c::t) = aux(E(Var c)::s) t
      | aux (E e::V v::s) t = aux (E(Lambda(v,e))::s) t;
    aux [] (lex (explode s) [])

Well, that’s it for the moment. Next installment – evaluating expressions.