Crafting Interpreters in OCaml - Preface

May 15, 2019

I’ll be writing a series of posts documenting my goals of learning OCaml as well as understanding how interpreters and compilers work. The motivation here is twofold.

I want to challenge myself and step out of the comfort of usual web development and all the ease of the JS ecosystem
I want to better understand how programming languages actually work.

To learn OCaml, I’ve worked through a Cornell class’s online notes ¹ as well as read much of version 1 of Real World OCaml. When version 2 comes out, I’ll try to update these notes. I’ve also been devouring YouTube videos from Jane Street as well as spending a lot of time in the ReasonML / OCaml Discord

To understand more about interpreters, I’ll be using two primary resources.

Crafting Interpreters² by Bob Nystrom
Writing an Interpreter in Go³ by Thorsten Ball

By the time this series is done, I hope to have written an interpreter for the Lox language used in Crafting Interpreters using OCaml. I’ll be establishing a few rules in order to throw myself into the deep end.

Where possible, I will use the OCaml data type system to help ensure exhaustive checking of various states. Variant types are a powerful tool available in OCaml (and Haskell, F# and many other ML-inspired langauges) that can serve as a way of flagging different cases. For example, a chunk of code can be a string, a number, or some keyword. Using these types, I could state that a code chunk is

type Chunk =
| Keyword
| String of string
| Number of int

Where possible, I’ll avoid mutation, classes and global state. In chapter 1, Bob uses a Scanner class with fields of start and current, both ints, to track lexeme parsing (aka just parsing out phrases). Rather than this, I’ll be writing a Scanner module and passing around a context object between functions.
Where possible, I’ll use recursion over looping. In the book, there are times Bob uses while loops to pick out phrases. For example, continue parsing out a chunk of code while the next character is not a space, then stop. That code might look something like (in JS).

  let start = 0;
  let current = 0;
  let phrase = "";
  const testString = "Hello world!";

  while (testString[current] != " ") {
    phrase = phrase + testString[current];
    current++;
  }

  // phrase = "Hello"

Rather than a while loop, I’ll be using pattern matching and recursion. In OCaml, that might look something like the following.

  type context = {
    start: int;
    current: int;
    phrase: string;
    source: string;
  }

  let ctx = {
    start = 0;
    current = 0;
    phrase = "";
    source = "Hello world!";
  }

  let rec capture_until_space context =
    let next_char = (String.get context.source context.current) in
    match next_char with
    | ' ' -> context
    | _ -> capture_until_space {
        context with current = context.current + 1;
                     phrase = (context.phrase ^ Char.escaped next_char);
      }

  (*
    # capture_until_space ctx;;
    - : context = {start = 0; current = 5; phrase = "Hello"; source = "Hello world!"}
  *)

If I succeed in doing this, I can maintain purity.

I’ll be also attempting to write unit tests along the way using OUnit2. I’ll also attempt to implement a setup with Dune to make it easier to build and follow along. I know the Reason community has tools like Esy as well. If I’m ambitious, I’ll use that as well to get everything set up.

Wish me luck!

CS3110 just ran in the Spring of 2019 and notes are here
↩
Bob’s book is a work in progress and can be found here
↩
Thorsten’s book will be used to supplement my understanding, and can be found here
↩

I work and live in Brooklyn, NY building software.

Crafting Interpreters in OCaml - Setting up your environment →