Recurse Center Day 6

August 17, 2020

Kicking off a new week! It’s open source week at RC and I’m both excited at nervous. There are so many projects I’d want to work on, and so many people giving talks about projects. How am I going to make time to work on something? I’ve decided to attend a ton of talks, just to learn, but try to focus on the MirageOS project. My OCaml is…not terrible, so hopefully I can find a small ticket or two and get a PR open.

My Rust VM is going well. I’m up to the tricky part, the Pratt parser. If I understand correctly, the way a Pratt parser works (more or less) is:

Given tokens = a list of tokens
startingPrecedence = 0
rules = table mapping tokentype to prefixFunction, infixFunction, precedenceValue


function parsePrecedence(precedence) =
  while token in tokens
    kick off parsing with precedence argument
      get first token
      Look up the token's prefixFunction in the rules
        that is, it must be prefix because it's the first thing in the list
        if the first token is a + sign, that would be an error
        if that token is a number, for example, that will be valid since numbers can be considered prefix values
      run that prefixFunction (eg, if token is a number, run the compileNumberToken function, if the token is a minus, run the compileUnaryToken function)

      as long as our function precedence is <= the next token's precedence
        consume next token
        get that token's infixFunction from our lookup table
        run infixFunction()

parsePrecedence(0)

note that prefixFunction, infixFunction, etc will ALSO call parsePrecedence with slightly higher precedences - that is, the infixFunction for multiplying might call parsePrecedence(20)

This means too that when when call parsePrecedence(20) for our multiplier it will say something like

go get tokens, and keep generating compiler output as long as those are MORE important than me, since it’s a recursive call
when you’re done, eg when we see an addition, exit and let the calling function continue its lower precedence work

So the workflow for

1 + 2 * 3 + 4

looks like

get token 1
call makeNumber(1)
get precedence 0

look at next token, addition
  call addition infix which enters higher precedence
    get token 2
    call makeNumber(2)
    next token is *, which is higher predecence
      get token 3
      call makeNumber(3)
      next token is +, which is lower precendece - exit
    emit *
  emit +
look at next token, which is +
  call addition infix with which is higher precedence
    capture 4
    call makeNumber(4)
    done
    pop call stack
  emit +
done

If we were emitting opcodes for a stack VM, that might end up looking like

[1, 2, 3, *, +, 4, +]

And that is, in an ugly nutshell, notes on a Pratt Parser.

I work and live in Brooklyn, NY building software.