Recurse Center Day 6
August 17, 2020
Kicking off a new week! It’s open source week at RC and I’m both excited at nervous. There are so many projects I’d want to work on, and so many people giving talks about projects. How am I going to make time to work on something? I’ve decided to attend a ton of talks, just to learn, but try to focus on the MirageOS project. My OCaml is…not terrible, so hopefully I can find a small ticket or two and get a PR open.
My Rust VM is going well. I’m up to the tricky part, the Pratt parser. If I understand correctly, the way a Pratt parser works (more or less) is:
Given tokens = a list of tokens
startingPrecedence = 0
rules = table mapping tokentype to prefixFunction, infixFunction, precedenceValue
function parsePrecedence(precedence) =
while token in tokens
kick off parsing with precedence argument
get first token
Look up the token's prefixFunction in the rules
that is, it must be prefix because it's the first thing in the list
if the first token is a + sign, that would be an error
if that token is a number, for example, that will be valid since numbers can be considered prefix values
run that prefixFunction (eg, if token is a number, run the compileNumberToken function, if the token is a minus, run the compileUnaryToken function)
as long as our function precedence is <= the next token's precedence
consume next token
get that token's infixFunction from our lookup table
run infixFunction()
parsePrecedence(0)
note that prefixFunction, infixFunction, etc will ALSO call parsePrecedence with slightly higher precedences - that is, the infixFunction for multiplying might call parsePrecedence(20)
This means too that when when call parsePrecedence(20) for our multiplier it will say something like
- go get tokens, and keep generating compiler output as long as those are MORE important than me, since it’s a recursive call
- when you’re done, eg when we see an addition, exit and let the calling function continue its lower precedence work
So the workflow for
1 + 2 * 3 + 4
looks like
get token 1
call makeNumber(1)
get precedence 0
look at next token, addition
call addition infix which enters higher precedence
get token 2
call makeNumber(2)
next token is *, which is higher predecence
get token 3
call makeNumber(3)
next token is +, which is lower precendece - exit
emit *
emit +
look at next token, which is +
call addition infix with which is higher precedence
capture 4
call makeNumber(4)
done
pop call stack
emit +
done
If we were emitting opcodes for a stack VM, that might end up looking like
[1, 2, 3, *, +, 4, +]
And that is, in an ugly nutshell, notes on a Pratt Parser.
I work and live in Brooklyn, NY building software.