Recurse Center Day 6
August 17, 2020
Kicking off a new week! It’s open source week at RC and I’m both excited at nervous. There are so many projects I’d want to work on, and so many people giving talks about projects. How am I going to make time to work on something? I’ve decided to attend a ton of talks, just to learn, but try to focus on the MirageOS project. My OCaml is…not terrible, so hopefully I can find a small ticket or two and get a PR open.
My Rust VM is going well. I’m up to the tricky part, the Pratt parser. If I understand correctly, the way a Pratt parser works (more or less) is:
Given tokens = a list of tokens startingPrecedence = 0 rules = table mapping tokentype to prefixFunction, infixFunction, precedenceValue function parsePrecedence(precedence) = while token in tokens kick off parsing with precedence argument get first token Look up the token's prefixFunction in the rules that is, it must be prefix because it's the first thing in the list if the first token is a + sign, that would be an error if that token is a number, for example, that will be valid since numbers can be considered prefix values run that prefixFunction (eg, if token is a number, run the compileNumberToken function, if the token is a minus, run the compileUnaryToken function) as long as our function precedence is <= the next token's precedence consume next token get that token's infixFunction from our lookup table run infixFunction() parsePrecedence(0)
note that prefixFunction, infixFunction, etc will ALSO call parsePrecedence with slightly higher precedences - that is, the infixFunction for multiplying might call parsePrecedence(20)
This means too that when when call parsePrecedence(20) for our multiplier it will say something like
- go get tokens, and keep generating compiler output as long as those are MORE important than me, since it’s a recursive call
- when you’re done, eg when we see an addition, exit and let the calling function continue its lower precedence work
So the workflow for
1 + 2 * 3 + 4
get token 1 call makeNumber(1) get precedence 0 look at next token, addition call addition infix which enters higher precedence get token 2 call makeNumber(2) next token is *, which is higher predecence get token 3 call makeNumber(3) next token is +, which is lower precendece - exit emit * emit + look at next token, which is + call addition infix with which is higher precedence capture 4 call makeNumber(4) done pop call stack emit + done
If we were emitting opcodes for a stack VM, that might end up looking like
[1, 2, 3, *, +, 4, +]
And that is, in an ugly nutshell, notes on a Pratt Parser.
I work and live in Brooklyn, NY building software.