Tuesday, October 02, 2018

Precedence Climbing Expression Parsing

Both cSuneido (original C++ implementation) and jSuneido (newer Java implementation) use classic recursive descent parsers. And that's what I implemented once again with gSuneido (in progress Go implementation).

But classic recursive descent is ugly for expressions, especially when you have lots of levels of precedence (because Suneido followed the C/C++ pattern). The code is verbose because there is a function for every level of precedence. And it's slow because during parsing you have to call through all those functions. For example, an expression consisting of a single identifier requires passing through 15 levels of precedence.

I've been aware of top down operator parsing (TDOP) aka Pratt parsing for a while (see my previous blog post). And more recently I came across precedence climbing which is basically a simplified version of TDOP.  I started and one of my programmers finished (thanks Hao!) a TDOP parser for the Suneido language, written in the Suneido language. We use it, for example, for search and replace by expression.

But I had always found TDOP somewhat hard to follow. The original Pratt paper is not the clearest. Crockford's version is better but still uses Pratt's cryptic "nud" (null denotation) and "led" (left denotation) terminology.

Recently I read Writing An Interpreter In Go and Writing A Compiler In Go by Thorsten Ball. (recommended, although I found myself skipping over the masses of test code) They use a TDOP parser, with the less opaque terminology of "prefix" instead of "nud" and "infix" instead of "led".

The main advantage of TDOP is that it is modular and easily extendible. But I had a more or less fixed target. So I decided to try rewriting the gSuneido expression parser using precedence climbing instead. (gSuneido is a good place to try out ideas because it's not in production and I don't have to worry about breaking anything.)

The basics are easy. It took a bit of experimentation to get things like assignment operators and ?: working but it wasn't too bad.

One thing that puzzles me is that assignment is normally thought of as the lowest precedence. But I had to make it the highest precedence in order to get things like "false is x = f()" to work. In my old classic recursive descent implementation assignment was high precedence as well.

With my classic recursive descent, it was hard coded into the parser that only certain expressions (lvalues e.g. x or v[0] or a.b.c) were allowed on the left of an assignment (or ++ or --). But that wasn't the case with this version so I had to add checking for that.

The end result (initial version or latest) is smaller than the old code, and runs quite a bit faster. In some ways it's not as clear, for example, figuring out how the minimum precedence argument works. But in other ways it's clearer, for example, there's an explicit precedence table (instead of the order of recursive calls). However, I'm not sure if it's enough better to warrant going back and changing the cSuneido or jSuneido versions.

The parallels between precedence climbing and TDOP are easy to see in the code. The pcExpr method has a switch with the led/infix handling, and the atom method has a switch with the nud/prefix handling. It would be easy to refactor to a table driven style (like Pratt) or an object oriented style (like Crockford). I didn't do this because it would have made the code slower (due to the indirect function calls) and larger.

References:
Top Down Operator Precedence, Vaughan Pratt
Top Down Operator Precedence, Douglas Crockford
Parsing Expressions by Recursive Descent, Theodore Norvell
From Precedence Climbing to Pratt Parsing, Theodore Norvell
Parsing expressions by precedence climbing, Eli Bendersky
Pratt Parsers: Expression Parsing Made Easy, Bob Nystrom