Saturday, May 16, 2015

Beautiful Code for Parsing

I've recently been reading up on JavaScript. One of the books I was re-reading was JavaScript: The Good Parts by Douglas Crockford. In there he mentioned the chapter he wrote in Beautiful Code. (Of which, there are mixed reviews.) It's been a while since I read that book and I didn't remember his chapter. So I pulled out my copy (old enough that it's an actual paper copy). I didn't want to carry the (large) book around so I took advantage of how O'Reilly lets you register your paper books and then buy the ebook for $5. Of course, after I did that I found Crockford's chapter is available on his web site. That's ok, I wouldn't mind re-reading the whole book.

The chapter was on parsing based on Top Down Operator Precedence (TDOP) by Vaughn Pratt. I thought I'd start by reading Pratt's original paper. Crockford's link takes you to the ACM citation which wants to charge you to read the paper (even though it's from 1973), but a quick web search found a public version. I found the paper a little hard to follow.

Crockford's article is an example of a parser for a subset of JavaScript written in that subset. It still took some effort to understand but I found it easier to follow than the original paper.

Suneido's hand written top down recursive descent parser is ok, but it has a lot of methods to parse expressions. It always seemed like there should be a better way. At some point I looked at the Go parser and it manages with a lot less methods by using precedence. I'd be quite interested to see what a Suneido parser would look like using Pratt's approach. Although, because Suneido's syntax grew ad hoc, it has some parts that are a little ugly to parse.

I love coming across elegant new algorithms and ideas. You can tell how much of a geek I am by the fact that I get as much pleasure out of a new algorithm as most people would from a bowl of ice cream :-) Which might help explain why I'm so skinny - not many calories in a delicious algorithm.

No comments: