Tuesday, January 20, 2009

jSuneido Hung up on Antlr

Views were one of the last things to implement in the straight database server part of jSuneido.

This meant extending the Antlr grammars for requests (to define views) and queries (to expand views). I thought defining views would be easy and expanding them would be tricky. It turned out to be the opposite.

The trick with defining views is that I didn't want to parse the view definition, especially since the view definition follows the query grammar, not the request grammar. I just wanted to slurp up the remainder of the input as the definition. This sounds simple, but I thrashed for 3 or 4 hours on it.

I had something like:
request : VIEW id '=' VIEWDEF ;

VIEWDEF : .* ;
But you can't do that. What makes it trickier is that although this looks like a single grammar it actually produces separate lexer and parser classes. (lower case rules are for parsing, upper case for lexing.) The lexer operates independently so you can't define VIEWDEF as .* since that'll match anything/everything.

I tried predicates based on some example I found for implementing C macros:
request : VIEW id '=' {viewdef = true; } VIEWDEF ;

VIEWDEF : {viewdef}? .* ;
But this doesn't work either, probably for a variety of reasons, but one main one is that the parser and lexer are separate so the parser can't control the lexer (at least not easily).

So I decided to do the work all in the lexer:
request : VIEWDEF ;

VIEWDEF : VIEW ID '=' .* ;
But then I got "alternatives are unreachable". Eventually I found an explanation that .* is not greedy. That works well for something like '/*' .* '*/' but not so well for what I wanted. The suggested solution was to use ('\u0000'..'\uFFFE')* instead. Here's the end result:
request : VIEWDEF
{
String s = $VIEWDEF.text;
int i = s.indexOf('=');
iRequest.view(s.substring(0, i), s.substring(i + 1).trim());
}

VIEWDEF : VIEW WHITE+ NAME WHITE* '=' ANY EOF
{ setText($NAME.text + "=" + $ANY.text); };

fragment
ANY : ('\u0000'..'\uFFFE')+ ;
This seems a little ugly, but appears to work. It's a little roundabout to lexically split the view definition, then concatenate it back together, only to split it apart in the parser.

There could be a better way to do this but I couldn't find it. This will do for now.

I did find a bug in the standard library code while I was working on the views. The QueryCount function was trying to optimize the case where the query was just a table name. But something that looks like just a table name could actually be a view, in which case the optimization wouldn't work. I removed the optimization. It should really have been in the database anyway.

I thought this would fix the last of the test failures, but when I ran them all I found there were still a couple more failing. Hopefully one more day will clean them up.

As powerful as Antlr is, it's not the easiest system to work with. One of the next things I want to work on is compiling the Suneido language. I have a feeling I'll be putting in many more hours getting Antlr to do what I want. I could always go back to porting the hand written C++ recursive descent parser, but I think Antlr is better if it's not too painful to get working.

No comments: