Friday, October 29, 2010

jSuneido Compiler Overhaul Part 2

After my last post I was thinking more and more about switching from single pass compiling to compiling via an Abstract Syntax Tree (AST).

I wasn't really intending to work on this, but I had a bit of time and I thought I'd just investigate it. Of course, then I got sucked into it and 10 days later I have the changes done. I'm pretty happy with the results. A big part of getting sucked into it was that it became even more obvious that there were a lot of advantages.

Previously I had lexer, parser, and code generator, with the parser directly driving the code generator. Now I have lexer, parser, AST generator, and code generator.

I'm very glad that from the beginning I had separated the parser from the actions. The parser calls methods on an object that implements an interface. This seems obvious, but too many systems like YACC/Bison and ANTLR encourage you to mix the actions in with the grammar rules. And doing this does make it easier to at first. But it is not very flexible. Because I'd kept them separate, I could implement AST generation without touching the parser. Although I haven't used it, I understand Tatoo splits the parser and the actions. (Note: Although I initially tried using ANTLR for jSuneido, the current parser, like the cSuneido one, is a hand written recursive descent parser.)

While I was at it, I also split off the parts of the code generator that interface with ASM to generate the actual byte code. The code generator is still JVM byte code specific, but at a little higher level. And now it would be easy to use an alternative way to generator byte code, instead of ASM.

When the parser was directly driving the code generation, I needed quite a few intermediate actions that were triggered part way through parsing a grammar rule. These were no longer needed and removing them cleaned up the parser quite a bit.

Having the entire AST available for code generation is a lot simpler than trying to generate code as you parse. It removed a lot of complexity from the code generation (including removing all the clever deferred processing with a simulated stack). And it let me implement a number of improvements that weren't feasible with the single pass approach.

Despite adding a new stage to the compile process, because it was so much simpler I'm pretty sure I actually ended up with less code than before. That's always a nice result!

The only downside is that compiling may be slightly slower, although I suspect the difference will be minimal. Code is compiled on the fly, but the compiled code is kept in memory, so it only really affects startup, which is not that critical for a long running server.

In retrospect, I should have taken the AST approach with jSuneido right from the start. But at the time, I was trying to focus on porting the C++ code, and fighting the constant urge to redesign everything.


Roberto said...

Greetings Andrew -

Very nice. It is good to see that you continue to improve on the original ideas that lead you to build Suneido in the first place.

Is there any time frame for the completion of the jSuneido Compiler implementation?

Roberto Artigas

andrew said...

Hi Roberto,

Good to hear from you. The server portion of jSuneido is functional. We have some demo systems running on it with no problems. Next we are going to switch our own internal accounting/CRM over to jSuneido. That should be a decent test since we have about 50 users in the company. I wouldn't be surprised to find some bugs along the way.