Friday, August 13, 2010

Connecting the Dots

Sometimes I'm a little slow :-( 

One of the talks at the JVM conference was about a JVM for mobile devices. It used a single pass compiler with no intermediate representation. Suneido also does this but there are some issues with it. Suneido takes the obvious approach of generating a push when it encounters an operand, and an operation when it encounters an operator (after processing its operands). This works great with simple expressions like "a + b" which compiles to "push a, push b, add". But it falls down for more complex things, even for "a = b" since you don't want to push the value of "a". I work around these issues but the code is ugly. And worse, a lot of the ugliness is in the parser, even though the issue is really with code generation.

I wasn't happy with it but, consciously or not, I assumed it was essential rather than accidental complexity. So I didn't look hard enough for a better solution. . 

So I was curious to hear more about their approach. The topic only got one bullet point, something like "code generation by abstract interpretation". 

I started thinking about what that meant. I'm pretty sure what they meant is that you only generate code when you encounter an operator. When you encounter an operand like a literal or variable, you push it on a compile time stack, "simulating" execution. Then you can defer generation of code until you know what is needed.

As I thought about this, I realized I knew about this technique, I'd read about it before. But for some reason I never connected it to my code generation issues.

So when I got home from the conferences I spent three days refactoring the code generation in jSuneido. It wasn't quite as straightforward as I imagined (it never is!). The parser code is definitely cleaner but I had to add more code to the code generator than I would have liked. Overall, I think it's better. I broke a lot of tests in the process, but I finally got them all passing again last night. 

Now I can quite easily add a number of optimizations that I've been wanting to do:
- call private methods directly (no method lookup required)
- call built-in global functions/classes directly

However, there are still things I'd like to do that are difficult with a single pass compile. For example, knowing which local variables are read or written by closures. One way to handle this would be to re-compile if the assumptions were wrong, using information from the first try to do it correctly the second time. But that seems ugly. Maybe I need to drop the single pass compile and just generate an AST.