Folding is the evaluation of constant expressions at compile time. For example:
60 * 60
=> 3600
Propagation is the inlining of variables with constant values. For example:
minutesPerHour = 60
minutes = hours * minutesPerHour
=> minutes = hours * 60
The cSuneido compiler is single pass, i.e. it generates byte code while parsing, without building an intermediate AST. It does constant folding during byte code emission, a bit like a peephole optimizer. For example, if it sees a sequence like Push 5, Push 6, Add, it replaces it with Push 11.
jSuneido does have an intermediate AST. It does constant folding during code generation, integrated into that process; at the top level it "folds" an AST into a function constant.
gSuneido took yet another approach. It did constant folding during AST generation, via a decorator around the AST builder. This has two nice characteristics: it is separate from both parsing and code generation, and it avoids building the constant parts of the AST in the first place.
You might wonder how useful constant folding is. It's likely not critical to performance (unless it happens to be in a hot loop). But it actually occurs relatively frequently. For example, long strings:
s = "..." $
"..." $
"..."
Or calculations with units:
seconds = days * 24 * 60 * 60
Knowing that you have constant folding means people don't need to write e.g. seconds = days * 86400 for performance.
Neither cSuneido nor jSuneido does constant propagation. Again, it's likely not critical to performance. But it would be nice to have, for similar reasons as folding.
For example, if you wrote the above example like this, it would not get folded unless you do constant propagation:
hoursPerDay = 24
minutesPerHour = 60
minutesPerDay = hoursPerDay * minutesPerHour
minutes = days * minutesPerDay
Ideally, you want this to get compiled to simply: minutes = days * 1440
(admittedly somewhat contrived but you get the idea)
When constant folding and constant propagation are done as separate processes, you need to do them iteratively (repeatedly) until there's nothing more to be done. e.g. with the example above, first you'd propagate hoursPerDay and minutesPerHour, then you'd fold the multiplication, and finally you'd propagate minutesPerDay. But if the two are combined, it can be done in a single pass.
I mostly worked on this as a fun break from less interesting work. So I wasn't too concerned if the payoffs weren't huge. It was an interesting problem. But I was concerned about not adding too much overhead. Suneido loads and compiles code on demand, from source. So you want compiling to be fast. Computers have thankfully gotten much faster over the years, but that free lunch has almost run out. We do have more cores, but I'm not quite ready to do parallel tiered compilation.
It started off quite well. It didn't take much code to get the basics working. I was trying to do it in a single pass (traversal) of the AST. That's easy in the simple cases like the examples above. But then I started running into awkward cases. For example, as soon as you have loops you don't have linear execution. You can't inline a variable when you encounter it because it might get modified later in the loop. Or if a variable you thought was constant is modified, then you can no longer inline it. And you couldn't remove a constant assignment until you determined if it was ever modified. The code got more and more complicated until I finally admitted to myself that this approach didn't make sense.
I thought it would be much simpler if I made two passes. The first pass would find "final" variables (using Java terminology) that are only assigned once and never changed. This makes the next pass quite simple. That made the code much simpler, but the overhead was larger than I liked - about 20% on compiling all of stdlib.
An obvious optimization was to merge the first pass (determining final variables) into the parser. It was nicer to have it separate, but I didn't want the overhead. That brought the overhead down to about 10%.
But I still had two complete traversals over the AST. The new one to do propagation and folding, and another one that determined which blocks needed to be closures. The propagation and folding could probably be merged into the code generation, but it needed to be done before the blocks, and the blocks couldn't be merged into code generation. And neither could be merged into the parsing because of needing the results of the final variables.
It seemed I was stuck, but then I realized that these optimizations weren't applicable to all code. If there were no blocks (which I could easily detect during parsing) then you could skip that pass. And if there were no final variables (also determined during parsing), then you didn't need to do propagation. However, I still wanted to do folding, so I put back the original folding during AST generation (sharing the same code). This is somewhat redundant, but the folding during AST generation adds very little overhead.
That brought the overhead down to a few percent, almost lost in the noise. I was happy with that.
I didn't do any performance measurements of the benefit. I could easily come up with microbenchmarks showing it was a big win. At the same time, I'm also sure it doesn't make a big difference overall. It's nice to know we have it. I like that we can write code more clearly without worrying about performance.
As usual, the code is on GitHub. The main commits are: