Friday, April 17, 2020

Constant Propagation and Folding

Constant folding and constant propagation are related compiler optimizations.


Folding is the evaluation of constant expressions at compile time. For example:

60 * 60

=> 3600

Propagation is the inlining of variables with constant values. For example:

minutesPerHour = 60
minutes = hours * minutesPerHour

=> minutes = hours * 60

The cSuneido compiler is single pass, i.e. it generates byte code while parsing, without generating an intermediate AST. It does constant folding during byte code emission, a bit like a peephole optimizer. For example, if it sees a sequence like Push 5, Push 6, Add, it replaces it with Push 11.
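
cSuneido is written in C++, but the idea is easy to sketch in Go (the language gSuneido is written in). Here is a minimal, hypothetical emitter that does that kind of peephole folding; the opcode and type names are made up for illustration, not cSuneido's actual instruction set:

package main

import "fmt"

// Hypothetical opcodes, just for illustration; cSuneido's real
// instruction set and emitter (in C++) are more involved.
type op byte

const (
	opPush op = iota
	opAdd
)

type instr struct {
	op  op
	arg int
}

type emitter struct {
	code []instr
}

// emit appends an instruction, but folds Push a, Push b, Add into Push a+b.
func (e *emitter) emit(in instr) {
	n := len(e.code)
	if in.op == opAdd && n >= 2 &&
		e.code[n-1].op == opPush && e.code[n-2].op == opPush {
		sum := e.code[n-2].arg + e.code[n-1].arg
		e.code = append(e.code[:n-2], instr{opPush, sum})
		return
	}
	e.code = append(e.code, in)
}

func main() {
	var e emitter
	e.emit(instr{opPush, 5})
	e.emit(instr{opPush, 6})
	e.emit(instr{opAdd, 0})
	fmt.Println(e.code) // [{0 11}] i.e. a single Push 11
}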

jSuneido does have an intermediate AST. It does constant folding during code generation. This is integrated into the process, i.e. at the top level it "folds" an AST into a function constant.

gSuneido took yet another approach. It did constant folding during AST generation, via a decorator around the AST builder. This has the nice characteristics that it is separate from parsing and code generation, and that it avoids building the constant-expression parts of the AST.
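
To make the decorator idea concrete, here is a rough Go sketch. The builder interface and node types are simplified stand-ins, not gSuneido's actual AST or API; the point is just that the folding wrapper intercepts Binary and returns a constant node instead of building the expression:

package main

import "fmt"

// A drastically simplified AST: a node is either a constant or a binary op.
type node interface{}

type constant struct{ value int }

type binary struct {
	op       string
	lhs, rhs node
}

// builder is what the parser calls to construct AST nodes.
// These names are illustrative, not gSuneido's actual interface.
type builder interface {
	Constant(v int) node
	Binary(op string, lhs, rhs node) node
}

// plainBuilder just constructs nodes.
type plainBuilder struct{}

func (plainBuilder) Constant(v int) node              { return constant{v} }
func (plainBuilder) Binary(op string, l, r node) node { return binary{op, l, r} }

// folder decorates another builder. When both operands are constants it
// folds them and never builds the binary node at all.
type folder struct{ builder }

func (f folder) Binary(op string, lhs, rhs node) node {
	if l, ok := lhs.(constant); ok {
		if r, ok := rhs.(constant); ok && op == "*" {
			return f.Constant(l.value * r.value)
		}
	}
	return f.builder.Binary(op, lhs, rhs)
}

func main() {
	var b builder = folder{plainBuilder{}}
	// the parser would make calls like these for: 60 * 60
	ast := b.Binary("*", b.Constant(60), b.Constant(60))
	fmt.Printf("%#v\n", ast) // main.constant{value:3600}
}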

You might wonder how useful constant folding is. It's likely not critical to performance (unless it happens to be in a hot loop). But it actually occurs relatively frequently. For example, long strings:

s = "..." $
    "..." $
    "..."

Or calculations with units:

seconds = days * 24 * 60 * 60

Knowing that you have constant folding helps keep people from writing e.g. seconds = days * 86400 for the sake of performance.

Neither cSuneido nor jSuneido does constant propagation. Again, it's likely not critical to performance. But it would be nice to have, for similar reasons as folding.

For example, if you wrote the above example like this, it would not get folded unless you do constant propagation:

hoursPerDay = 24
minutesPerHour = 60
minutesPerDay = hoursPerDay * minutesPerHour
minutes = days * minutesPerDay

Ideally, you want this to get compiled to simply: minutes = days * 1440
(admittedly somewhat contrived but you get the idea)

When constant folding and constant propagation are done as separate processes, you need to do them iteratively (repeatedly) until there's nothing more to be done. e.g. with the example above, first you'd propagate hoursPerDay and minutesPerHour, then you'd fold the multiplication, and finally you'd propagate minutesPerDay. But if the two are combined, it can be done in a single pass.
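
Here is a toy Go sketch of the combined approach, assuming straight-line code where each variable is assigned only once (the names, and the restriction to multiplication, are mine purely for illustration). Because propagation and folding feed each other inside the same loop, the example above reduces to days * 1440 in a single pass:

package main

import "fmt"

// operand is either an integer literal or a variable reference.
type operand struct {
	isVar bool
	name  string
	val   int
}

// stmt models "target = op1 * op2 * ..." - just enough to show the idea.
type stmt struct {
	target string
	ops    []operand
}

// propagateAndFold does both optimizations together in a single pass,
// assuming straight-line code where every variable is assigned only once.
func propagateAndFold(prog []stmt) []stmt {
	consts := map[string]int{} // variables known to hold constants
	for i, s := range prog {
		allConst := true
		for j, o := range s.ops {
			if o.isVar {
				if v, ok := consts[o.name]; ok {
					prog[i].ops[j] = operand{val: v} // propagate
				} else {
					allConst = false
				}
			}
		}
		if allConst { // fold
			product := 1
			for _, o := range prog[i].ops {
				product *= o.val
			}
			prog[i].ops = []operand{{val: product}}
			consts[s.target] = product
		}
	}
	return prog
}

func main() {
	lit := func(v int) operand { return operand{val: v} }
	ref := func(n string) operand { return operand{isVar: true, name: n} }
	prog := []stmt{
		{"hoursPerDay", []operand{lit(24)}},
		{"minutesPerHour", []operand{lit(60)}},
		{"minutesPerDay", []operand{ref("hoursPerDay"), ref("minutesPerHour")}},
		{"minutes", []operand{ref("days"), ref("minutesPerDay")}},
	}
	for _, s := range propagateAndFold(prog) {
		fmt.Println(s.target, "=", s.ops)
	}
	// minutes ends up as days * 1440, in one pass
}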

I mostly worked on this as a fun break from less interesting work. So I wasn't too concerned if the payoffs weren't huge. It was an interesting problem. But I was concerned about not adding too much overhead. Suneido loads and compiles code on demand, from source, so you want compiling to be fast. Computers have thankfully gotten much faster over the years, but that free lunch has almost run out. We do have more cores, but I'm not quite ready to do parallel tiered compilation.

It started off quite well. It didn't take much code to get the basics working. I was trying to do it in a single pass (traversal) of the AST. That's easy in simple cases like the examples above. But then I started running into awkward cases. For example, as soon as you have loops you no longer have linear execution. You can't inline a variable when you encounter it, because it might get modified later in the loop. Or if a variable you thought was constant turns out to be modified, then you can no longer inline it. And you can't remove a constant assignment until you've determined that it is never modified. The code got more and more complicated until I finally admitted to myself that this approach didn't make sense.

I decided it would be much simpler to make two passes. The first pass finds "final" variables (to borrow the Java terminology) that are assigned once and never changed, which makes the second pass straightforward. That made the code much simpler, but the overhead was larger than I liked - about 20% on compiling all of stdlib.
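
A minimal sketch of what that first pass amounts to, assuming you just have the list of assignment targets in source order (a real pass walks the AST and has to count anything else that modifies a variable):

package main

import "fmt"

// findFinal returns the variables that are assigned exactly once,
// given the assignment targets in source order.
func findFinal(assignments []string) map[string]bool {
	count := map[string]int{}
	for _, v := range assignments {
		count[v]++
	}
	final := map[string]bool{}
	for v, n := range count {
		if n == 1 {
			final[v] = true
		}
	}
	return final
}

func main() {
	// x is assigned twice, so only y and z are "final"
	fmt.Println(findFinal([]string{"x", "y", "x", "z"}))
}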

An obvious optimization was to merge the first pass (determining final variables) into the parser. It was nicer to have it separate, but I didn't want the overhead. That brought the overhead down to about 10%.

But I still had two complete traversals over the AST: the new one to do propagation and folding, and the existing one that determined which blocks needed to be closures. Propagation and folding could probably have been merged into code generation, but it had to run before the block pass, and the block pass couldn't be merged into code generation. And neither could be merged into parsing, because they need the final-variable results, which aren't complete until parsing finishes.

It seemed I was stuck, but then I realized that these optimizations weren't applicable to all code. If there were no blocks (which I could easily detect during parsing) then you could skip that pass. And if there were no final variables (also determined during parsing), then you didn't need to do propagation. However, I still wanted to do folding, so I put back the original folding during AST generation (sharing the same code). This is somewhat redundant, but the folding during AST generation adds very little overhead.
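
Putting that together, the overall flow might look roughly like this sketch, where the parser records a couple of flags and the later passes are skipped when they can't accomplish anything. All the names here are illustrative, not gSuneido's actual code:

package main

import "fmt"

// parseInfo holds facts the parser records as it goes.
type parseInfo struct {
	hasBlocks    bool
	hasFinalVars bool
}

type ast struct{} // placeholder

func parse(src string) (*ast, parseInfo) {
	// a real parser builds the AST, folds constants as it builds,
	// and notes whether it saw any blocks or final variables
	return &ast{}, parseInfo{hasBlocks: false, hasFinalVars: true}
}

func propagateAndFold(t *ast) { fmt.Println("propagation/folding pass") }
func resolveBlocks(t *ast)    { fmt.Println("block/closure pass") }
func generate(t *ast) []byte  { return nil }

func compile(src string) []byte {
	t, info := parse(src)
	if info.hasFinalVars { // no final variables => propagation can't help
		propagateAndFold(t)
	}
	if info.hasBlocks { // no blocks => nothing to make into closures
		resolveBlocks(t)
	}
	return generate(t)
}

func main() {
	compile("minutes = hours * 60")
}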

That brought the overhead down to a few percent, almost lost in the noise. I was happy with that.

I didn't do any performance measurements. I could easily come up with microbenchmarks showing it was a big benefit. At the same time, I'm also sure it doesn't make a big difference overall. It's nice to know we have it. I like that we can write code more clearly without worrying about performance.

As usual, the code is on GitHub. The main commits are:

Friday, April 10, 2020

Sometimes Bugs Are Easy

Finding a bug can be a long, frustrating process. Those are the ones I tend to write about. But occasionally the stars (and the tools) align and it goes quickly and easily. I figured I'd write about one of those for a change.

I released (internally) a new version of gSuneido. Users soon found a bug that hadn't been in the previous version. It had been about two weeks and roughly 40 commits since the last version.

Thank goodness for version control. It was long ago, thankfully, but there was a time when we programmed without any version control. The bad old days. Of course, my memory was better back then.

The bug was easy to recreate, which made it easy to manually bisect my Git versions and find the commit that introduced the bug. A good reminder of the efficiency of binary search.

The offending commit only changed a single file, so that simplified things. There were a lot of changes within that file, but they were independent. So I could again manually bisect the changes and quickly find the one with the bug.

And to top it off, the fix was easy.

Thursday, April 09, 2020

Editor Color Schemes

Have you noticed that editor color schemes tend to treat comments one of two ways?
  • downplay the comments e.g. low contrast gray text
  • emphasize the comments e.g. high contrast colored text
I suspect the tendency to downplay comments comes from coding styles that have a lot of "mandatory" comments, big blocks of boilerplate that isn't very relevant, or comments on almost every line. In this case the comments can obscure the actual code, so you want to downplay their visibility.

Whereas if the style is to have only a few, relatively important comments, then you want them to stand out.

Documentation comments are somewhere in between. Some editor/language combinations let you color documentation comments differently. That's easier with e.g. Javadoc, where you use special /** comments, but harder with e.g. Go, which just uses regular comments in specific contexts.

Top level documentation comments aren't usually a big issue because they aren't mixed with the code and don't obscure it too much. Given a choice I would make these "normal", neither important enough to highlight, nor distracting enough to downplay.

I can't find the reference now, but I saw something recently about an editor/IDE (JetBrains?) that allowed you to toggle between displaying doc comments as a formatted preview or as editable "raw" text. That seems like a good option. Most of the time you just want to read the doc comments and you don't want to see e.g. the Javadoc HTML tags. Go avoids this issue by using more or less plain text (a bit like Markdown) for its comments. A formatted preview style would differentiate documentation from code without downplaying it.

What's your style? Highlight comments or downplay comments?

Personally, I don't write a lot of comments and I like them to stand out.