Tuesday, December 28, 2010

The Joy of Programming

It's ridiculous how good I can feel after accomplishing a decent programming task smoothly. I almost feel a bit foolish getting pleasure from something so ... geeky. But it's a bit like a good run. It feels like you've done something worthwhile, even if it only benefits you.

Last evening, with the new MacBook Air on my lap, I cleaned up one of the last parts of the jSuneido compiler that I was uncomfortable with. I had already figured out what I wanted to do, and for once I didn't run into any unexpected problems, finishing tidily in the amount of time I had.

Previously, I'd been building a list of constants (other than strings and int's that Java handles) and then stuffing this list into the instance of the compiled class. Then, to speed things up in the generated code, I loaded an array of the constants into a local variable. It worked ok, but a lot of functions didn't need any constants. Unfortunately, I had to generate the load before I knew if there would be any. And it would be tricky to determine ahead of time if there would be any. Plus, ideally, constants should be static final so the optimizer would know they were constant.

With the new changes, I create a static final field for each constant, and then generate a  class initializer that initialized them. Accesses to the constants are simple GETSTATIC instructions. The only trick was how to get the list of constants from the compiler to the class initializer. I ended up using a static field (basically a global variable). A bit ugly, but I couldn't come up with a better way. To handle multi-threading I used a ThreadLocal. (My first thought was a ConcurrentHashMap keyed by ClassLoader, but ThreadLocal was simpler.)

I don't know how much difference this will make to performance, but I really wanted to generate code that was clean and similar to compiled Java. I think I'm pretty much there. (Apart from boxed integers and the extra layer of method/call dispatch.)

Damn! Reviewing the code as I write this I realize I missed something. (I use that list of constants in the instance for a couple of other minor things.) Oh well, can't take away how good I felt when I thought it had all gone smoothly! Good thing I'm not superstitious or I'd think I'd jinxed myself talking about how well it had gone :-) Hopefully it won't be too hard to handle.

Monday, December 27, 2010

jSuneido Catching Up with cSuneido

It's been a while since I paid attention to the relative speed of jSuneido versus cSuneido. I just realized that with my recent optimizations, jSuneido now runs the standard library test suite in pretty much the same time as cSuneido.

In some ways that's not surprising, the JVM is a sophisticated platform - the garbage collector is very fast and multi-threaded, the JIT compiles to optimized machine code, etc.

On the other hand, Java forces a lot more overhead than C++. Integers have to be boxed, classes can only be nested by reference not embedding, you can't use pointers, you can't allocate objects on the stack, and so on. It's impressive that the JVM can overcome these "drawbacks".

It's nice to reach this point. And a nice confirmation that I wasn't totally out to lunch deciding to use Java and the JVM to re-implement Suneido.

Of course, the real advantage of jSuneido over cSuneido is that the server is multi-threaded and will scale to take advantage of multiple cores.

Sunday, December 26, 2010

informIT eBooks

As I've written before (here and here), I prefer to buy ebooks without DRM, not because I want to pirate them, but because dealing with DRM is a pain in the you-know-where, especially when you have multiple devices you want to access them from.

This is (as far as I know) almost impossible for mainstream books. But for computer books it's not too bad. Both Pragmatic Programmers and O'Reilly sell DRM free ebooks. And I've recently started buying ebooks from informIT (mostly Addison-Wesley ones).

The other issue with ebooks is format. The Kindle currently only handles mobi and pdf. And pdf's are generally painful to read on ebook readers because they have a fixed, usually large (e.g. 8.5 x 11) page size, with big margins. Because of this I bought one of the bigger 9.7" Kindle DX's, and it helps, but it's still not great.

The Kindle helpfully trims white margins on pdf's, but the problem is that the ebooks informIT is providing are images of the paper books so they have stuff like page numbers and titles in the top and bottom margins which prevent trimming from working properly. And worse, the stuff in the margins is not symmetrical so odd pages end up trimmed differently than even pages which means the zoom level keeps changing. Not the end of the world, but annoying.

Both Pragmatic and O'Reilly provide ebooks in epub, mobi, and pdf. But informIT only provides epub and pdf. At first I made do with the pdf's but eventually I got fed up and started looking for alternatives.

I thought about looking for some tool to process the pdf's and crop the margins. But that's not the only problem. Kindle doesn't let you highlight sections of text in pdf's, only in mobi. And the pdf's I've had from informIT haven't had any hyperlinks (e.g. from the table of contents) although pdf's are able to do that.

I took a look at the epub's from informIT and they seemed better, at least they were hyperlinked. So I looked for tools to convert epub to mobi. Stanza was one that was suggested and I already had it installed so I gave it a try. I like Stanza but it didn't do a very good job of converting these epubs to mobi.

The other tool that was suggested was Calibre. A plus for Calibre is that it's an open source project. The user interface is a little confusing but it did a great job of converting epub to mobi, including the cover image and hyperlinks. And it even recognized when I plugged in my Kindle and let me copy the books over. I downloaded the epub versions of all my informIT books and converted them - a big improvement.

I wish I'd got fed up sooner! I could have avoided a lot of painful pdf reading.

Although I'm happy to be able to get Addison-Wesley ebooks from informIT, I do have a minor complaint - their sleazy marketing. I got an email offering me a limited time 50% off deal. Sounded good to me so I found a few books I'd been planning to get. They came to something like $80. When I entered the 50% off code the price only dropped to around $70. That's a very strange 50%. Looking closer I found that you normally get 30% off, and over $55 it goes up to 35%. So the 50% deal was only 15% better than usual. I realize this is a common marketing scam, but it still annoys me. O'Reilly plays some similar games but not quite as bad.

Friday, December 24, 2010

jSuneido Compiler Optimizations

For my own reference, and to justify my month of work, I thought I'd go over some of the optimizations I've done recently. The changes involve:
  • how functions/methods/blocks are compiled
  • how calls are compiled
  • the run-time support code e.g. in Ops, SuValue, SuClass, SuInstance, etc.
My original scheme was that arguments were passed in an array i.e. in the style of Java's variable arguments. For simplicity additional local variables were also stored in this array. This had a nice side benefit that the array could be used to capture the local variables for blocks (aka closures).

The downside is that every call had to allocate an argument array, and often re-allocate it to accommodate default arguments or local variables. And access to variables requires larger and presumably slower code. (ALOAD, ICONST, AALOAD/STORE instead of just ALOAD/STORE)

To keep the calling code smaller I was generating calls via helper functions (with variations for 1 to 9 arguments) that built the argument array. One drawback of this approach is that it added another level to the call stack. This can impede optimization since the Hotspot JVM only in-lines a limited depth.

The first step was to compile with Java arguments and locals for functions (and methods) that did not use blocks.

Then I realized that if a block didn't share any variables with its containing scope, it could be compiled as a standalone function. Which is nice because blocks require allocating a new instance of the block for each instantiation, whereas standalone functions don't since they are "static". And it meant that the outer function could then be compiled without the argument array. Determing if a function shares variables with blocks is a little tricky e.g. you have to ignore block parameters, except when you have nested blocks and an inner block references an outer blocks parameter. See AstSharesVars

Where I still needed the argument array I changed the calling code to build it in-line, without a helper function. This is what the Java compiler does with varargs calls. Generating code that is similar to what javac produces is a good idea because the JVM JIT compiler is tuned for that kind of code.

One of the complications of a dynamic language is that when you're compiling a call you don't know anything about what you'll end up calling.

On top of this, Suneido allows "fancier" argument passing and receiving than Java. This means there can be mismatches where a call is "simple" and the function is "fancy", or where the call is "fancy" and the function is simple. So there needs to be adapter functions to handle this. But you still want a "simple" call to a "simple" function to be direct.

Each Suneido function/method is compiled into a Java class that has Java methods for simple calls and fancy calls. If the Suneido function/method is simple, then it implements the simple method and the fancy method is an adapter (that pulls arguments out of a varargs array). (for example, see SuFunction2, the base class for two argument functions) If the Suneido function/method is fancy, then it implements the fancy method and the simple method is an adapter (that builds an argument array).

In cSuneido, all values derive from SuValue and you can simply call virtual methods. But jSuneido uses native types e.g. for strings and numbers. (To avoid the overhead of wrappers.) So you need something between calls and implementations that uses "companion" objects for methods on Java native types. For example, Ops.eval2 (simple method calls with two arguments) looks like:

Object eval2(Object x, Object method, Object arg1, Object arg2) {
    return target(x).lookup(method).eval2(x, arg1, arg2);

For SuValue's (which have a lookup method) target simply returns x. For Java native types it returns a companion object (which has a lookup method). lookup then returns the method object.

One issue I ran into is that now I'm using actual Java locals, the code has to pass the Java verifier's checking for possibly uninitialized variables. I ran into a bunch of our Suneido code that wouldn't compile because of this. In some cases the code was correct - the variable would always be initialized, but the logic was too complex for the verifier to confirm it. In other cases the code was just plain wrong, but only in obscure cases that probably never got used. I could have generated code to initialize every variable to satisfy the verifier, but it was easier just to "fix" our Suneido code.

The "fast path" is now quite direct, and doesn't involve too deep a call stack so the JIT compiler should be able to in-line it.

The remaining slow down over Java comes from method lookup and from using "boxed" integers (i.e. Integer instead of int). The usual optimization for method lookup is call site caching. I could roll my own, but it probably makes more sense to wait for JSR 292 in Java 7. Currently, there's not much you can do about having to use boxed integers in a dynamic language. Within functions it would be possible to recognize that a variable was only ever an int and generate code based on that. I'm not sure it's worth it - Suneido code doesn't usually do a lot of integer calculations. It's much more likely to be limited by database speed.

This Developer's Life

This Developer's Life - Stories About Developers and Their Lives

I just came across this podcast and started listening to it. So far I'm really enjoying it. It's nice to hear more of the human, personal side to "the software life". And I enjoy the music interspersed. Check it out.

I also listen to Scott's Hanselminutes but it's a totally different podcast, much more technical and Microsoft focused. (I use it to keep up on Microsoft technology, which I feel I should be aware of even if I tend to avoid it.)

Wednesday, December 22, 2010

Optimizing jSuneido Argument Passing - part 2

I think I've finished optimizing jSuneido's argument passing and function calling. The "few days" I optimistically estimated in my last post turned into almost a month. I can't believe it's taken me that long! But I can't argue with the calendar.

I gained about 10% in speed. (running the stdlib tests) That might not seem like much, but that means function call overhead was previously more than 10%, presumably quite a bit more since I didn't improve it that much. Considering that's a percentage of total time, including database access, that's quite an improvement.

When I was "finished", I wanted to check that it was actually doing what I thought it was, mostly in terms of which call paths were being used the most.

I could insert some counters, but I figured a profiler would be the best approach. The obvious choice with Eclipse is their Test and Performance Tools Platform. It was easy to install, but when I tried to use it I discovered it's not supported on OS X. It's funny - the better open source tools get, the higher our expectations are. I'm quite annoyed when what I want isn't available!

NetBeans has a profiler built in so I thought I'd try that. I installed the latest copy and imported my Eclipse project. I had some annoying errors because NetBeans assumes that your application code won't reference your test code. But include the tests as part of the application so they can be run outside the development environment. I assume there's some way to re-configure this, but I took the shortcut of simply commenting out the offending code.

I finally got the profiler working, but it was incredibly slow, and crashed with "out of memory". I messed around a bit but didn't want to waste too much time on it.

I went back to manually inserting counters, the profiling equivalent of debugging with print statements. I got mostly the results I expected, except that one of the slow call paths was being executed way more often than I thought it should be.

So I was back to needing a profiler to track down the problem. I'd previously had a trial version of YourKit, so this time I downloaded a trial version of JProfiler. It ran much faster than the NetBeans profiler and gave good results. But unfortunately, it didn't help me find my problem. (Probably just due to my lack of familiarity.)

I resorted to using a breakpoint in the debugger and hitting continue over and over again checking the the call stack each time. I soon spotted the problem. I hadn't bothered optimizing one case in the compiler because I assumed it was rare. And indeed, when I added counters, it was very rare, only occurring a handful of times. The problem was, those handful of occurrences were executed a huge number of times. I needed to be optimizing the cases that occurred a lot dynamically, not statically.

Although I only optimize common, simple cases, they account for about 98% of the calls, which explains why my optimizations made a significant difference.

One of my other questions was how many arguments I needed to optimize for. I started out with handling up to 9 arguments, but based on what I saw, the number of calls dropped off rapidly over 3 or 4 arguments, so I went with optimizing up to 4.

I can't decide whether it's worth buying YourKit or JProfiler. It doesn't seem like I want a profiler often enough to justify buying one. Of course, if I had one, and learnt how it worked, maybe I'd use it more often.

Monday, December 20, 2010

Append Only Databases

The more I think about the redirection idea from RethinkDB the more I like it.

The usual way to implement a persistent immutable tree data structure is "copy on write". i.e. when you need to update a node, you copy it and update the copy. The trick is that other nodes that point to this one then also need to be updated (to point to the new node) so they have to be copied as well, and this continues up to the root of the tree. The new root becomes the way to access the new version of the tree. Old roots provide access to old versions of the tree. In memory, this works well since copying nodes is fast and old nodes that are no longer referenced will be garbage collected. But when you're writing to disk, instead of just updating one node, now you have to write every node up the tree to the root. Even if the tree is shallow, as btrees usually are, it's still a lot of extra writing. And on top of the speed issues, it also means your database grows much faster.

The redirection idea replaces creating new copies of all the parent nodes with just adding a "redirection" specifying that accesses to the old version of the leaf node should be redirected to the new leaf node. A new version of the tree now consists of a set of redirections, rather than a new root. And you only need a single redirection for the updated leaf node, regardless of the depth of the tree. There is added overhead checking for redirection as you access the tree, but this is minimal, assuming the redirections are in memory (they're small).

One catch is that redirections will accumulate over time. Although, if you update the same node multiple times (which is fairly common) you will just be "replacing" the same redirection. (Both will exist on disk, but in memory you only need the most recent.)

At first I didn't see the big advantage of redirection. I could get similar performance improvements by lazily writing index nodes.

But the weakness of delayed writes is that if you crash you can't count on the indexes being intact. Any delayed writes that hadn't happened yet would be lost.

The current Suneido database has a similar weakness. Although it doesn't delay writes, it updates index nodes, and if the computer or OS crashes, you don't know if those writes succeeded.

The current solution is that crash recovery simply rebuilds indexes. This is simple and works well for small databases. But for big databases, it can take a significant amount of time, during which the customer's system is down. Crashes are supposed to be rare, but it's amazing how often it happens. (bad hardware, bad power, human factors)

Of course, you don't need the redirection trick to make an append only index. But without it you do a lot more writing to disk and performance suffers.

Even with an append only database, you still don't know for sure that all your data got physically written to disk, especially with memory mapped access, and writes being re-ordered. But the advantage is that you only have to worry about the end of the file, and you can easily use checksums to verify what was written.

On top of the crash recovery benefits, there are a number of other interesting benefits from an append only database. Backups become trivial, even while the database is running - you just copy the file, ignoring any partial writes at the end. Replication is similarly easy - you just copy the new parts as they're added to the database.

Concurrency also benefits, as it usually does with immutable data structures. Read transactions do not require any locking, and so should scale well. Commits need to be appended to the end of the database one at a time, obviously, but writing to a memory mapped file is fast.

Another nice side benefit is that the redirections can also be viewed as a list of the changed nodes. That makes comparing and merging changes a lot easier than tree compares and merges. (When a transaction commits, it needs to merge its changes, which it has been making on top of the state of the database when it started, with the current state of the database, which may have been modified by commits that happened since it started.)

Ironically, I was already using a similar sort of redirection internally to handle updated nodes that hadn't been committed (written) yet. But I never made the connection to tree updating.

I love having a tricky design problem to chew on. Other people might like to do puzzles or play computer games, I like to work on software design problems. I like to get absorbed enough that when I half wake up in the night and stagger to the bathroom, my brain picks up where it left off and I start puzzling over some thorny issue, even though I'm half asleep.

Sunday, December 05, 2010

The Deep Threat

"It was an illuminating moment: the deep threat isn’t losing my job, it’s working on something for which I lack passion."

- John Nack