Tuesday, June 10, 2008

jSuneido - Bye Bye Generics

Sometimes constraints are a good thing. They force you to look for alternatives. To think harder.

I started out by doing a fairly simple translation of C++ templates to Java generics in the Suneido btree code.

I immediately ran into problems because Java generics are very different from C++ templates.

I then proceeded to thrash and flail, trying one approach, abandoning it, throwing out the code, starting over with another approach. (This doesn't lead to much progress in terms of lines of code!)

The C++ btree design was fairly complicated. There were four Slot classes and four corresponding Slots classes. The Btree template was instantiated with four different combinations of these, for various purposes. Within Btree the slots were wrapped by LeafNode and TreeNode classes.

As I worked on it, I realized that three of the Slots classes were virtually identical. And on top of that, they were virtually identical to the Record code.

The fourth Slots/Slot classes were used in a single place in a specialized way that is much more suited for a hash table than a btree. At the time, I was probably so wrapped up in btree's that I didn't even think twice about it.

So in Java I've ended up with a non-generic Btree class, a single Slots class (that mostly delegates to the record class), and a single Slot class. Three classes instead of 10 or so in the C++ code. I have lost a little bit of type safety, but these classes are used in a pretty restricted way so this shouldn't be a problem. And I'll have tests anyway :-)

To be fair, some of the C++ use of templates was motivated by performance. Using templates avoids the overhead of virtual method calls. Since I never tried it without the templates it's hard to say whether that was justified.

Working with Java is forcing me to change my mindset a little. Coming from a time when memory, cpu, and disk resources were severely limited, and working on systems like Suneido where performance was important, I tend to always have "efficiency" in mind as I design.

So I'd try to avoid allocation, for example by putting objects on the stack instead of the heap. But you can't do that in Java.

I also worked hard in Suneido to make the mapping from database to memory as direct as possible. By carefully designing classes I could store them directly on disk, and with memory mapping access them directly without any reads/writes or translation. Again, this is a lot tougher in Java with no pointers. I can do a certain amount with memory mapped ByteBuffer's but it's definitely not as direct. On the positive side, the Java code is ending up a lot simpler than the C++ code. Java just doesn't provide the same opportunities for "clever" tricks.

It'll be interested to see what affect this has on the resulting performance. You might think that since hardware has improved so much that it doesn't matter. But the two versions are going to be compared side by side and if the Java version is slower, it'll be a hard sell. Although, if I can get the multi-threading working reasonably well, that will give the Java version a big advantage, even if it ends up slightly slower on a single core.

I think it's also going to be beneficial to be forced to not only read the C++ code, but to understand it. I've already discovered several sections in the code that turned out to be not used at all. (Nothing major, but still "junk".) And although, I'm not "finished", it's looking pretty clear that revisiting the btree code is going to lead to simplifying it a lot.

No comments: