Thursday, May 22, 2008

jSuneido Progress

Inevitably, my progress has slowed, firstly because I was away climbing for 5 days :-) but also because of trickier porting. I'm at about 1300 lines of code.

I started working on pack/unpack (serialization/marshaling) and struggled a bit with numbers. Instead of my home-brewed decimal numbers I'm using Java's BigDecimal. This saved me from porting a bunch of code. The problem was that my packed format for numbers was closely tied to my implementation. And this is heavily used code, so I wanted a reasonably efficient implementation. One concern was that BigDecimal is immutable, so every operation on it allocates a new value. After sleeping on it last night, I realized that a 64-bit Java long has enough precision to hold the unscaled portion of numbers. That meant I could do most of the manipulation on a fast primitive type. It also made it easy to share the same code with integer values. It still took me a few hours to wrap my head around converting from binary with an assumed decimal point on the right (BigDecimal) to base 10000 with an assumed decimal point on the left (the current Suneido format).
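To make the conversion a little more concrete, here is a minimal sketch of going from a BigDecimal to base-10000 digits with the point on the left, doing the arithmetic on a long. The class name Base10000Sketch and its fields are made up for illustration, this is not the jSuneido code, and it does not produce the actual packed bytes. It assumes the unscaled value (after aligning to a multiple of four decimal places) fits in a long and throws otherwise.

```java
import java.math.BigDecimal;
import java.util.Arrays;

// Hypothetical illustration: value = sign * 0.D1 D2 ... Dk (base 10000) * 10000^exp
final class Base10000Sketch {
    final int sign;     // -1, 0 or +1
    final int exp;      // power of 10000
    final int[] digits; // base-10000 digits, most significant first

    Base10000Sketch(int sign, int exp, int[] digits) {
        this.sign = sign;
        this.exp = exp;
        this.digits = digits;
    }

    static Base10000Sketch from(BigDecimal n) {
        if (n.signum() == 0)
            return new Base10000Sketch(0, 0, new int[0]);
        n = n.stripTrailingZeros();
        // value = u * 10^e10, with u held in a primitive long
        long u = n.unscaledValue().abs().longValueExact(); // throws if too big
        int e10 = -n.scale();
        // shift so the decimal exponent is a multiple of 4 (one base-10000 digit)
        int shift = Math.floorMod(e10, 4);
        for (int i = 0; i < shift; i++)
            u = Math.multiplyExact(u, 10L); // throws on overflow
        int e4 = Math.floorDiv(e10, 4);
        // count base-10000 digits, then peel them off from the right
        int count = 0;
        for (long t = u; t > 0; t /= 10000)
            count++;
        int[] digits = new int[count];
        for (int i = count - 1; i >= 0; i--) {
            digits[i] = (int) (u % 10000);
            u /= 10000;
        }
        // moving the point from the right end to the left end adds count to the exponent
        return new Base10000Sketch(n.signum(), e4 + count, digits);
    }

    public static void main(String[] args) {
        Base10000Sketch s = from(new BigDecimal("123.456"));
        System.out.println(s.sign + " " + s.exp + " " + Arrays.toString(s.digits));
    }
}
```

For 123.456 this gives exponent 1 and digits 0123 4560, that is, 0.01234560 (base 10000) times 10000^1. Integers go through the same path because a whole number is just a BigDecimal with a scale of zero or less.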

I also had to learn how to use Java ByteBuffers. This is a relatively new feature (in the java.nio package). It answered some of the questions in the back of my mind about how to port C++ code that used pointers and structs to access binary data. ByteBuffer has methods to get/put primitive types and has "views" that allow you to treat binary data as an array of primitives.
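As a small illustration of the two ByteBuffer features mentioned above, the sketch below writes a few primitives with put methods and then reads a block of bytes through an IntBuffer view. The "header" layout (tag, count, offset) and the names are invented for the example and are not taken from Suneido.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

final class ByteBufferSketch {
    // Writes a made-up header: 1-byte tag, 2-byte count, 4-byte offset.
    // ByteBuffer is big-endian unless told otherwise.
    static ByteBuffer writeHeader(byte tag, short count, int offset) {
        ByteBuffer buf = ByteBuffer.allocate(7);
        buf.put(tag).putShort(count).putInt(offset);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    // Sums 32-bit ints through a view; the view shares the bytes, nothing is copied.
    static long sumInts(ByteBuffer buf) {
        IntBuffer ints = buf.asIntBuffer();
        long sum = 0;
        while (ints.hasRemaining())
            sum += ints.get();
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer h = writeHeader((byte) 1, (short) 3, 100);
        // absolute gets read at a byte offset without moving the position
        System.out.println(h.get(0) + " " + h.getShort(1) + " " + h.getInt(3));

        ByteBuffer data = ByteBuffer.allocate(12);
        data.putInt(0, 10).putInt(4, 20).putInt(8, 30);
        System.out.println(sumInts(data));
    }
}
```

The absolute get/put methods play roughly the role that pointer arithmetic plus a struct cast plays in C++: the byte offset is explicit, and the buffer checks bounds.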

Another reason why the packed format is critical is that it is designed so values can be compared by a simple memcmp, without unpacking. This is important for database index and searching performance. This raises an issue with strings. I chose to pack them as UTF-8. For ASCII this will be ok, but down the road, with Unicode, a simple memcmp will no longer work properly.
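A memcmp-style comparison in Java might look something like the sketch below. One thing to watch is that Java bytes are signed, so each byte has to be masked to compare as unsigned the way memcmp does. The class and method names are hypothetical, and packString is only a stand-in that returns raw UTF-8 bytes, not the real packed format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class PackedCompareSketch {
    // Compares the remaining bytes of a and b as unsigned values, like C's memcmp,
    // with the shorter buffer ordering first when one is a prefix of the other.
    static int compare(ByteBuffer a, ByteBuffer b) {
        int n = Math.min(a.remaining(), b.remaining());
        for (int i = 0; i < n; i++) {
            int x = a.get(a.position() + i) & 0xff; // mask: Java bytes are signed
            int y = b.get(b.position() + i) & 0xff;
            if (x != y)
                return x - y;
        }
        return a.remaining() - b.remaining();
    }

    // Stand-in for packing a string: just its UTF-8 bytes.
    static ByteBuffer packString(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(compare(packString("apple"), packString("banana")) < 0);
        // "\u00e9" is e-acute; its UTF-8 bytes start with 0xC3, above 'z' (0x7A)
        System.out.println(compare(packString("\u00e9"), packString("z")) > 0);
    }
}
```

The second comparison shows one way the Unicode concern appears: byte order puts "z" before "é", which is probably not what a language-aware sort would give.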

One of the next things to tackle is packing dates. As with numbers, I'm using Java dates rather than the C++ date code, so again there will be some conversion to do.

Another area I want to tackle fairly soon is the Record class used to store field values in database records. This uses a number of C++ "tricks" so it should be fun to convert. One of the key ideas is that it provides a "view" of a block of binary data, without having to "convert" it to an internal format. Again, this is important for database performance. Hopefully the Java ByteBuffer will work for this.
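The "view" idea could be sketched with ByteBuffer roughly as follows. The layout here (a 2-byte field count, then a 2-byte end offset per field, then the field bytes) is invented purely for illustration and is not Suneido's actual record format. The class name RecordViewSketch is likewise hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical layout: [short count][short endOffset x count][field bytes...]
final class RecordViewSketch {
    private final ByteBuffer buf;

    RecordViewSketch(ByteBuffer buf) {
        this.buf = buf;
    }

    // Builds a record in the made-up layout from raw field values.
    static RecordViewSketch of(byte[]... fields) {
        int header = 2 + 2 * fields.length;
        int size = header;
        for (byte[] f : fields)
            size += f.length;
        ByteBuffer b = ByteBuffer.allocate(size);
        b.putShort((short) fields.length);
        int end = header;
        for (byte[] f : fields) {
            end += f.length;
            b.putShort((short) end); // where this field ends
        }
        for (byte[] f : fields)
            b.put(f);
        b.flip();
        return new RecordViewSketch(b);
    }

    int size() {
        return buf.getShort(0);
    }

    // Returns field i as a read-only slice that shares the record's bytes.
    ByteBuffer get(int i) {
        int n = size();
        if (i < 0 || i >= n)
            throw new IndexOutOfBoundsException("field " + i);
        int start = (i == 0) ? 2 + 2 * n : buf.getShort(2 + 2 * (i - 1));
        int end = buf.getShort(2 + 2 * i);
        ByteBuffer d = buf.asReadOnlyBuffer(); // independent position and limit
        d.limit(end);
        d.position(start);
        return d.slice();
    }

    public static void main(String[] args) {
        RecordViewSketch r = of(new byte[] {1, 2, 3}, new byte[0], new byte[] {4, 5});
        System.out.println(r.size() + " fields, field 2 has " + r.get(2).remaining() + " bytes");
    }
}
```

Nothing is unpacked or copied when a field is read; get just computes two offsets and hands back a window onto the same bytes, which is the property that matters for database performance.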
