The Software Life: May 2008

Friday, May 30, 2008

RailsConf - Day 2

We started off with Joel Spolsky who is an entertaining speaker, but unfortunately doesn't have as much real content in his talks as he does in his blog.

Next I went to a talk on IronRuby (Ruby on the .Net CLR) They seem to be making good progress. I would have liked to hear more about the details of their implementation. It sounds like performance is a big issue - they are currently about 10 times slower than MRI (Matz's Ruby Interpreter - the original C implementation of Ruby).

Then a talk by the developer of Limelight. I wish he'd spent more time on Limelight and less on his "10 Things I Hate About Developing Web Apps". Limelight is a "runtime" like Adobe AIR that uses a Ruby DSL to make "web like" applications but in a "cleaner", simpler way than HTML + CSS + JavaScript. (Sorry, I can't seem to find a link.)

I went out at lunchtime and got back just before the sessions were due to start, only to find the ones I wanted to see were all full. This was ironic since in the morning introduction they'd made a point of saying they hadn't sold out and didn't want to sell out because they wanted everyone to get in. I guess I should have got in line earlier. Oh well, I grabbed a latte and worked on jSuneido.

The last two sessions were on Rubinius and MagLev - two more implementations of Ruby.

Rubinius' aim is RiR - Ruby In Ruby - i.e. trying to implement as much of Ruby in the Ruby language rather thin in C like MRI. I was interested to hear they are planning/hoping to use LLVM to optimize their compiler. (I'd looked at this for Suneido.) I found it a little puzzling that one of their reasons for not using the StrongTalk VM was they couldn't understand the source code. But I seriously wonder if the LLVM source code is any easier to understand. If you're going to use another project because you want to skip the effort of writing it yourself, then I don't think it's fair to expect to "understand" it. You have to expect a certain amount of "black box".

MagLev is Ruby on top of GemStone's proprietary Smalltalk VM and including their impressive object persistence facilities. MagLev is claiming pretty amazing performance from their VM - up to 100 times faster than MRI. This seemed to raise some issues with the jRuby guys. Charles Nutter asked a question which came pretty close to saying, "if it doesn't work, you can make it as fast as you want" (MagLev doesn't implement all of Ruby yet.) I wonder if the GemStone VM is really much faster than the JVM? Not that it matters in the big picture because the JVM is everywhere whereas GemStone is nowhere (relatively).

You may notice that although this is RailsConf, none of the sessions I went to were on Rails! It suited me fine since it matched my interests, but I felt a little guilty. Of course, the one session that I couldn't get into because it was full was an actual Rails one. There is of course a Rails tie in to these sessions. It's Rails that has made Ruby so popular and the reason for all these implementations of Ruby is primarily to run Rails.

All in all, a pretty good day. The weather was beautiful, of course, since I was inside all day!

Thursday, May 29, 2008

RailsConf - Meta-programming Ruby

My tutorial this morning was on meta-programming Ruby.

The second half by Patrick Farley was about the implementation of Ruby and how it enables and affects meta-programming. With my interest in implementing programming languages this was right up my alley.

The first half, by Neal Ford, I didn't like as much. He spent some of his time bad mouthing other languages - easy prey but not very productive. I wasn't too impressed with some of his examples showing why Ruby is better - they didn't seem to be comparing apples to apples.

For example, he compared a Java static (class) for determining if a string is empty to the Ruby empty instance method. Because it's a static method the Java class has to check for null. The Ruby method doesn't need to, but not because Ruby is better, but because the Ruby method is an instance method. A Ruby class method would not only have to check for nil, but also for completely different types. The Java method loops through the characters and checks if they are whitespace. The Ruby one just calls strip to remove the whitespace. I'm not sure if the Ruby strip creates a new string or modifies the instance (Ruby strings are mutable) but either way that's quite different from the read-only Java version. A Ruby version that was equivalent to the Java version would be much the same code, as would a Java version that was equivalent to the Ruby version.

Another example was using Ruby meta-programming to dynamically create test methods for a list of cases. But rather than impress me, it just made me ask why? Why not just loop through the cases in a single test method - no fancy meta-programming required.

The funny part is that Ruby does have definite advantages for some things, but the examples didn't really show that. Maybe it's just hard to come up with good examples.

Of course, at RailsConf they're mostly preaching to the converted, so everyone just laps it up. Not being a particular Ruby/Rails fanatic and being involved with other languages, I'm a little more critical.

I'm also not totally convinced about the whole meta-programming thing. Yes, it allows some pretty amazing things. But it also allows some things that are very twisted, hard to comprehend and debug.

Still, it was a good session containing some pretty interesting material.

Benchmarks

One of the optimizations I saw mentioned for JVM languages is pre-creating small integer objects, e.g. from -128 to +128.

This seemed like an easy thing to try. So I made a simple benchmark and timed it both ways.

The problem was, there was no speed difference. There was as much variation between runs as there was between methods.

The problem with micro-benchmarks is that they generally don't do any "real" work, so if you're not careful the compiler/vm will optimize away the code you're trying to test. I figured that must be what was happening here.

Sure enough, I re-wrote the benchmark so it wouldn't be so "obvious" my code was useless, and then I got more reasonable results. (There was less variation, more consistent times as well for some reason.)

Without the optimization it took an average of 9.7 seconds. With the optimization it was 6.9 seconds. This is roughly a 30% speed improvement for a few lines of code. Not bad. Of course that's if you're doing 100% small integer operations which is never the case.

PS. I know - premature optimization. But at least I measured it! (the results anyway, if not the problem) But at this stage I'm still trying to get a feel for the JVM and this kind of experimentation helps. After all, the practicality of this whole project depends on achieving decent performance.

PPS. You won't see these (or any other) changes in Subversion until I figure out why my laptop is refusing to commit changes :-(

Tuesday, May 27, 2008

jSuneido - implementing classes and methods

Reading Headius: The Power of the JVM got me thinking about how to implement Suneido classes and methods on the JVM.

Suneido is a little easier to implement in this respect than more dynamic languages. Suneido's classes are more like Java classes - defined in a single source unit and not modifiable at runtime. Unlike Ruby you can't add methods dynamically.

It sounds like one of the keys to performance might be call site caching of method lookup. What this means is that after you've done the method lookup you cache the result at the call site. Next time you just call the cached method. Of course, the cached method is only valid if the receiver is the same class as when you cached it. If the value class is different you have to fall back to the method lookup. But in practice it's a worthwhile optimization, and not too complex to use. It's an old idea that's been around since early Smalltalk days.

One way to implement this is would be to compile each call to something like:

if (cache == null || cache.class != receiver.class)
cache = receiver.method_lookup(...)
cache.call(...)

One of the problems is that there's no "good" way to reference a method. I think that's one of the reasons why jRuby compiles each Ruby method to a Java class - so it can use a class reference as a method reference.

One of the features being considered in JSR292 (JVM improvements for dynamic languages) is method "handles" which would avoid having to compile each method into a separate class.

Another JSR292 feature is "dynamic invoke". This would move some of this mechanism into the JVM. And "anonymous" classes will help as well.

I'm still trying to figure out what's possible/practical/reasonable with the JVM. Unfortunately, there's not a lot of documentation at this level. It would be nice to know more about how other JVM languages like jRuby and Groovy are implemented, but there doesn't seem to be much out there. I guess I could try to figure it out from their source code but I bet that wouldn't be easy.

One feature I think I should investigate is the Java Proxy system. It sounds like this is used by Scala (another JVM language). Apparently it doesn't work for native type arguments but I don't think that's a problem for Suneido.

Meanwhile I'm plugging away at porting the code. The record and memory mapped file code now seems to be working (at least at a basic level).

PS. I'm writing this (and checking code into Subversion) from the Vancouver airport on my way to RailsConf in Portland. Gotta love free wireless internet!

Monday, May 26, 2008

In Portland

I'll be in Portland, Oregon for RailsConf this week.

If anyone is interested in getting together, send me an email to mckinlay axonsoft.com

Java Roadblock Part 2

I tried a test program:

RandomAccessFile f = new RandomAccessFile("tmp", "rw");
f.seek(3L * 1024 * 1024 * 1024);
f.write(123);
FileChannel channel = f.getChannel();
long size = channel.size();
System.out.println("File is " + size + " bytes large");
for (int i = 0; i < 10; ++i) {
    long ofs = 0;
    int chunksize = 512 * 1024 * 1024;
    while (ofs < size) {
        int n = (int)Math.min(chunksize, size - ofs);
        int tries = 0;
        ByteBuffer buffer = null;
        do {
            try {
                buffer = channel.map(FileChannel.MapMode.READ_ONLY, ofs, n);
            } catch (IOException e) {
                if (++tries > 10)
                    throw e;
                System.gc();
                System.runFinalization();
            }
        } while (buffer == null);
        System.out.println("Mapped " + n + " bytes at offset " + ofs + 
            " with " + tries + " tries");
        ofs += n;
        buffer = null; // "unmap"
    }
}
channel.close();

It took 0 to 3 "tries" of forcing garbage collection and finalization. Of course, this is in the context of a simple test program. A large active heap might be better - because of frequent garbage collection - or it might be worse due to delayed finalization.

This might or might not be workable for Suneido. Currently I only map up to 1 gb at a time. That means by the time I might run out of address space there would be a lot of unreferenced mappings and hopefully at least some of them would get finalized.

This was on my Mac which I see is defaulting to Java 5 (although 6 is installed). On my work Windows machine with Java 6, I get 0 tries every time! Either it works better on Windows or it works better on Java 6. Hopefully the latter!

So I guess I'll forge ahead. I'll leave in the code to force garbage collection and finalization if the mapping fails since it doesn't hurt anything. I should probably log it somehow so I know if it's happening.

One thing I haven't quite figured out is how growing the file fits in with mapping. In my test program I had to actually "write" at the end of the file to allow mapping. This may differ between operating systems.

PS. The other "solution" is to use the 64 bit version of Java (on a 64 bit OS). Then there is no shortage of address space. I could even go back to the simpler approach of just mapping the whole file. (Although there are issues with that with growing the file.)

Sunday, May 25, 2008

Java Roadblock

I was just starting to port Suneido's memory mapped file code from C++ to Java. All was going well until I couldn't find any "unmap" function.

Searching the internet gave me:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038

Basically, it looks like I'm out of luck. Either I try to rely on one of the workarounds or else I have to come up with a completely different approach.

The problem is that Suneido uses memory mapped files to access the database. The database file can be bigger than the available address space so the file is mapped in chunks and the least recently used chunks are unmapped. But in Java I can't unmap!

This really sucks. Things were going well and I was quite happy with Java. I'm not sure what to do now. I guess I can experiment with the workarounds, but even if they work there's no guarantee they'll work elsewhere on a different JVM or OS.

Thursday, May 22, 2008

jSuneido Progress

Inevitably, my progress has slowed, firstly because I was away climbing for 5 days :-) but also because of trickier porting. I'm at about 1300 lines of code.

I started working on pack/unpack (serialization/marshaling) and struggled a bit with numbers. Instead of my home brew decimal numbers I'm using Java's BigDecimal. This saved me from porting a bunch of code. The problem was that my packed format for numbers was closely tied to my implementation. And this is heavily used code so I wanted a reasonably efficient implementation. One concern was that BigDecimal is immutable, so every operation on it allocates a new value. After sleeping on it last night, I realized that a 64 bit Java long has enough precision to hold the unscaled portion of numbers. That meant I could do most of the manipulation on a fast primitive type. It also made it easy to share the same code with integer values. It still took me a few hours to wrap my head around converting from binary with an assumed decimal on the right (BigDecimal) to base 10000 with an assumed decimal on the left (the current Suneido format).

I also had to learn how to use Java ByteBuffers. This is a relatively new feature (in the java.nio package). It answered some of the questions in the back of my mind about how to port C++ code that used pointers and structs to access binary data. ByteBuffer has methods to get/put primitive types and has "views" that allow you to treat binary data as an array of primitives.

Another reason why the packed format is critical is that it is designed so values can be compared by a simple memcmp, without unpacking. This is important for database index and searching performance. This raises an issue with strings. I chose to pack them as UTF-8. For ASCII this will be ok, but down the road, with Unicode, a simple memcmp will no longer work properly.

One of the next things to tackle is packing dates. As with numbers, I'm using Java dates, rather than the C++ date code so again there will be some conversion to do.

Another area I want to tackle fairly soon is the Record class used to store field values in database records. This uses a number of C++ "tricks" so it should be fun to convert. One of the key ideas is that it provides a "view" of a block of binary data, without having to "convert" it to an internal format. Again, this is important for database performance. Hopefully the Java ByteBuffer will work for this.

Tuesday, May 20, 2008

Advantages of a Popular Platform

I'm writing Java test cases and I wonder if the tests "cover" all the code. I wonder if there's a coverage tool for Eclipse Java. Five minutes later I've found and installed EclEmma and run my tests under it. At a glance I can see which lines of my code are covered (they're colored green) and which lines aren't (they're red). And I can see a summary of the percentage coverage for each of my source files. Very slick.

Coverage tools only go so far - they're not a guarantee that your tests are "complete". But they can certainly help.

Sure, there are coverage tools for C++, but there's no way I could have found, installed, and run one in five minutes.

Don't Forget to enable Java assert

I had been happily putting assert's in my Java tests and assuming they were working. Until there was one that really should have been failing. Hmmm... I added "assert false" and it still succeeded. What's going on?!

Then I remembered reading somewhere that assert checking is disabled by default. Doh!

You enable this via a command line argument to Java. In Eclipse you can do this in the Run item settings or you can make it the default in Window > Preferences > Java > Installed JREs > Edit > Default VM Arguments: -ea

Of course, once I enabled assert checking I found that a number of my tests were failing (had been all along). Thankfully they were easy to fix!

For now, I've enabled assert checking. But perhaps I shouldn't be using assert in my tests.

For the majority of my assert's I prefer if they are always checked. The exception would be particularly slow assert's you might want to disable in production.

In my C++ Suneido code I ended up using my own "verify" instead of assert that is never disabled. Because it is never disabled I can use it with expressions that are actual part of the functionality. This avoids temporary variables whose sole purpose is to be checked by an assert. e.g.

verify(myfunc());

instead of:

bool result = myfunc();
assert(result);

I may end up doing the same thing in Java - I'll see how it goes.

Wednesday, May 14, 2008

jSuneido code on SourceForge

The start of the jSuneido source code is up on SourceForge.

The url to check it out is:

http://suneido.svn.sourceforge.net/svnroot/suneido/jsuneido

ASM Java Bytecode Tool

ASM looks like it might be useful. Groovy uses it.

And it even has a plugin for Eclipse.

jSuneido - implementing classes & more

So far, so good. I've written about 850 lines of Java that equates to about 2000 lines of C++ in two and a bit days. I've been porting the basic Suneido data types.

I'm lucky that Java is pretty close to C++, and Suneido's byte code is pretty close to Java byte code.

I'm gradually getting an idea of how to implement Suneido on top of Java and the JVM. Here's how I envision Suneido classes being compiled:

class SuClass { // ultimate base class - static, not generated
public SuValue invoke(int method, SuValue ... args) {
    return invoke2(method, args);
}
public SuValue invoke2(int method, SuValue[] args) {
    if (method == DEFAULT)
        throw SuException("method not found");
    return invoke2(DEFAULT, args);
}
}

class xxx extends SuClass { // generated
public SuValue invoke2(int method, SuValue[] args) {
    switch (method) {
      case 8346:
        return mymethod(massage(6, args, 4746, 9846, 3836));
      ...
      default:
        return super.invoke2(method, args);
    }
}
public SuValue mymethod(SuValue[] locals) {
    if (locals[2] == null) locals[2] = FALSE;
    invoke(8347, locals[1]); // call a method in "this"
    locals[4] = globals[123].invoke(7472, locals[0], locals[2]);
    locals[3] = new SuValue(
        JavaClass.a_static_method(locals[1].string()));
}
...
}

Note: I'm showing this as Java source code, but I plan to compile directly to Java byte code.

The explanation:

int method is an index into the Suneido symbol table - since it's faster to dispatch on int instead of string (and it also makes the compiled code smaller)
a variable number of arguments are received into the args array
a switch is used to dispatch to the correct method, in this example 8346 is the symbol index of "mymethod"
Suneido's extra argument passing features are implemented by "massage" which also allocates an array for the local variables of the method, the "6" is the size of this array, the remaining (variable number) arguments are symbol indexes for the methods parameter names (required to handle named/keyword arguments)
if the method is not found in the current class, invoke2 is called on the parent class, if this ends up at the ultimate base class (SuClass) then Suneido's Default method-not-found will be called, if this also ends up back at SuClass then an exception is thrown
methods receive a "locals" array containing the arguments in the first part of the array
default argument values are compiled to code, in this example the default value of the third argument is false
classes and functions are stored in the globals array
blocks are compiled into separate methods, context is passed via the locals array (this is one of the reasons for using the locals array instead of native Java local variables)
Java classes can be called directly, with the appropriate conversions

This should be relatively fast - method dispatch is just a couple of extra function calls and a switch - no reflection or table lookup. Of course, methods deep inside class hierarchies will take longer due to the "chaining".

So far I'm using unchecked exceptions. Partly because it's simpler and partly because that's what I'm used to from C++. The traditional advice (e.g. from Sun) is to use checked exceptions, but this is starting to be questioned.

I'm using the standard Java indent/curly style. It hasn't been a problem but it's a bit of a transition after 25 years of C and C++ (and Suneido) using the Whitesmiths style.

I did figure out how to configure the Home and End keys to be beginning and end of line in Eclipse. That's one thing I miss on the Mac. The "standard" appears to be Apple + left/right arrow, but I find that more awkward, it doesn't appear to be supported everywhere, and those keys are sometimes used for other purposes like switching OS X virtual desktops.

I notice that C++0x is proposing another approach to the problem I talked about in my last post. I suggested replacing:

Type var = new Type(...);

with:

Type var = new(...);

For C++0x they are proposing:

auto var = new Type(...);

This is better than my suggestion because it also allows things like:

auto var = func(...);

where the type of var is taken from the type of the return value of func.

Tuesday, May 13, 2008

jSuneido - good news, bad news

I just had a depressing thought. Suneido consists of about 60,000 lines of C++ code. Yesterday I wrote 200 lines of Java. Even if the Java is half the size of the C++ (?) that's 150 straight days of programming. I can probably do better than 200 lines once I get more up to speed on Java. But some of the code is going to be a lot harder to port than what I did yesterday. It's obviously do-able, but it's not going to be "quick" even if it's "easy" (which it probably won't be).

I'm starting to find some resources e.g. the golden spike by John Rose @ Sun and the JVM Languages Google Group. (there's some interesting stuff if you go back to the beginning of the group) It's good to know there's information out there (although these days that's almost a given) but the complexity of some of it is a little daunting.

Another C++ feature that doesn't port straight across is operator overloading. Not a big deal other than recognizing when it's used in the C++ code.

A Java annoyance - the duplication in statements like this bugs me:

HashMap hashmap = new HashMap();

C++ is similar but you notice it less because you can allocate on the stack with just:

HashMap hashmap;

And you can use typedef to make abbreviations so you can write:

Map hashmap = new Map;

Hah! Two days programming in Java and I'm already redesigning the language! Here's my proposed shortcut (syntactic sugar):

HashMap hashmap = new; // or new(args...)

On the positive side, writing a multi-threaded socket server seems a lot simpler in Java than in C++. It didn't hurt that one of the books I found at the bookstore was TCP/IP Sockets in Java.

Monday, May 12, 2008

More on jSuneido

When I wrote the jSuneido post a few days ago it was just a vague idea. But since then it's been buzzing around in my head and it seems more and more like something that should at least be investigated.

I think there are good "business" reasons for doing it, but I have to admit it's the technical challenge that excites me as much as anything!

Of course, the overwhelming temptation is to use this opportunity to "fix" parts of Suneido that in hindsight seem less than ideal. But if it's only the server part of Suneido, then it will have to stay compatible with the existing client, so that should stop me from getting too carried away!

I went through my library for Java books but they're all fairly out of date so Friday I ordered some new ones from Amazon and Chapters, including a couple on the JVM. I was out of town for the weekend but Sunday on the way home I stopped at the local bookstore and picked up a few more.

A lot of my questions revolve around how to "map" Suneido onto Java and the JVM. Luckily, Suneido is not as "dynamic" as some dynamic languages. For example, it doesn't allow adding/removing methods on classes on the fly as Ruby does. (This was a deliberate choice - Suneido's predecessor had this ability and I felt it led to more harm than good.)

Here are some highpoints:

Suneido exception => Java exception
Suneido class => Java class
Suneido function => Java class with static method
Suneido number => Java BigDecimal
Suneido instance members => Java HashMap
Suneido method call => custom method lookup (until Java invokedynamic is released)

Suneido has more argument passing options e.g. keyword parameters so this will have to be handled on top of the Java argument passing.

Suneido blocks (closures) will be one of the harder parts. The code itself can be private methods, but the challenge is to provide access to the local variables of the defining context.

I'm not sure how to implement calling Java code from Suneido code. But judging from Groovy, I assume it's do-able, presumably using Java reflection. For now, I'm not too worried about the reverse (calling Suneido code from Java).

I'm not sure whether I'll port my own regular expression code or use Java's. It depends on how compatible they are.

Another issue is how to "map" (port) my C++ code to Java. Things like reference variables and templates make this tricky in spots.

Sunday evening I installed Eclipse on my MacBook and started to get my feet wet. I have to say that starting a new project, and in Eclipse's "native" language, I had a much more positive experience than when I tried to use it for C++ recently. Of course, I fumbled around a bit as you would expect with an unfamiliar tool but overall it went pretty smoothly and I'm enjoying the Eclipse features.

I chose to start by implementing the Suneido data types. I've written about 200 lines of Java code (as reported by the Eclipse Metrics plugin) including jUnit tests. So far it's been a straightforward translation of the C++ code to Java.

Already the portability is nice, I was able to work in Mac OS X at home and then copy the project directly to Windows at work and continue working.

It's hard to say how big a project this is. Sometimes it doesn't seem too bad, other times it seems like a huge job. Time will tell I guess.

Friday, May 09, 2008

New Trick for SciTE

I just discovered that SciTE, the programming text editor that I use, has an autocomplete feature that I never knew about.

You turn it on by adding to the properties file:

autocompleteword.automatic=1

From then on, if you type a prefix of a word that has only one match within the current file it will suggest that word and you can hit Tab to accept it. This is really handy for long names.

I've seen this feature in other editors but didn't realize it was available in SciTE. (I was aware that there was a facility for setting up API's but this is simpler and more generic.)

Note: SciTE is based on Scintilla which is the code editing component used by Suneido.

Thursday, May 08, 2008

Suneido Broken & Fixed

When I was making the changes to handle thread specific storage on ACE I encountered some "ugly" code.

The first thing was a lot of instances of:

proc->stack.top()

which got even worse when they became:

tss_proc()->stack.top()

(and similar for push, pop, etc.)

In interp.cpp I had defined inline functions for these, but I hadn't extended this to elsewhere. I moved them to interp.h and converted all the instances to:

TOP()

While doing this I noticed code like:

Value* sp = GETSP();
...
SETSP(sp);

This seemed like a perfect application for an automatic destructor. So I added this to interp.h:

struct KeepSp
   {
   KeepSp()
       { sp = GETSP(); }
   ~KeepSp()
       { SETSP(sp); }
   Value* sp;
   };
#define KEEPSP    KeepSp keep_sp;

Using this, the code became simply:

KEEPSP
...

This was even better where I'd had:

Value* sp = GETSP();
Value result = ...
SETSP(sp);
return result;

because it could now be simplified to:

KEEPSP
return ...

The next thing that caught my eye was code like:

try
   {
   ...
   x->close();
   }
catch
   {
   x->close();
   throw ;
   }

Another good application of automatic destructors. So I added this to std.h

template<class T> struct Closer
   {
   Closer(T t) : x(t)
       { }
   ~Closer()
       { x->close(); }
   T x;
   };

This let me rewrite the code to simply:

Closer<mytype> closer(x);
...

(without any try-catch)

I was feeling pretty good about this refactoring, until I found that one of my tests was broken. I reviewed all my changes to see if I'd made any inadvertant typos or other mistakes. It all looked good. I started tediously reversing my changes one at a time until I found where I'd broken the test. (Theoretically I should have compiled and run the tests after each little change, but unfortunately that takes a little too long to be tolerable.)

I found the spot, but it looked fine. It almost seemed like the KeepSp destructor wasn't getting run. At one point it seemed like it worked on Visual C++ so I started looking for bugs related to destructors with GCC 3.4.2 But I'd been mistaken, it was actually broken on Visual C++ as well.

I tried using GDB but it didn't really help. In the end I added some "prints" and that gave me the clue that there was an exception being thrown, but the problem wasn't that the destructor wasn't being run - the problem was that the destructor was being run. The old way the code had been written, the stack pointer wasn't being restored if there was an exception, now it was.

Why would that cause a problem? Looking at the interp.cpp code I saw a key comment - returns from a block (that actually return from the containing function) are implemented as exceptions and pass the return value on the stack. My new code that restored the stack pointer even when there was an exception was losing the block return value.

Hmmm... now what? Do I put the code back to the way it was? But it seems kind of ugly to depend on cleanup code not getting run when there's an exception. Instead I changed block returns to store the value in the proc rather than on the stack. Now all the tests pass.

Thank goodness for tests. Only one test (out of hundreds) failed. Without that test it could have been quite a while till I ran into this bug, and it would have been a lot harder to track down.

You could argue that if I'd just left well enough alone I wouldn't have run into the problem in the first place. After all, if it's not broken, don't fix it. I don't agree. Down that path lies chaos. You have to fight entropy. Otherwise each successive change makes the code slightly worse, until eventually it's too incomprehensible and fragile to dare touch.

The day was a bit of a roller coaster. I start off feeling pretty good about the cleanups. Then I got frustrated when I couldn't figure out why it was broken. Then when I thought I'd have to undo my changes I got depressed. I know, rationally it's pretty silly to get depressed about something like that. But I guess if I didn't care enough for it to be depressing, then I would never have gotten this far in the first place. Still, you'd think after 30 plus years of programming I'd be a little less emotional about it!

At this point I decided to take a break and go for a run. That usually improves my mood. And while I was running I figured out that I could make it work with the improvements. So between the endorphins and not having to undo my changes, suddenly I was in a good mood again. That's probably a good place to leave things for the day!

OpenOffice 3 beta

OpenOffice 3 beta is available. The big feature for me was that it finally runs native on Mac OS X (instead of via X windows) but there are a bunch of other features.

I've been using NeoOffice to get a Mac version, but it's nice that this is now standard. (Although it leaves the NeoOffice project with less of a reason for existence.)

Wednesday, May 07, 2008

jSuneido ?

As I work on multi-threading Suneido I wonder if I shouldn't be shifting direction.

Does it really make sense to implement your own VM these days? Wouldn't it make more sense to re-implement Suneido on top of an existing VM such as Java?

For example, jRuby (Ruby re-implemented on the Java VM) in some cases outperforms the original C version of Ruby.

And Groovy also demonstrates that it is possible to implement a highly dynamic language on top of the Java VM.

The JVM would handle low level issues like garbage collection and JIT and multi-threading and 64 bit. (Although you'd still have to deal with higher level multi-threading issues.)

It could provide access to Java libraries so we didn't have to continually "re-invent the wheel".

It would not solve the current issue of a non-portable Windows UI. As with my current project, it could be server only to start with.

It would allow portability to everywhere that the JVM is available.

I'd have to not only learn Java, but also learn JVM issues that a normal Java programmer doesn't have to worry about. That's a little intimidating.

Of course, the other big question is why not just use an existing language/database? The big reason now, is that we have a big (to us) application written with Suneido's language and database. The thought of re-writing or porting it all to another platform is way too scary.

Moving to something like the JVM that allowed interoperability with other JVM based languages (and other databases) might actually allow us to incrementally move to another language if we chose to.

Obviously, this wouldn't be a minor decision, but it is food for thought.

More Concurrency

While I was waiting for the iMac transfer I was working on my project to multi-thread Suneido. The next piece I decided to tackle are the global hash tables for symbols and global names.

One obvious solution is to simply use many-readers/single-writer locking.

That still leaves the question of how to implement it. Do I mix the locking into the existing code? Or do I write a "wrapper" class that handles the locking and then delegates to the existing class. Neither is ideal - mixing in the locking will complicate the code but a wrapper will add overhead (although probably minimal).

But I'm not completely happy with adding locking overhead to readers. Especially for these tables. Once the server is "warmed up" these tables will almost never be written to. It would be really nice to avoid any overhead for readers.

My next thought was lock-free hash tables. I've heard various noises about this so I thought it was worth seeing what was available. As far as I could find, the answer was: not much.

The Intel Threading Building Blocks have some concurrent C++ data structures. But Boost doesn't. And STL doesn't (yet).

The most interesting thing I came across was Cliff Click's work. If you've got an hour to spare, check out the video of him giving a presentation about it. I see from his blog that he's extended his work to other data structures. Apart from the impressive benchmarks, what I really like about his approach is that it's simple and relatively understandable. (Although I still have a hard time wrapping my head around re-ordering of memory accesses and the required "fencing".)

Unfortunately, his work is all in Java. The only mention of C++ is someone commenting on the blog that they'd like a C++ version. Part of me is really tempted to implement it. It seems like it wouldn't be that hard. Famous last words.

Side note: I ran across some references to the advantages of power-of-two size + AND hash tables versus prime size + MOD (more traditional and what I'm using). One of the reasons is that MOD is slow (relative to AND). But I wonder how much that really affects overall speed.

Another thought I had was that I could just give each thread their own tables. The downside is the extra memory and redundant work maintaining the tables. This approach is closer to using separate processes instead of threads.

That led me to consider two copies - one for reading and one for writing. The reading one would be lock free. The write copy would be protected by a mutex. To write you would lock the write copy, update it, swap the two copies using an atomic compare-and-swap, and then update the other copy to keep in sync. This seems simple to implement, and simple enough to be reasonably sure it's "correct". It uses double the memory, but only for the hash table of pointers, not for the data itself. (Both copies can point to the same data.)

Strangely, I haven't seen any mention of this approach. Perhaps because it's not general purpose - it would not scale to many writers. But this shouldn't matter for my purposes. Or maybe there is some flaw that I'm missing? (concurrency issues can be like that)

The writing bottleneck could slow down startup. If so, (and obviously I'd want to measure that) then it would be fairly easy to write out symbol tables from a "warmed up" server and use these during startup to initialize.

As far as the atomic compare-and-swap, that seems readily available: in Win32 it's InterlockedCompareExchange. ACE has ACE_Atomic_Op, although it doesn't appear to have compare-and-swap. As of version 4.1 GCC has built-in atomic operations.

Given all the attention on concurrency these days, it's surprising that there aren't more tools available.

Tuesday, May 06, 2008

New iMac

My new iMac arrived today. There's a bit of a story behind it's arrival.

Originally I was going to buy it through the local store (not an actual Apple store, but an Apple retailer and service center). Of course, they didn't have the model I wanted in stock. But they had some similar models coming in a few days and they could upgrade the memory and hard disk to what I wanted. Except the person that took down my phone number mumbled that he didn't think they could upgrade the hard drive, leaving me wondering whether they could or not.

A week passed and no word. Finally I phoned them to find out what was happening. "Soon", I was told. Given the uncertainty about getting the right configuration, I decided I might as well just order on-line.

But my credit card was declined!? It was paid off and I hadn't had any problems with it. I had let Apple use the credit card info on file - maybe it was out of date. I re-entered the credit card information. It was declined again. Finally I phoned the credit card company - they had declined it because they (or probably their software) thought it was "suspicious". I'm not sure why. I've purchased from Apple before. Sure, it was a larger amount but surely that doesn't mean it automatically gets declined? And couldn't they have contacted me? Oh well, no point expecting rational behavior from something like a bank.

One the positive side, while I was messing around with the credit card I got an email from Apple saying they had come out with upgraded models and since my order hadn't been shipped I'd be getting the new model. Lucky - I'd have been really annoyed if I got the old model just a few days before the new one came out.

I got a 24" (1920 x 1200) iMac with 3 ghz Core 2 Duo with 6 mb L2, 4 gb of memory and a 1 tb hard disk. A decent improvement over my 2.2 ghz Mac mini with 2 gb of memory and 160 gb disk

I pulled it out of the box, plugged it into the power and network, put batteries in the mouse and keyboard, and fired it up. I hesitated when the migration tool wanted Firewire, but I hadn't set the machines up side by side. I decided to use the secondary Ethernet choice. (wouldn't that be clearer if they called it "Network"?) This required installing CD and DVD Sharing software on the old machine.

Soon after it started it came up with an error about losing its connection. But when I re-tried it seemed fine. Not sure what caused it. It started off saying it was going to take 10 hours, but that estimate soon dropped and it ended up taking a couple of hours to transfer about 110 gb. That seemed a little slow. I searched on the web and found some people saying Firewire was faster. I wondered about starting over but the migration didn't even have a cancel button and I was a little nervous about "pulling the plug" while it was transferring so I let it go. The nice thing about having multiple machines is that when one is tied up you can still work on another (the MacBook).

The transfer finished and at first it seemed like everything was great. But then I started running into problems. Some of the System Preferences wouldn't open. iTunes refused to start (with an error 1000). Hmmm... I searched on the internet for migration problems. One person recommended running Software Update on the new machine before migrating (so you didn't get newer settings on older software) But unfortunately, it was a little too late for that.

After a few tries and a few system restarts (reminiscent of Windows!) I got Software Update to run and that seemed to clean up the problems.

I calibrated the monitor with my Huey Pro. This seemed to make a bigger difference than on my last monitor, making the colors quite a bit "warmer" than the default.

Then I noticed that I now had a Huey icon on the menu bar. Strange, that hadn't shown up on my old machine. Then after I restarted the machine it was gone. I got on the Huey web site to see if there were any software updates. In their support section I found that the automatic startup was broken and you have to manually install. Which means it hadn't been running on my old machine. Ouch. I'm not sure if this affected the entire calibration or just the ambient light adjustment. They hope to fix their software in 2008. I wonder how many other people have this problem?

Next problem was that it had installed my Epson R1800 printer with the Gutenprint drivers. This works ok for simple printing, but not so good with printing photographs from Lightroom. I downloaded the drivers from Epson and installed them. I thought this would create the printer but it didn't, just installed the drivers. So I added the printer and said "yes" I really wanted to install this printer a second time. But when I printed a test page, nothing came out. Here we go again.

I poked around and happened to right click on the printer. One of the options was "Reset printing system". That seemed like it might help so I chose it. I got a warning saying this would delete all my printer queues and pending jobs. That seemed alright so I clicked on OK. Oops! That deleted all the printers. I guess that's what they meant by "printer queues". I re-install the printer and run a test page again. Still no errors, but nothing comes out. I notice that it appears to be printing a Postscript file. But I don't think the Epson driver uses Postscript. I try printing from another program and it works fine. It was probably working all along. Sigh. Not a very good test page. I guess it's intended for Gutenprint drivers, but why use it for other drivers?

I also had problems getting my Parallels virtual machines to work. I thought I might have to re-install the software, but after hanging up a few times it decided to start working. Except that my virtual machines had lost their network adapter. But all I had to do was re-enable it.

Of course, I'd lost the twisted printer setup that let me print from Windows with the Epson driver. But I noticed I now had Apple Bonjour on Windows. I can't remember why I installed it, but I figured it was worth a try. My first attempt failed but after enabling Printer Sharing on OS X and sharing the printer I got it to work. Even the test page printed :-) Much easier than the previous setup.

I order the iMac with the new wireless keyboard. It's tiny! Especially next to the 24" iMac. It's taking a little getting used to but the feel is nice. I won't miss the numeric keypad, but I'll miss the larger arrow keys and Home/End/Page Up/Page Down/forward Delete etc. But it's essentially identical to my MacBook keyboard so it's not too bad.

And that pretty much took up my whole day. Still, probably the most painless computer switch I've done.

Monday, May 05, 2008

Old Lessons

One of the common suggestions for testing is to test around limits or boundaries. e.g. try 0 or 1 or -1. I was painfully reminded of this by a recent debugging session.

Some background. Originally Suneido memory mapped the entire database file into memory. That was simple and fast but limited the size of the database to less than 2 gb (the amount of address space a Win32 process gets).

When our customers' databases started to approach this size I had to change Suneido to memory map a limited number of segments of the database file and re-use address space in an LRU fashion.

Considering how major this change was it went amazingly well. I was quite nervous about weird bugs from things like dangling pointers caused by re-mapping. So I did quite a lot of testing before we rolled it out. And we kept a close eye on it at first.

This was quite a while ago and I figured everything was good. Until our client with the largest database started having problems. Their database was right around 4 gb. So I did some testing and sure enough when the database reached 4 gb MapViewOfFile would fail with "Access Denied". The "Access Denied" part threw me off since it sounded like some kind of security/permissions issue. I dug through MSDN trying to find some mention of a 4 gb limit but couldn't find anything.

As with so many of these things, the error turned out to be really stupid. Memory mapping a file with Win32 is a two step process, first CreateFileMapping, then MapViewOfFile. Since it was MapViewOfFile failing I hadn't paid much attention to the CreateFileMapping step.

Both API calls accept 64 bit file offsets as two 32 bit values. I was only passing the low 32 bit value to CreateFileMapping. Stupid! Of course, it worked up to 4 gb. But over 4 gb it wasn't passing the correct offset. Whereas I was passing both halves of the offset to MapViewOfFile. It was trying to map the correct offset, but was failing because the file mapping was for a different, incorrect offset. (A little confusing because the call with the wrong arguments was succeeding and the call with the right arguments was failing.)

Why hadn't I caught this with my testing? Because the old database size limit was under 2 gb so I tested with over 2 gb. The bigger the database the slower the testing so I never bothered going as big as 4 gb. Oops.

I should have realized that the limit for 32 bit values would be an obvious failure point and I should have tested up to and over 4 gb.

After I fixed this bug I was able to grow a database past 4 gb. For a minute I thought I was done. But a little more testing revealed that it wasn't accessing the data past 4 gb properly.

This was a little tougher to track down. I got fed up with waiting so long to create big databases and came up with a way to quickly expand the temporary databases used by the tests to over 4 gb. (I could have saved considerable time in the long run if I'd done this at the start!)

The problem turned out to be a place in the code where I mistakenly stored a 64 bit file offset in a 32 bit integer. Again, this worked up to 4 gb but not beyond.

After this fix all the tests ran successfully.

Next time maybe I'll remember to test around obvious limits and boundaries.

PS. Not that it's any excuse, but it seems odd that the compiler wouldn't give a warning about assigning a 64 bit value to a 32 bit variable. I'm using -Wall with GCC and the only warning I have turned off is -Wno-sign-compare. Maybe there are warnings I need to turn on that aren't included in -Wall. I should probably look into this.

PPS. Along the way I got an excellent demonstration of the effects of locality on multi-level storage. If I created a big database by doing: for each table, add 10000 records, then it was reasonably fast. If I turned it around to: 10000 times, for each table, add 1 record, then it was about 10 times slower. Dealing with one table at a time kept all the data and index nodes for that table together so they could all be mapped into memory. Looping through each table and adding one record to each causes the data and index nodes for all the tables to be mixed together over too large an area to be mapped all at once, causing continuous un-mapping and re-mapping.