Tuesday, May 15, 2012

Immudb Concurrent Performance Puzzle

One of the tests I hadn't run on immudb (my new append-only database engine for Suneido) was a concurrency test I'd used to debug the previous database engine.

Here are some rough results. As usual, this is not a scientific benchmark. The operations it's doing aren't necessarily representative, and I'm not running it long enough or enough times to get accurate figures. But it still gives me some information.

I'm pretty happy with the results relative to the previous version. Up to four client threads it's about four times as fast - can't complain about that. Another good result that's not shown by this chart is that immudb has far fewer transaction conflicts.

Also on the positive side, I only found a couple of bugs, and none of them took more than a few minutes to fix. This is a sharp contrast to when I was debugging the previous version. I give credit to the immutability and the resulting reduction in locking.

But I'm puzzled by the drop in performance over 4 threads. And I didn't even include 8 threads because the results were all over the place - anywhere from 3000 to 6000. There's some variation with fewer threads, but nowhere near this much.
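One way to put numbers on that kind of variation is to repeat each thread count several times and report the spread rather than a single figure. The following is only a sketch of such a harness, not the actual test used here: `ThreadScalingSketch`, `work`, and `run` are hypothetical names, and `work` is a CPU-only stand-in for a database transaction.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class ThreadScalingSketch {
    // Operations finished in the most recent run (reset at the start of each run).
    static final AtomicLong completed = new AtomicLong();
    // Results are accumulated here so the JIT cannot discard work() as dead code.
    static final AtomicLong sink = new AtomicLong();

    /** Hypothetical stand-in for one transaction: a little pure CPU work. */
    static long work(long seed) {
        long x = seed;
        for (int i = 0; i < 1000; i++)
            x = x * 6364136223846793005L + 1442695040888963407L;
        return x;
    }

    /** Runs nThreads workers, each doing opsPerThread operations; returns operations per second. */
    static double run(int nThreads, int opsPerThread) {
        completed.set(0);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final long seed = t + 1;
            threads[t] = new Thread(() -> {
                try {
                    start.await(); // all workers begin together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long acc = 0;
                for (int i = 0; i < opsPerThread; i++)
                    acc += work(seed + i);
                sink.addAndGet(acc);
                // Count once per thread so the counter itself is not a point of contention.
                completed.addAndGet(opsPerThread);
            });
            threads[t].start();
        }
        long begin = System.nanoTime();
        start.countDown();
        try {
            for (Thread th : threads)
                th.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        long elapsed = Math.max(1, System.nanoTime() - begin);
        return completed.get() * 1e9 / elapsed;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 5, 8, 16}) {
            double min = Double.MAX_VALUE, max = 0;
            for (int rep = 0; rep < 5; rep++) {
                double rate = run(n, 20000);
                min = Math.min(min, rate);
                max = Math.max(max, rate);
            }
            System.out.printf("%2d threads: min %.0f, max %.0f ops/sec%n", n, min, max);
        }
    }
}
```

Because `work` shares nothing between threads, a harness like this would mostly show what the hardware and JVM do on their own. If it showed a similar drop or spread past 4 threads, that would point away from the database code.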

If the performance just levelled off, that would be one thing, but I'm not happy with the performance dropping under heavy load - that's not what you want to see.

The previous version may not perform as well, but it doesn't show the puzzling drop off with more threads. Although, even with the drop, the immudb performance is still much better than the previous version.

At first I was only running 1, 2, 4, and 8 threads. My computer has 4 cores with hyper-threading, for more or less 8 hardware threads, so I thought maybe it was because I was using up all the hardware threads and not leaving any for other things like garbage collection and the OS. But that wouldn't explain the drop off at 5 threads.

I looked at hprof output, but it didn't give me any clues. It did highlight some areas that I could probably improve, but they weren't related to concurrency. (They were related to one of my pet peeves - ByteBuffer.)

More threads could mean more memory usage, but JConsole and VisualVM don't show a lot of time in garbage collection.

The obvious issue is some kind of contention. Immudb doesn't do much locking (because most things are immutable) but I could see contention over the commit lock. But JConsole and VisualVM don't show much waiting for locks. Thread dumps show all the threads as RUNNABLE. And I don't see underutilization of the CPUs, as I would expect with heavy contention (since they'd be waiting for locks, not executing).
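Lock contention can also be cross-checked from inside the JVM with the standard `java.lang.management` API, which exposes per-thread blocked counts. This is a minimal sketch, not immudb code: `ContentionSketch`, `commitLock`, `blockedCount`, and `demo` are hypothetical names, and the demo deliberately forces one thread to block so there is something to count.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class ContentionSketch {
    // Hypothetical stand-in for a commit lock.
    static final Object commitLock = new Object();

    /** Total number of times the given live threads have blocked entering a synchronized monitor. */
    static long blockedCount(long... threadIds) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long total = 0;
        for (ThreadInfo info : mx.getThreadInfo(threadIds))
            if (info != null) // null means the thread is no longer alive
                total += info.getBlockedCount();
        return total;
    }

    /** Forces one thread to block on commitLock and returns its blocked count. */
    static long demo() {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch passed = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread contender = new Thread(() -> {
            try {
                held.await(); // wait until the holder owns the lock
                synchronized (commitLock) {
                    // nothing to do; the point is having to wait to get in
                }
                passed.countDown();
                release.await(); // stay alive so ThreadInfo is still available
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread holder = new Thread(() -> {
            synchronized (commitLock) {
                held.countDown();
                // Keep the lock until the contender is actually blocked.
                while (contender.getState() != Thread.State.BLOCKED)
                    Thread.yield();
            }
        });
        contender.start();
        holder.start();
        try {
            passed.await();
            long count = blockedCount(contender.getId());
            release.countDown();
            contender.join();
            holder.join();
            return count;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("contender blocked " + demo() + " time(s)");
    }
}
```

One caveat: the blocked count only covers `synchronized` monitors. Threads waiting on `java.util.concurrent` locks are parked rather than blocked, so they show up in `ThreadInfo.getWaitedCount()` instead. In a real test, sampling both counts for the client threads before and after a run would show whether either grows with the thread count.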

Another possibility is that more concurrent transactions will mean more work to do read validation. And since each commit has to check against all other concurrent transactions, this is O(N²), which could be bad. But again, I don't see any signs of it in my monitoring.
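To make the quadratic growth concrete, here is a simplified sketch of read validation. It is not immudb's actual code: `Tran`, `validate`, and `commitAll` are hypothetical, keys are plain strings, and every transaction is assumed to overlap with every other one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReadValidationSketch {
    /** Hypothetical transaction: just the keys it read and the keys it wrote. */
    static final class Tran {
        final Set<String> reads = new HashSet<>();
        final Set<String> writes = new HashSet<>();
    }

    // Number of transaction-to-transaction checks performed (single-threaded sketch).
    static long comparisons = 0;

    /** True if nothing t read was written by an already committed concurrent transaction. */
    static boolean validate(Tran t, List<Tran> committedConcurrent) {
        for (Tran other : committedConcurrent) {
            comparisons++;
            for (String key : t.reads)
                if (other.writes.contains(key))
                    return false; // conflict: t read something other wrote
        }
        return true;
    }

    /** Commits n mutually concurrent, non-conflicting transactions; returns the comparison count. */
    static long commitAll(int n) {
        comparisons = 0;
        List<Tran> committed = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Tran t = new Tran();
            t.reads.add("r" + i);
            t.writes.add("w" + i);
            if (validate(t, committed))
                committed.add(t);
        }
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8, 16})
            System.out.println(n + " concurrent transactions -> " + commitAll(n) + " comparisons");
    }
}
```

Under these assumptions the i-th commit checks against the i transactions committed before it, so N transactions cost N(N-1)/2 comparisons in total: 6 for 4, 28 for 8, 120 for 16. Each single commit is only O(N), so whether this matters depends on how expensive each comparison is.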

And I can't think of any explanation for the excessive variation. Strangely, 16 threads shows much less variation than 8.

I'm more puzzled than worried at this point. In actual usage I don't think it will be an issue because our systems, even the largest ones, won't be applying a continuous load this heavy.

Anyone have any ideas or suggestions of things to look at or try?
