Friday, June 13, 2008

jSuneido - concurrency strategies

I've been thinking about concurrency as I make progress on porting the btree indexing code (inserting and iterating are working).

The simplest, most obvious way to handle concurrency would be to lock entire indexes. You could argue that there are lots of tables and each table has multiple indexes, so usage should be spread out enough to minimize contention. Unfortunately, the 80-20 rule (or worse) applies, and activity tends to concentrate on particular tables and indexes.

Since there is a lot more "read" activity than "write" activity, an obvious improvement would be to use "many readers / single writer" locking.
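To make that concrete, here is a minimal sketch of what many-readers / single-writer locking could look like in Java using java.util.concurrent.locks.ReentrantReadWriteLock. The Index class and the doLookup/doInsert methods are placeholders for illustration, not the actual jSuneido code:

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // hypothetical index wrapper - names are made up for illustration
    class Index {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        Object lookup(Object key) {
            lock.readLock().lock();        // any number of readers at once
            try {
                return doLookup(key);      // placeholder for the real search
            } finally {
                lock.readLock().unlock();
            }
        }

        void insert(Object key, Object value) {
            lock.writeLock().lock();       // one writer, excludes readers too
            try {
                doInsert(key, value);      // placeholder for the real update
            } finally {
                lock.writeLock().unlock();
            }
        }

        private Object doLookup(Object key) { return null; }
        private void doInsert(Object key, Object value) { }
    }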

There are strategies for locking individual index nodes in btrees, but you have to be very careful about the order of locking or you end up with deadlocks.

Most btree updates only touch a single node - it would be nice if those cases just locked that one node. This would definitely reduce contention. But some updates require changes to more of the tree, and then it makes more sense to lock the entire index. I think I can reconcile these by having single-node updaters first acquire a read lock on the entire index and then a write lock on the single node. The read lock will prevent anything else from acquiring a write lock on the entire index.
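Here is a rough sketch of how that two-level scheme might look in Java. The class and method names are made up, and it assumes there is a stable node object to lock, which (as noted further down) isn't actually the case yet:

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // hypothetical two-level locking - illustration only
    class BtreeIndex {
        // whole-index lock: single-node updaters take the read side,
        // structural (multi-node) updates take the write side
        private final ReentrantReadWriteLock indexLock = new ReentrantReadWriteLock();

        void singleNodeUpdate(Object key, Object value) {
            indexLock.readLock().lock();      // only blocks whole-index writers
            try {
                Node leaf = findLeaf(key);    // placeholder: locate the leaf node
                synchronized (leaf) {         // write-lock just that one node
                    leaf.update(key, value);  // placeholder for the real change
                }
            } finally {
                indexLock.readLock().unlock();
            }
        }

        void structuralUpdate(Object key, Object value) {
            indexLock.writeLock().lock();     // excludes all single-node updaters
            try {
                // split/merge nodes as needed (placeholder)
            } finally {
                indexLock.writeLock().unlock();
            }
        }

        private Node findLeaf(Object key) { return new Node(); }

        static class Node {
            void update(Object key, Object value) { }
        }
    }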

A complication is that you don't know if it will be a single-node update until you get there. But if you discover that it will require larger-scale changes, then you can upgrade the whole-index read lock to a write lock.
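One wrinkle if this were built on Java's ReentrantReadWriteLock: it doesn't allow upgrading a read lock to a write lock in place (the attempt just blocks), so the "upgrade" would really be release-and-reacquire, with a re-check afterwards since another thread could slip in during the gap. Continuing the hypothetical sketch above, with the discovery step collapsed into a placeholder needsStructuralChange check for brevity:

    void update(Object key, Object value) {
        indexLock.readLock().lock();
        boolean holdingRead = true;
        try {
            if (!needsStructuralChange(key)) {   // placeholder: simple case?
                singleNodeUpdate(key, value);
                return;
            }
            // "upgrade": release the read lock, then take the write lock
            indexLock.readLock().unlock();
            holdingRead = false;
            indexLock.writeLock().lock();
            try {
                // another thread may have run in the gap,
                // so re-check before doing the structural update
                structuralUpdate(key, value);
            } finally {
                indexLock.writeLock().unlock();
            }
        } finally {
            if (holdingRead)
                indexLock.readLock().unlock();
        }
    }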

This approach should be deadlock-free since you're always locking in a fixed order - the index as a whole, and then a single node.

There is, of course, another problem - there are no persistent in-memory objects for index nodes. They are mapped in and out of memory as needed. So it doesn't do any good to lock your node object because it's just a temporary object and other threads will have their own. (Locking the entire index is ok because there is a persistent shared object for the index as a whole.)

I could cache the node objects (with weak references so they could still be garbage collected), but then the cache itself would need synchronization.
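For illustration, such a cache might be little more than a synchronized map from a node's file offset to a WeakReference, so all threads share the same node object while it's alive. The names here are made up:

    import java.lang.ref.WeakReference;
    import java.util.HashMap;
    import java.util.Map;

    // hypothetical node cache - illustration only
    class NodeCache {
        private final Map<Long, WeakReference<Node>> cache =
                new HashMap<Long, WeakReference<Node>>();

        // synchronized so concurrent threads don't corrupt the map
        synchronized Node get(long offset) {
            WeakReference<Node> ref = cache.get(offset);
            Node node = (ref == null) ? null : ref.get();
            if (node == null) {
                node = Node.mapIn(offset);  // placeholder: map the node in from the file
                cache.put(offset, new WeakReference<Node>(node));
            }
            return node;
        }

        static class Node {
            static Node mapIn(long offset) { return new Node(); }
        }
    }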

Or I could use OS file locking, but I've had bad experiences with that in the past.

Another option would be to associate a lock object with each memory-mapped "section" and lock at that level. But sections are 4 MB, so this would be pretty coarse-grained locking. (Index nodes are only 4 KB, so a single section lock could cover roughly a thousand of them.)
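As a toy sketch of what that would mean: one lock per 4 MB section, found by dividing a node's file offset by the section size (the names and the fixed-size array are made up for illustration):

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // hypothetical per-section locking - illustration only
    class SectionLocks {
        private static final long SECTION_SIZE = 4 * 1024 * 1024;  // 4 MB mapped sections

        private final ReentrantReadWriteLock[] locks;

        SectionLocks(int nsections) {
            locks = new ReentrantReadWriteLock[nsections];
            for (int i = 0; i < nsections; ++i)
                locks[i] = new ReentrantReadWriteLock();
        }

        // a 4 KB node at this offset shares its lock with ~1000 neighboring nodes
        ReentrantReadWriteLock lockFor(long nodeOffset) {
            return locks[(int) (nodeOffset / SECTION_SIZE)];
        }
    }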

I'm busy reading concurrency books; maybe one of them will give me a better idea. To start, I can always just use many-readers / single-writer locking on the index as a whole.
