Tuesday, September 06, 2011

Concurrency Continued

I figured I'd better write a concurrency stress test for my DbHashTrie semi-immutable data structure, to increase my confidence that there were no problems. (Of course, a test can never prove the absence of problems.)

I write my test and it passes. But being a good cynical/skeptical tester, I tweak it so it should fail. It still passes?! Then I realize that the test runs in multiple threads, and if an assert fails in a worker thread it just throws an exception, and that thread silently dies, swallowing the exception. Oops. I add a try-catch around the thread body.
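
Something along these lines (a minimal sketch with made-up names, not the actual test code): collect anything thrown in the workers and fail from the main thread.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class StressTestHarness {
        public static void main(String[] args) throws InterruptedException {
            // without the try-catch, an AssertionError in a worker thread
            // just kills that thread and is never reported
            final List<Throwable> failures =
                    Collections.synchronizedList(new ArrayList<Throwable>());
            Thread[] threads = new Thread[8];
            for (int i = 0; i < threads.length; ++i) {
                threads[i] = new Thread(new Runnable() {
                    public void run() {
                        try {
                            // ... the actual test body, with its asserts ...
                        } catch (Throwable e) {
                            failures.add(e);
                        }
                    }
                });
                threads[i].start();
            }
            for (Thread t : threads)
                t.join();
            if (!failures.isEmpty())
                throw new AssertionError("worker failed: " + failures.get(0));
        }
    }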

Now it fails, as expected. I remove the tweak, expecting it to succeed again. It still fails! I remove the multi-threading. It still fails. Huh!? This code is in live use; if there's a bug, that's definitely not good.

After thrashing for a (thankfully short) time, I realize it was a bug in my test, not in the actual code. To make it worse, I've been caught by the same thing multiple times. Some day I'll learn! When testing a map-like data structure with random keys, you have to remember that you can get duplicate keys, and in most maps a duplicate key overwrites the previous value. If your test then checks for the original value, it fails, because it gets the new value.
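
In sketch form (hypothetical names, with a HashMap standing in for the structure under test; run with -ea so the asserts fire):

    import java.util.HashMap;
    import java.util.Random;

    public class DuplicateKeyGotcha {
        public static void main(String[] args) {
            Random rand = new Random();
            HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
            int[] keys = new int[1000];
            for (int i = 0; i < keys.length; ++i) {
                keys[i] = rand.nextInt(100); // small range, duplicates likely
                map.put(keys[i], i);
            }
            // WRONG: a later duplicate key overwrote the earlier value,
            // so this fails even though the map behaved correctly
            //     for (int i = 0; i < keys.length; ++i)
            //         assert map.get(keys[i]) == i;
            // Fix: build the expected values the same way, so both sides
            // agree on which value "wins" for a duplicate key
            HashMap<Integer, Integer> expected = new HashMap<Integer, Integer>();
            for (int i = 0; i < keys.length; ++i)
                expected.put(keys[i], i);
            for (Integer key : expected.keySet())
                assert expected.get(key).equals(map.get(key));
        }
    }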

Finally the test seems to be working properly. Except that it's not maxing out my cores. Since it's entirely in memory, with no IO, I can't see why it wouldn't. I play around with various parameters (number of threads, size of data, number of iterations), but it still doesn't max out my CPU usage. (I use iStat Menus to show my CPU usage on the menu bar all the time so I can easily monitor what's going on.)

The only thing I can think of is the locking. I comment out the two places with synchronized and re-run the tests. Voila, all eight cores at 100%. Not normally what you want to see, but in this case it was.

Hmmm... if it wasn't maxing out the cpu, it must have been running slower. I add some timing. Yikes, the synchronized version is over 10 times slower! (45 seconds with synchronized, 3.8 seconds without.)

Of course, this is close to a worst-case scenario, since the test doesn't do much more than call the one synchronized method. And although the locking is per node, every access has to start at the root node. Java locks are very fast these days, but only when uncontended, and in this case the nodes at the root of the tree will be heavily contended.
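
The shape of the problem, roughly (a hypothetical sketch, not the actual DbHashTrie code):

    // locking is per node, but every lookup starts at the root,
    // so every thread contends for the root node's lock
    class Node {
        private final Object[] children = new Object[32];

        synchronized Object child(int i) { // the kind of method I commented out
            Object c = children[i];
            if (c == null)
                children[i] = c = loadChild(i);
            return c;
        }

        private Object loadChild(int i) {
            return new Node(); // stand-in for reading from the file
        }
    }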

If it were a single variable I could just make it volatile, but it's an array, and putting volatile on an array only makes the reference volatile, not the elements. That's where AtomicReferenceArray comes in. It had crossed my mind a few times, but its interface is quite different from an array's, so it would be a bit of a hassle to switch. I'd also have to write my own versions of Arrays.copyOf and System.arraycopy.
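
For example (a sketch with made-up names): AtomicReferenceArray's get and set act like volatile reads and writes of the individual elements, and the copying has to be done by hand.

    import java.util.concurrent.atomic.AtomicReferenceArray;

    public class Slots {
        private final AtomicReferenceArray<Object> slots =
                new AtomicReferenceArray<Object>(32);

        Object get(int i) {
            return slots.get(i); // volatile read of element i
        }

        void set(int i, Object value) {
            slots.set(i, value); // volatile write of element i
        }

        // hand-rolled equivalent of Arrays.copyOf
        static AtomicReferenceArray<Object> copyOf(
                AtomicReferenceArray<Object> src, int newLength) {
            AtomicReferenceArray<Object> dst =
                    new AtomicReferenceArray<Object>(newLength);
            int n = Math.min(src.length(), newLength);
            for (int i = 0; i < n; ++i)
                dst.set(i, src.get(i));
            return dst;
        }
    }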

I changed the code to use AtomicReferenceArray. It was a bit slower than the raw array, but not a lot (5 seconds, versus 3.8).

I also did some checking, and the JVM spec does guarantee that reads and writes of references are atomic (it's only longs and doubles that can be non-atomic). So I think I'm safe not doing anything special for concurrent access. Threads could still read stale values, which means there could be a data race where two threads end up loading the same node at the same time. But as far as I can figure, this race is benign: both threads will get the same node (it's never modified in the memory-mapped file), so it doesn't matter which one wins. I think it's equivalent to how String calculates its hashCode lazily.
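
The shape of the benign race (a sketch with made-up names):

    class NodeCache {
        private final Object[] nodes = new Object[1024]; // plain, non-volatile

        Object node(int i) {
            Object n = nodes[i];        // may see a stale null...
            if (n == null)
                nodes[i] = n = load(i); // ...so two threads may both load,
            return n;                   // but they load the identical node
        }

        private Object load(int i) {
            return new Object(); // stand-in for the memory-mapped file read
        }
    }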

The question is, do I trust this reasoning? Or do I go with the safer, but slower and uglier AtomicReferenceArray?

A little searching on the web turns up Date-Race-Ful Lazy Initialization for Performance. Ironically, this both confirms my reasoning and identifies a bug in my code: I was reading the shared variable more than once, which can lead to problems due to memory access reordering. Easily fixed, but yet another example of how hard it is to reason correctly about this stuff. (Just to hammer the point home, another article I found, Implementing hashCode, has the same bug of reading the variable multiple times without volatile.)
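
The bug, distilled (a hypothetical sketch, not my actual code):

    class Broken {
        private Object node; // shared, non-volatile

        Object get() {
            if (node == null) // first read of the shared field
                node = load();
            return node;      // second read: may return a stale null
        }                     // even though the first read saw a value

        Object load() { return new Object(); }
    }

    class Fixed {
        private Object node;

        Object get() {
            Object n = node;  // read the shared field exactly once
            if (n == null)
                node = n = load();
            return n;         // a local, so guaranteed non-null
        }

        Object load() { return new Object(); }
    }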

The blog post also references a section in Effective Java (2nd ed.) by Joshua Bloch, so I dig out my copy and review it. It confirms that the single-check idiom is valid, but recommends double-checked locking with a volatile variable instead. Even for the single-check idiom, it recommends a volatile variable, although it's not strictly necessary.
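
The double-checked idiom looks roughly like this (a sketch; the volatile on the field is what makes it correct):

    class DoubleChecked {
        private volatile Object node;

        Object get() {
            Object n = node; // one volatile read in the common case
            if (n == null) {
                synchronized (this) {
                    n = node;
                    if (n == null)
                        node = n = load();
                }
            }
            return n;
        }

        Object load() { return new Object(); }
    }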

After all that, I think I'm going to go with the simplest code: the raw array, non-volatile and unsynchronized. That's not because I trust my own reasoning, but because I trust the explanations and examples of people who are much smarter than me.

