I recently listened to a Herding Code podcast on Approval Tests.
The basic idea is that instead of writing:
assert(x == <some-value>)
you write:
approve(x)
and when you run the test it shows you the result and asks you to "approve" it. Once you've approved a value, the test will pass (without interaction) as long as the result doesn't change.
If the value is something simple like a number, there's not much point. But if the value is a more complex object like an invoice, then it's much simpler to approve the whole thing than to write a lot of asserts.
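To make the mechanics concrete, here's a rough Java sketch of the idea (my own illustration, not the actual Approval Tests API): the result is written to a "received" file and compared to a previously approved file; approving a result just means renaming received to approved.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.Arrays;

class Approver {
    // conceptual sketch only - the real library has its own, richer API
    static void approve(String testName, Object result) throws IOException {
        Path approved = Paths.get(testName + ".approved.txt");
        Path received = Paths.get(testName + ".received.txt");
        byte[] actual = result.toString().getBytes(StandardCharsets.UTF_8);
        Files.write(received, actual);
        if (!Files.exists(approved))
            throw new AssertionError("no approved result yet - review " + received);
        if (!Arrays.equals(Files.readAllBytes(approved), actual))
            throw new AssertionError("result differs from approved - see " + received);
        Files.delete(received);  // matches the approved result, clean up
    }
}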
I can't see it replacing conventional style asserts, but it seems like it could be useful.
Saturday, December 24, 2011
Friday, December 16, 2011
Textastic
I was looking for some way to have my Suneido source code on my iPad. More for viewing than to actually write code. I tried a few things, and settled on Textastic.
I was able to retrieve my cSuneido code from the SourceForge Subversion repository, but I didn't manage to get the jSuneido code from Mercurial. (It may just have been really slow. Or maybe it was trying to download the history?) So I just copied my source directory to Dropbox and pulled it from there.
I used it a few times while I was traveling recently and it worked well. For example, I'd get an email from my staff asking what a particular error message meant and I was able to find it in the source code and see where it came from.
Textastic is somewhat compatible with TextMate. I have a copy of TextMate on my Mac but I don't actually use it much since I tend to work in Eclipse.
From my limited use, I'd recommend it. It's not free, but for $10 you can't go too far wrong.
Wednesday, October 19, 2011
Long Polling with Suneido
It started with our summer student deciding to build an "instant messenger" in Suneido. I'm not sure what prompted this, but it seemed like an interesting challenge so I let him run with it.
The trick is the "instant" part. His initial approach was to store the messages in a database table, and poll the table for changes. To make it "instant" he wanted to poll very frequently, at least once a second, and preferably more. We have about 50 users, if they are all polling every second, that's an extra 50 queries per second. I wasn't too keen on this.
I suggested we set up a separate HttpServer for the messenger, and keep the "outstanding" messages in memory, so polling didn't have to hit the database. (New messages were still written to a history table in the database, but that's not as frequent as the polling.) This took the load off the main database server, and made the requests faster.
I explained that a better way to make it "instant" was to use Comet style long polling where the server simply doesn't respond to the request until something is available. This means a lot more connections to the server since every client always has an outstanding request, which can be an issue depending on the server. But we were using jSuneido for the messenger HTTP server, and 50 connections wasn't a problem.
But the client side is also an issue. If we're going to be waiting a potentially long time for a response, the client can't be blocked waiting; we need to make the request asynchronously. Suneido doesn't have a lot of support for asynchronous programming, but it does have a Thread function that allows you to execute code in a separate "fiber" - not a real thread, more like a coroutine. (On jSuneido "Thread" uses a real thread.)
We've never used Thread much because it has always seemed a little flaky. But it is basically sound, because it's what the Suneido database server uses to handle multiple users - and that is solid. It's also how SocketServer is implemented, and that works well too (e.g. in HttpServer).
One issue is that you want to avoid updating the GUI from a thread. That is easy to do by using Delayed with a 0 delay to process the response. Delayed works via the Windows message queue, which is processed by the main fiber. The code running inside the thread looks like:
forever
    {
    if "" isnt response = Http.Get(uri)
        Delayed(0, ResponseProcessor(response))
    }
(plus catching exceptions etc.)
However, we haven't eliminated polling; we've just moved it to the messenger HTTP server, which must poll for new messages. (Suneido doesn't have any way to "wait" for something.) The code looks something like:
400.Times
    {
    if Suneido.MessagesAvailable
        return ...
    Sleep(100)
    }
return ""
400 times 100ms is 40 seconds, less than the default timeout of 60 seconds. Checking once every 100 ms (1/10 of a second) is a light load on the server, most of the time it's sleeping. (Note: this wouldn't work well on a cSuneido HTTP server because Sleep would block all the other fibers.)
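Just to illustrate the long polling idea in more familiar terms, here's a hypothetical Java sketch (not the actual messenger code - the names are made up). In Java the handler can genuinely wait instead of sleeping in a loop:

import java.util.concurrent.*;

class LongPollHandler {
    // one queue per client in the real thing; a single queue keeps the sketch simple
    private final BlockingQueue<String> messages = new LinkedBlockingQueue<>();

    // called by the HTTP request thread; waits up to 40 seconds,
    // the same bound as the 400 x 100ms loop above
    String waitForMessage() throws InterruptedException {
        String msg = messages.poll(40, TimeUnit.SECONDS);
        return msg == null ? "" : msg;   // "" means timed out, the client just re-polls
    }

    // called when someone sends a message
    void post(String msg) {
        messages.offer(msg);
    }
}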
This has ended up working remarkably well - the messenger is "instant" with little load on the server or client.
The trick is the "instant" part. His initial approach was to store the messages in a database table, and poll the table for changes. To make it "instant" he wanted to poll very frequently, at least once a second, and preferably more. We have about 50 users, if they are all polling every second, that's an extra 50 queries per second. I wasn't too keen on this.
I suggested we set up a separate HttpServer for the messenger, and keep the "outstanding" messages in memory, so polling didn't have to hit the database. (New messages were still written to a history table in the database, but that's not as frequent as the polling.) This took the load off the main database server, and made the requests faster.
I explained that a better way to make it "instant" was to use Comet style long polling where the server simply doesn't respond to the request until something is available. This means a lot more connections to the server since every client always has an outstanding request, which can be an issue depending on the server. But we were using jSuneido for the messenger HTTP server, and 50 connections wasn't a problem.
But the client side is also an issue. If we're going to be waiting a potentially long time for a response, the client can't be waiting "blocked", we need to make the request asynchronously. Suneido doesn't have a lot of support for aynchronous programming, but it does have a Thread function that allows you to execute code in a separate "fiber" - not a real thread, more like a coroutine. (On jSuneido "Thread" uses a real thread.)
We've never used Thread much because it has always seemed a little flaky. But it is basically sound because it's what the Suneido database server uses to handle multiple users - and that is solid. And it's also how SocketServer is implemented, and that also works well (e.g. in HttpServer)
One issue is that you want to avoid updating the GUI from a thread. That is easy to do by using Delayed with a 0 delay to process the response. Delayed works via the Windows message queue, which is processed by the main fiber. The code running inside the thread looks like:
forever
{
if "" isnt response = Http.Get(uri)
Delayed(0, ResponseProcessor(response))
}
(plus catching exceptions etc.)
However, we haven't eliminated polling, we've just moved it to the messenger HTTP server, since it must poll for new messages. (Since Suneido doesn't have any way to "wait" for something.) The code looks something like:
400.Times
{
if Suneido.MessagesAvailable
return ...
Sleep(100)
}
return ""
400 times 100ms is 40 seconds, less than the default timeout of 60 seconds. Checking once every 100 ms (1/10 of a second) is a light load on the server, most of the time it's sleeping. (Note: this wouldn't work well on a cSuneido HTTP server because Sleep would block all the other fibers.)
This has ended up working remarkably well - the messenger is "instant" with little load on the server or client.
Tuesday, October 18, 2011
JUnit Max
I've been using JUnit Max for the last few days. It's an Eclipse extension that automatically runs your tests every time you save a change.
Normally I'd stick to open source options (which this is not), but I figure Kent Beck deserves some support.
It's taking a little getting used to - it makes me nervous not running the tests manually. It's designed to be unobtrusive, but part of me wants more blatant feedback that my tests passed.
One of my motivations for using it is to make sure I'm running all the tests, not just the one I'm working on. I've been caught a few times where I didn't discover I broke something elsewhere till some time afterwards.
Even with a fast machine, it still takes a while for the tests to run (although I'm the only one to blame for the speed of the tests), but at least they run in the background and I can continue working.
The web site is pretty minimal and there is no documentation as far as I can tell, but then again, it's pretty self explanatory.
One of the things that prompted me to try it was listening to a podcast with Kent Beck on Software Engineering Radio (recommended).
JUnit Max has had a bit of a rocky history, pulled in 2009, relaunched in 2010. I think it's a worthwhile product and I hope it's successful.
Sunday, October 16, 2011
Zite: Personalized Magazine for iPad
I've been playing with this a little. I like how it analyzes your on-line presence and suggests material. And then lets you "vote" material up or down. It found quite a few interesting computer articles I hadn't seen. Of course, the last thing I need is more stuff to read!
Friday, October 07, 2011
The Grass Always Seems Greener
After tracking down the third bug to the same area of the code, I decided I had to stop adding more special cases and come up with a better solution.
The code was in the B-tree (technically B+-tree) implementation in my new append-only storage engine for jSuneido.
I'll try to explain the problem. I combine keys with a child node pointer. That leaves the question of whether a key points to the values less than it, or greater than it. I've always gone with greater than. But the problem is that when you do a lower bound binary search, the resulting position is one higher than the one you want. For example, if the tree node contains the keys 2, 4, 6, 8, the leaf containing 5 will be pointed at by the "4". But a lower bound binary search will give the position where you would insert 5, i.e. the position of the "6" key.
You can adjust for this, but it gets ugly. After the multiple bugs, I started to think that making the keys point to values less than the key would be cleaner because you wouldn't have the off by one problem.
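To make the off-by-one concrete, here's a rough Java sketch (my own illustration, not the actual jSuneido code) of choosing the child to descend into with the "greater than" convention:

class TreeNode {
    // keys[0..n) are the node's keys; the child attached to keys[i] holds values greater than keys[i]
    static int childIndex(int[] keys, int n, int key) {
        int lo = 0, hi = n;                 // lower bound binary search
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] < key)
                lo = mid + 1;
            else
                hi = mid;
        }
        // lo is where key would be inserted - e.g. the position of the 6 when searching
        // for 5 in [2, 4, 6, 8] - but the child we want hangs off the 4, hence the -1
        // (a result of -1 would mean the leftmost child, for values below the first key)
        return lo - 1;
    }
}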
So yesterday morning I set out to change the code. It's not a large amount of code and the changes were straightforward, but the code is fairly complex, and the "greater than" assumption was hidden in more places than I would have liked.
Eight hours later I had all my tests passing again. And it did simplify some parts of the code. But it also turned out (not surprisingly) that this approach had its own drawbacks. For example, a standard optimization is to detect when keys are being added in order (i.e. to the "end" of the B-tree) and to put only the new key in the new node, keeping the old full node intact. The way my code works, this is much easier with the "greater than" approach.
I'm sure I could have got it all worked out, but switching approaches was not the clear winner that I'd hoped.
Actual written code, with all its messiness, always seems inferior to an imagined alternative, which is always perfectly clean. (Hence the "grass is always greener" title.) But when you implement the alternative, it often ends up just as messy as what you had before. (This relates to never rewriting from scratch.)
I came up for air, ate some supper, realized I hadn't stepped foot outside the house all day, and considered my options. A glass of wine in hand, I returned to the computer. I wanted to "finish" working on it while all the details were fresh in my mind. I knew I wouldn't get a chance to work on it again for a week or so, and then I'd have to get back up to speed.
I threw out all the changes I'd made. (Except for a few additional tests I'd written.) Annoying, but at least it was only a day's work.
By carefully adjusting the code I was able to keep the old approach, but remove most of the special cases that had been causing me trouble. The code is quite a bit cleaner now. I still have the off by one adjustment, but it's in a single place now.
Meanwhile, I've been reading Tokutek's blog about how COLA (cache oblivious lookahead array) style fractal trees are so much better than B-trees. That grass sure looks greener over there :-)
Thursday, October 06, 2011
Farewell to Steve
I sat down with my iPad last night to catch up on my blog reading and the first thing I saw was about Steve having died. We knew it was probable, but it was still a shock. When someone is so much larger than life, you don't want to believe they are bound by the same physical laws as the rest of us.
Apple started in 1977, at a time when I was becoming increasingly absorbed by computers. My first job (only job really, other than Axon) was at a local computer store. In the service department I repaired countless Apple II's. I remember when we first saw the Lisa, and then the Mac. It's hard to communicate what a revolution it was. Nowadays the world is filled with similar amazing gadgets, but not back then.
What a long way Apple has come, and a bumpy road at times. As an entrepreneur myself at that point, I remember my shock at Steve getting kicked out of his own company. Apple struggled and Microsoft grew to dominate. Back then we could not have imagined Apple ending up bigger than Microsoft. (Any more than we imagined Microsoft overtaking IBM.) And the idea that Apple would become the most valuable company in the world would have seemed like total fantasy. Steve returning to Apple and leading it to world domination still seems like a fairy tale.
Steve's abilities as a showman and salesman were alien to a geek like me. To me, what set him apart was his focus on design and user experience. Obviously, he didn't personally do all the design work. But I have to think he played a huge part in gathering and guiding his team, and shaping their culture and priorities. I can't help but wonder how things will change at Apple with Steve gone. In the short term, they will, no doubt, continue on much the same path. But in the long run, I wonder. (Just as I wonder how Axon will change when I'm not there. Not necessarily for the worse, but inevitably different.)
I don't agree with everything that Steve and Apple have done. Their closed, controlled, curated (and taxed) world is not my ideal, but it's hard to argue with what they have achieved. Apple's financial accomplishments are impressive. Even more impressive is that they have one of the highest customer satisfaction ratings. What other company has grown so large, and yet still pleased so many people?
All in all, his life was an insanely great story.
We'll miss you Steve.
Tuesday, September 06, 2011
Concurrency Continued
I figured I'd better write a concurrency stress test for my DbHashTrie semi-immutable data structure, to increase my confidence that there were no problems. (Of course, a test can never prove the absence of problems.)
I write my test and it passes. But as a good cynical/skeptical tester, I tweak it so it should fail. It still passes?! I realize that I'm running in multiple threads and if an assert fails it'll just throw an exception and that thread will silently die, swallowing the exception. Oops. I add try-catch around the thread body.
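The shape of the fix is roughly this (a sketch, not the actual test): catch anything thrown in the worker threads and rethrow it in the main test thread.

import java.util.concurrent.atomic.AtomicReference;

class ConcurrentChecker {
    // run 'checks' in nthreads threads; without the try-catch an assert failure
    // would just kill its thread and the test would still "pass"
    static void run(Runnable checks, int nthreads) throws Throwable {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread[] threads = new Thread[nthreads];
        for (int i = 0; i < nthreads; ++i) {
            threads[i] = new Thread(() -> {
                try {
                    checks.run();                    // the asserts live in here
                } catch (Throwable t) {
                    failure.compareAndSet(null, t);  // remember the first failure
                }
            });
            threads[i].start();
        }
        for (Thread t : threads)
            t.join();
        if (failure.get() != null)
            throw failure.get();                     // surface it in the test thread
    }
}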
Now it fails, as expected. I remove the tweak, expecting it to now succeed. It still fails! I remove the multi-threading. It still fails. Huh!? This code is in live use; if there's a bug, that's definitely not good.
After thrashing for a (thankfully short) time, I realize it was a bug in my test, not the actual code. To make it worse, I've been caught by the same thing multiple times. Some day I'll learn! When building a map-like data structure with random keys, you have to remember that you could get duplicate keys, which in most maps will overwrite the previous value. If your test checks for the original value, it'll fail since it'll get the new value.
Finally the test seems to be working properly. Except that it's not maxing out my cores. Since it's entirely in memory, with no IO, I can't see why it wouldn't. I play around with various parameters like the number of threads, size of data, and number of iterations but it still doesn't max out my cpu usage. (I use iStat Menus to show my cpu usage all the time on the menu bar so I can easily monitor what's going on.)
The only thing I can think of is the locking. I comment out the two places with synchronized and re-run the tests. Voila, all eight cores at 100%. Not normally what you want to see, but in this case it was.
Hmmm... if it wasn't maxing out the cpu, it must have been running slower. I add some timing. Yikes, the synchronized version is over 10 times slower! (45 seconds with synchronized, 3.8 seconds without.)
Of course, this is close to a worst case scenario since the test doesn't do much more than call the one synchronized method. And although the locking is per node, every access has to start at the root node. Java locks are very fast these days, but only when uncontested. And in this case the nodes at the root of the tree will be heavily contested.
If it was a single variable I could just make it volatile, but it's an array, and putting volatile on an array only makes the reference volatile, not the contents. That's where AtomicReferenceArray comes in. It had crossed my mind a few times, but the interface is quite different from an array so it would be a bit of a hassle to switch. I'd also have to write my own versions of Arrays.copyOf and System.arraycopy.
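For anyone who hasn't used it, the interface difference looks roughly like this (a minimal sketch, not the DbHashTrie code):

import java.util.concurrent.atomic.AtomicReferenceArray;

class Slots {
    // each element gets volatile read/write semantics, which a plain Object[] can't give you
    private final AtomicReferenceArray<Object> slots = new AtomicReferenceArray<>(32);

    Object get(int i) {
        return slots.get(i);                        // volatile read of element i
    }
    void set(int i, Object value) {
        slots.set(i, value);                        // volatile write of element i
    }
    boolean setIfAbsent(int i, Object value) {
        return slots.compareAndSet(i, null, value); // first writer wins
    }
    // no equivalent of Arrays.copyOf or System.arraycopy - you copy element by element
}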
I changed the code to use AtomicReferenceArray. It was a bit slower than the raw array, but not a lot (5 seconds, versus 3.8).
I also did some checking, and the JVM spec does guarantee that reads and writes are atomic (except for longs and doubles). So I think I'm safe not doing anything special for concurrent access. That means threads could read stale values. Which means there could potentially be a data race where two threads could end up loading the same node at the same time. But as far as I can figure, this should be benign since they will both get the same node (it's never modified in the memory-mapped file) and therefore it doesn't matter which one wins. I think it's equivalent to how String calculates its hashCode lazily.
The question is, do I trust this reasoning? Or do I go with the safer, but slower and uglier AtomicReferenceArray?
A little searching on the web turns up: Date-Race-Ful Lazy Initialization for Performance. Ironically, this both confirms my reasoning, and also identifies a bug in my code. I was accessing the shared variable more than once in my code, which can lead to problems due to memory access re-ordering. Easily fixed, but yet another example that it's hard to reason correctly about this stuff. (Just to hammer the point home, another article I found, Implementing hashCode, has the same bug with reading the variable multiple times without using volatile.)
The blog post also references a section in Effective Java (2nd ed) by Joshua Bloch so I dig my copy out and review it. It confirms that the single-check idiom is valid, but also recommends using double checked locking with a volatile variable instead. Even for the single check idiom, it recommends using a volatile variable, although it's not strictly necessary.
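The single-check idiom looks roughly like this (my sketch, with a stand-in Node class); the detail that bit me is reading the shared field into a local so it is only read once:

class Holder {
    private Node node;            // deliberately not volatile (the racy variant)

    Node get() {
        Node n = node;            // read the shared field exactly once
        if (n == null)
            node = n = load();    // may happen in two threads; benign only because
                                  // load() always returns an equivalent node and
                                  // Node is immutable (final fields), so even a
                                  // racy read sees a fully constructed object
        return n;
    }
    private Node load() {
        return new Node();        // stand-in for reading the node from the file
    }
}
class Node { }                    // assumed immutable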
After all that, I think I'm going to go with the simplest code with the raw array, non-volatile, and unsynchronized. That's not because I trust my reasoning, but because I trust the explanation and examples of people who are much smarter than me.
Monday, September 05, 2011
Concurrency Strikes Again
I thought it would be a relatively quick job to multi-thread building the indexes when loading a database dump. Three days of thrashing later I more or less had it working.
But it was only about 10% faster so I ripped out my three days worth of work - not a big enough improvement to justify the extra complexity. I'm a bit puzzled by the limited improvement, but the speed is probably dominated by other factors. (e.g. reading and writing the files.)
The story doesn't end there though. Along the way I was seeing some very puzzling debugging output. So after abandoning my initial goal I started to dig into what was going on.
The bad news is that I found some serious concurrency flaws in my transaction handling. The good news is that I fixed them. Obviously, I need more/better tests.
Immutable data structures are great for concurrency, but they can have significant overhead compared to regular mutable ones. So, to try to get the best of both worlds, I have some semi-immutable data structures. The idea is that they are immutable when shared between threads, but mutable when confined to one thread. As always, the devil is in the details.
I found that I had inadvertently ended up sharing when mutable, leading to some ugly bugs. The data structure becomes immutable after you persist it to disk. But I had updated the master copy before saving instead of after. A small mistake with big consequences. To make sure I don't make a similar mistake in the future, I added asserts in the relevant places to ensure that the structure was immutable when it should be.
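The shape of it is something like this (a sketch of the idea, not the actual jSuneido classes):

class TreeNode {
    private boolean immutable = false;   // false while confined to the creating thread
    private Object[] slots = new Object[16];

    void freeze() {                      // called once the node has been persisted/shared
        immutable = true;
    }
    void set(int i, Object value) {      // in-place update, only allowed while confined
        assert !immutable : "in-place update of a shared node";
        slots[i] = value;
    }
    TreeNode with(int i, Object value) { // copy-on-write path once the node is shared
        TreeNode copy = new TreeNode();
        copy.slots = slots.clone();
        copy.slots[i] = value;
        return copy;
    }
}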
Another potential issue was that even when immutable, it's only "effectively" immutable since the data structure is loaded from disk lazily. I had synchronized the updates but not the reads. That might actually be ok in this case, as long as writing a reference (pointer) is atomic. But I decided it was better to synchronize in a few more places than depend on something I wasn't sure about.
Friday, September 02, 2011
Append-Only Database Performance
I've been working on the append-only storage engine for jSuneido and most recently dumping and loading. I was flipping back and forth between storage engines and it seemed like the new one was faster. I decided to time it and sure enough, with the new append-only storage engine, loading a database was close to twice as fast - nice!
Note: This was not any kind of scientific benchmark. It was a relatively small database (130mb) and only loading. But still, a promising early result.
This is also still single-threaded. I've been itching to try using multiple threads to build indexes. Each index is relatively independent so it should be feasible, but I'm currently using an "exclusive" transaction so it's a little tricky. This should speed up loading even more.
Tuesday, August 30, 2011
Emergent Design & Functional Thinking
Some good articles by Neal Ford (ThoughtWorks) on the IBM developerWorks site:
Emergent Design
Functional Thinking
Mostly Java with a bit of Groovy.
These are in line with much of my thinking these days regarding immutability, coupling, composition, etc.
Saturday, August 27, 2011
Why Amazon Can't Make a Kindle in the USA
Wednesday, August 17, 2011
Benchmark Surprises
We recently ran into a minor bug in Suneido - tracing temporary indexes in database queries would sometimes report queries that didn't actually have temporary indexes.
I found the problem right away. The query optimization looks at multiple different "strategies", some involve temporary indexes and some don't. The flag that indicated whether a temporary index was needed wasn't getting reset between trying different strategies. This flag wasn't used for much, so the problem hadn't been noticed.
The problem was, when I fixed the bug, a bunch of query optimization tests started failing. That surprised me because the flag wasn't used much, and on top of that, the change fixed the flag - why would that break things?
It turned out the one other place where the flag was used was in join optimization. And unintentionally, the code had ended up dependent on the incorrect flag. Ouch. It turned out surprisingly hard to figure out what was going on. The query optimization is complex and because it explores many different strategies (to find the least cost) any debugging output is large and hard to follow.
Eventually I tracked it down to the relative estimated costs of temporary indexes versus look-ups by joins (which was tied indirectly to the flag).
I'm embarrassed to say that I have never really done good benchmarking to check that the "cost" estimates in the code matched the real execution times. Most of the estimates are seat of the pants guesses, adjusted to work reasonably. It's not as bad as it sounds because there's usually one strategy that is much better than the rest, so you don't really need to be exact.
I "fixed" it (so the tests passed) by greatly reducing the cost estimate for look-ups, but I wasn't sure if that just happened to fix it, or if the cost of look-ups was actually a lot lower than I originally thought. (Actually, one test still "failed" but the new strategy looked better than the old one so I "fixed" the test.)
I took a few hours and did some quick benchmarks. The results were surprising. The optimization "cost" takes into account index key size and record size, but it turned out these had very little effect on the execution time. The only significant factor (in the tests I ran) was the number of records processed. A temporary index roughly doubled the time. Joins took roughly the sum of the time of reading the two sides separately; the number of look-ups had little effect.
Note: These results are very specific to Suneido; I would not expect them to be applicable elsewhere. They are also working with data that fits comfortably in memory, so disk access is not a factor. That might seem unrealistic, but that's actually the normal case with our systems.
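Expressed as code, the measurements boil down to something this simple (a toy model to make the point, not Suneido's actual optimizer):

class ToyCost {
    static double read(double nrecords)           { return nrecords; }         // dominant factor
    static double tempIndex(double input)         { return 2 * input; }        // roughly doubled the time
    static double join(double left, double right) { return left + right; }     // look-ups were cheap
}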
One good thing about the results is that (I think) it validates part of the database implementation. The code is careful to not copy data - the database file is mapped into memory, and that "raw" memory becomes (within a wrapper object) the in-memory record with no copying or conversion. This is likely the reason that record size (and maybe index key size) had little effect on speed. (Since I assume copying and conversion would take time proportional to record size.)
I ended up refactoring the join optimization to not use the flag at all. That left only the tracing using it, and it was easy to rewrite that to work another way. So I ended up removing the problematic flag entirely. Nice.
I'm left with the realization that my query optimization code is likely a lot more complicated than it needs to be. It looks like it would work just as well if I ignored record size and index key size and just based costs on the estimated number of records to be processed at each stage. But I think I'll leave that for another time.
Sunday, August 07, 2011
Scala Again
Recently I've been re-reading Odersky's Scala book (the new edition). I looked back at my original blog posts about Scala and was surprised to see they were over two years old. Time flies!
I set up a copy of Eclipse (3.7 Indigo) with the latest Scala plugin (2.0.0beta09) so I could play around.
One of the examples in the book (and other places) is using case classes and pattern matching to simplify algebraic equations. I do some of that in Suneido (both in the language and the database query optimization) so it's somewhat familiar. Here are the classes:
I chose two optimizations - subtraction of two constant numbers, and addition of zero.
This version works for a single level:
BinOp("-", Num(2), Num(2)) => Num(0)
BinOp("+", Var("x"), Num(0)) => Var("x")
The obvious way to make it handle multiple levels is:
But this is top down, which doesn't handle cases like:
BinOp("+", Var("x"), BinOp("-", Num(2), Num(2)))
=> BinOp("+",Var("x"),Num(0))
We need to operate bottom up, which means processing a node's children before itself:
=> Var("x")
One thing that bugs me about the recursive versions is that they copy every node, even if they're not changing anything. In Suneido I do something like:
arg = simplify(node.arg)
if (arg != node.arg)
    node = new UnOp(node.op, arg)
But that seemed verbose and didn't fit with this Scala code. Instead, I added "alter" methods to UnOp and BinOp like:
so simplify only had to change slightly:
Of course, I ask myself if this is premature optimization. Obviously, for this simple test, it's unnecessary. On the other hand, it was part of my exploration.
I always struggle with this question of premature optimization. The problem is (as I've written about before) that if you totally ignore performance issues, you can easily end up with a program that, for example, does several times as much allocation as it needs to. And that sub-optimal code won't be conveniently in one hot spot - it'll be spread throughout your code. Allocation in modern JVMs is very fast, but garbage collection and memory bandwidth and cache effects still cost.
The more I play with Scala, the more I like it.
Saturday, July 30, 2011
Append-Only Database Progress
This won't mean much to anyone but me, but it's the culmination of several weeks of refactoring. This test runs a simple query using both the current storage engine and the new append-only storage engine, by simply choosing which DatabasePackage to use. All the query parsing, optimization, and execution is common, just the low level storage engine is different. While the query is simple, a lot of things have to work at the lower levels to make it function, so it's a good milestone.
This is using JUnit's ability to run tests with multiple sets of parameters.
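For reference, the JUnit 4 parameterized runner looks something like this (a simplified sketch; the names are made up, not the real jSuneido test):

import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class SimpleQueryTest {
    @Parameters
    public static Collection<Object[]> storageEngines() {
        // each row of parameters runs the whole test class again
        return Arrays.asList(new Object[][] { { "current" }, { "append-only" } });
    }

    private final String engine;

    public SimpleQueryTest(String engine) {
        this.engine = engine;
    }

    @Test
    public void simpleQuery() {
        // open a database with the chosen storage engine and run the query - details omitted
    }
}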
Thursday, July 21, 2011
Upgrading to Lion
As much as I know better than to rush to new technology, I couldn't resist upgrading to Lion right away.
I wasn't really paying attention, but it took about an hour to download the 4gb update from the App Store.
I knew I had four machines to update (my iMac and MacBook, and Shelley's iMac and MacBook) so I looked up online how to update multiple machines from a single download.
The trick is that there is a disk image inside the download that you can use to make a bootable DVD or USB drive. But you need to do that after downloading and before installing. One of the better explanations was HOW TO: Do a Clean Install of OS X Lion.
Without thinking, I started burning a DVD, only to realize that the MacBook Airs don't have DVD drives. So I grabbed the thumb drive I had handy and tried to use it. But it was only 4gb and the image didn't fit. I thought I'd have to get a bigger thumb drive, but then I remembered I had a small USB hard disk that I use for backing up photos when travelling. It was more than big enough.
The install went smoothly. Again, I didn't time it, but it was something like half an hour.
The next morning, when I tried to start Eclipse to do some programming, I got "no JRE installed" and it asked if I wanted to install one. I said yes and after quite a long pause (30 seconds?) it installed one and Eclipse started. Not bad!
I needed to reselect the JRE in Eclipse. Then a test failed because I'd forgotten to enable asserts on that JRE. But after that, all the tests passed. So far so good.
Then I found Mercurial was broken. A quick internet search found Mercurial SCM (hg) fix for OS X 10.7 Lion. Even better, I read to the bottom before I tried the fix and found there was an updated installer for Mercurial on Lion. I downloaded and installed it and Mercurial was back in business.
It did take quite a while to re-index for Spotlight searching but that happened in the background.
iTunes gave me a rather frightening message about not being able to save my music library, but it seems to be working fine.
Those are really the only problems I've had so far. Not too bad, considering.
I updated my MacBook using the USB hard disk. It went smoothly also.
One change that is going to take some getting used to is the reversing of the scroll direction. Admittedly, the old way does seem backwards - pulling your finger down to scroll up. Now it's the same as the iPhone or iPad, with a metaphor of dragging the page around. But old habits are hard to break and I keep trying to scroll the wrong direction. And I still use Windows at work which will be the opposite way, although with a mouse wheel rather than swiping on a Magic Mouse.
I can't imagine Microsoft having the guts to make this kind of change. Nor can I imagine Microsoft users standing for it - they would just flip the setting to "Classic" mode. (The same way most of my staff immediately switched new versions of Windows to the old Classic appearance, despite my protests.) It would be interesting to know how many Mac users switch it back to the old way.
There will probably be other issues I haven't run into yet, but so far it's been a pretty smooth update.
Wednesday, July 20, 2011
Java BufferedInputStream Problem
I ran into a problem loading a large database dump from cSuneido into jSuneido.
After several hours of debugging, I found if I changed this:
InputStream fin = new BufferedInputStream(
    new FileInputStream(filename));
to:
InputStream fin = new FileInputStream(filename);
(i.e. removed the BufferedInputStream)
Then it worked?!
I searched on the web for BufferedInputStream problems but didn't find anything relevant.
One thing that is suspicious is that this is probably the first file I've tried to load that is larger than 4 GB (i.e. the limit of a 32-bit unsigned integer), although I'm not sure why that would be a problem for BufferedInputStream. You'd think it would just be reading sequentially. Unless it's trying to do something clever like memory mapping the whole file.
Note: This is on a 64-bit JVM.
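If I do get motivated, a crude way to narrow it down would be something like this hypothetical check - read the file both ways and report the first offset where the two streams disagree:

import java.io.*;

public class CompareStreams {
    public static void main(String[] args) throws IOException {
        String filename = args[0];
        try (InputStream a = new BufferedInputStream(new FileInputStream(filename));
             InputStream b = new FileInputStream(filename)) {
            long offset = 0;
            while (true) {
                int x = a.read();       // byte at a time - slow but simple
                int y = b.read();
                if (x != y) {
                    System.out.println("streams differ at offset " + offset);
                    return;
                }
                if (x == -1)
                    break;              // both hit end of file together
                ++offset;
            }
            System.out.println("streams agree: " + offset + " bytes");
        }
    }
}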
I'm just going to omit the BufferedInputStream for now, but it would be nice to know why there was a problem.
I guess I could go dig up the OpenJDK code for BufferedInputStream but I'm not that motivated.
Anybody have any information or thoughts on this?
Tuesday, July 19, 2011
Large Scale Java Design
I ran into a problem when I reached the point where I wanted to start integrating my append-only database storage engine into the rest of Suneido.
The problem was that the old storage engine was too tightly coupled with the rest of Suneido. Not very good design on my part, but in my defense, the design came from the old C++ code, much of which is 10 or 15 years old.
So how should the code be structured? What's the best way to use Java packages and interfaces to reduce coupling? I looked for the Java equivalent of Large Scale C++ Software Design by John Lakos. (Seeing that it was published in 1996, I guess I don't really have an excuse for the poor structure of my old code!) But there doesn't seem to be an equivalent for Java. (Let me know if you have suggestions.)
There are books like Effective Java but they're mostly about small scale. And there are books about J2EE architecture, but I'm not using J2EE.
Books like Growing Object-Oriented Software (recommended) talk about coupling from the viewpoint of testability, but don't explicitly give guidelines for high level architecture.
In addition, what I needed wasn't just less coupling. I needed to be able to configure the code to run with either storage engine. For that, there couldn't be any static coupling between the storage engine and the rest of the code.
One really crude way to do this would be to simply rename one of the two storage engine packages to be the name referenced by the rest of the code. That would require changing every source file. Eclipse would do that, but it wouldn't play well with version control. And I would have to manually ensure that the two packages defined the same public methods.
Instead I decided to clean up my architecture. In case it's helpful to anyone else, I'll sketch out what I came up with. Some of it is standard advice, some is a little more specific to what I needed. None of it is particularly original or unique.
First, other packages (i.e. the rest of the system) should only reference interfaces that are defined in a separate interface package.
That means no direct field access; you have to use setters and getters for public access.
It also means no public static methods or constructors. Instead there is an interface for a "package" object. A singleton instance of the package object is created at startup. All access to the package starts here. So to use the new storage engine you just have to use its package object. This class contains any public constructors and static methods and is the only public class in the package.
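In outline, it ends up looking something like this (a simplified sketch; the interface and class names here are made up for illustration, not the actual jSuneido ones):

// in the shared interface package - the only things other packages see
public interface StorageEngine {
    Database open(String filename);
    Database create(String filename);
}

public interface Database {
    void close();
}

// in the storage engine package - the one public class; everything else
// in the package (like Db here) stays package scope
public class DatabasePackage implements StorageEngine {
    public static final StorageEngine INSTANCE = new DatabasePackage();

    public Database open(String filename) {
        return Db.open(filename);
    }
    public Database create(String filename) {
        return Db.create(filename);
    }
}

class Db implements Database {
    static Database open(String filename) { return new Db(); }
    static Database create(String filename) { return new Db(); }
    public void close() { }
}

The rest of the system only ever sees StorageEngine and Database, so choosing a storage engine at startup is just a matter of which package object you hand it.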
One mistake I made, coming new to Java from C++, was to make methods either private or public. I should have been using package scope a lot more than public. (Which might explain why it's the default in Java!) Now, the only methods that are public are the ones implementing public interfaces.
It's nice that Java (since version 5 / 1.5) allows covariant return types. i.e. the public interface can say that a method returns another public interface, but the actual method implementation can return the concrete type. This reduces the need for casting within the package.
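A minimal illustration of that (names invented for the example):

interface Record {
}

class DataRecord implements Record {
}

interface Transaction {
    // the public interface only talks in terms of the Record interface
    Record lookup(String key);
}

class TransactionImpl implements Transaction {
    // covariant return: the implementation narrows the return type to the
    // concrete DataRecord, so code inside the package doesn't need to cast
    public DataRecord lookup(String key) {
        return new DataRecord();
    }
}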
I've finished refactoring so the old storage engine code follows these guidelines. I thought I might run into areas that would take a lot of redesign, but it's gone surprisingly smoothly.
Next I have to refactor the new append-only storage engine to implement the same interfaces. Then I should be able to easily switch between the two.
Of course, the interface I've extracted from the old storage engine doesn't match the new storage engine so I'll have to do some refactoring to end up with a common interface.
As usual, the code is public in Suneido's Mercurial repository on SourceForge.
Tuesday, July 12, 2011
The LMAX Architecture
A good article by Martin Fowler on a very interesting architecture.
Thursday, July 07, 2011
Dreaded Deadlock
I was doing some refactoring and came across a stress test that I hadn't run for a while. (it's too slow to include in my unit test suite)
I ran it and it hung up. What the ...?
I thought maybe it was my refactoring but I reverted my changes and it still hung.
The test is multithreaded and using the Eclipse debugger I could see the threads were all hung at the same spot. It wasn't an infinite loop, but it was a synchronized method.
I used jConsole's deadlock detector to confirm what was going on.
Suspiciously, I had recently made a "minor" change to this exact part of the code.
The funny part was that I made the change because FindBugs reported a potential concurrency problem.
I removed the synchronized and used volatile instead and that fixed the problem. I think volatile is ok in this instance, but it's always hard to know for sure.
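The shape of the change was roughly this (a generic sketch, not the actual jSuneido code):

// before: synchronized getter and setter - thread safe, but holding the
// lock is what allowed the deadlock
class Flag {
    private boolean done = false;
    synchronized void set() { done = true; }
    synchronized boolean isSet() { return done; }
}

// after: volatile gives the same visibility guarantee without taking a lock;
// that's only sufficient because the flag is written and read independently,
// never read-modify-written
class Flag2 {
    private volatile boolean done = false;
    void set() { done = true; }
    boolean isSet() { return done; }
}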
I think FindBugs was correct that there was a problem. My mistake was slapping in a fix without really thinking it through. That's definitely a no-no when it comes to concurrency. It's a good example of how the "intuition" that adding more locks can't hurt is very wrong.
My second mistake was not running the related stress tests when I made the change. I need to add the stress tests to our continuous build system at work. And maybe set up something at home as well, if I can make it lightweight enough.
Yet another reminder that concurrency is hard! (not that I should have needed the reminder)
Thursday, June 30, 2011
Mockito for Suneido
From the Couch 12 - Mockito for Suneido
I've been using Mockito for writing tests for jSuneido and I really like it.
So I decided to write something like it for Suneido ...more
Saturday, June 25, 2011
Upgrading to Eclipse Indigo
The latest version of the Eclipse IDE, version 3.7 named Indigo, was released a few days ago.
For me, this was the smoothest upgrade so far. My jSuneido project still seems to build just fine, and the tests still all succeed.
All but one of the plugins I use were available through the Eclipse Marketplace (under the Help menu). Going through the Marketplace is a lot easier than the old style of entering a URL for an update site. The plugin that was not available hasn't been marked as compatible with Indigo, which isn't surprising since it doesn't seem to be under active development. There are other metrics plugins, but the nice thing about this one is that it also displays dependency graphs. But it's probably the plugin I use least, so it wasn't a big deal. The other plugins I use (that were available) are: Bytecode Outline (for ASM), EclEmma Code Coverage, FindBugs, and MercurialEclipse.
I haven't really noticed any major improvements, but I'm sure there are some.
The only minor annoyance I noticed was that it spent a long time at first downloading the Maven indexes. Maven support is built in now, but my project doesn't use it, so I'm not sure why it needed to download the indexes. But even this wasn't a big deal since it happened in the background. (I just noticed it in the Progress view and in my network activity.) I tried to use Maven another time but just made a mess and gave up. Maybe I should try again now that support is built in.
If my experience is typical, then I wouldn't be afraid to upgrade.
Thursday, June 23, 2011
Are ten apps all you need?
Ten apps is all I need - (37signals)
I disagree. I regularly use a bunch of non-Apple apps e.g. offline Wikipedia, iBird, oMaps, Topo Maps, Evernote, MobileRSS, BlogPress, TripIt, Photoshop Express, Yelp, Facebook, Twitter, SmartGo. *
I agree no one needs hundreds of thousands of apps. But you need a large selection in order to find the ones you want - you need the long tail. It's the same problem when you try to reduce the number of features in software. Sure, no one person uses all the features, but different people use different features.
The popularity of the platform draws the developers that you need to get the apps you want.
One of the big things that would stop me from using another phone or tablet (e.g. Android) would be the availability of the apps that I want.
PS. I'd also disagree with listing the Weather app as one that Apple "nailed". It's very basic and I don't find it very useful. I believe it's getting updated in iOS 5.
* A number of these I use because they work offline. Outside Canada, or even out of town, I don't have constant connectivity.
Monday, May 30, 2011
Language-induced brain damage
elliotth's blog: Language-induced brain damage is better than the alternative
An interesting blog post.
And I'm glad I'm not the only one that thinks java.nio.buffers sucks. (Although, in my latest append-only database storage engine they haven't been that big a problem - maybe I've learnt how to use them to avoid the worst problems.)
Wednesday, May 04, 2011
Apps
A couple of interesting free iPhone apps I've come across:
LeafSnap - take a picture of a leaf and get it identified. I haven't been able to test this much since we have no leaves yet! But I love the idea. Only some US trees so far but more promised.
Photosynth - helps you photograph panoramas, stitches them together, and lets you view them. From Microsoft.
A part of me would like to escape the Apple closed world, maybe buy an Android tablet instead of an iPad. But I really like the huge variety of apps. Some of them come out on Android as well, but not all of them.
Saturday, April 30, 2011
Suneido's Record Structure Revisited
After my last blog post on Suneido's record data structure, the wasted byte in the header started to bug me. I know where it originally came from, I simply had a C struct like:
struct
{
    char type;
    short nfields;
    ...
C/C++ automatically adds a byte of padding after the char so that the short is aligned.
Most of the time the padding was invisible and when I did think about it, I figured if the average record is, for example, 100 bytes, then it's only 1% overhead, no big deal.
After the last blog post I was wondering how I could eliminate that byte of padding. There are only three types, so it only really requires two bits. And the full range of nfields isn't really required, so you could store the type in two bits of nfields, still leaving a maximum number of fields of 16,383.
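Something along these lines would do it (a sketch of the idea in Java, not the actual implementation):

// pack the type into the top two bits of a 16 bit header,
// leaving the low 14 bits for nfields (up to 16,383 fields)
static short packHeader(int type, int nfields) {
    assert 0 <= type && type < 4;
    assert 0 <= nfields && nfields <= 0x3fff;
    return (short) ((type << 14) | nfields);
}

static int type(short header) {
    return (header >> 14) & 3;
}

static int nfields(short header) {
    return header & 0x3fff;
}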
That actually cuts two bytes, not just one. But still, that's only 2%.
But then I realized that Suneido also uses this record format to store keys within btree nodes. And keys are shorter. If the average key is 10 or 20 bytes, then the overhead is 10 or 20%, not 2%. That's a little more significant.
And it's not just the memory space, it's also garbage collection overhead, and disk reads and writes. And it's also copying overhead when "updating" immutable data records and btree nodes.
One of the reasons I kept the same record format in jSuneido as in cSuneido was so that dumps from cSuneido could be directly loaded into jSuneido without converting the record format. But btrees are not moved from one version to another, so they could easily use a different, more compact format. And it wouldn't be hard to convert the format when loading cSuneido dumps.
One drawback to this approach is that when the offsets are 4 byte ints, then a 2 byte header will mean the offsets are not aligned. This might be more of a concern in cSuneido, where records are used more heavily and are accessed more directly. But I don't think it's a big issue in jSuneido.
It wasn't too hard to implement. With one table of about 1600 records that was 1 mb in size (including indexes) the database was about 5% smaller after the changes. Not bad for a few hours work and very little added complexity.
Thursday, April 28, 2011
Suneido's Record Data Structure
Both cSuneido and jSuneido use the same data structure for low level record storage.
type is one of 'c' (char, 1 byte), 's' (short, 2 bytes), or 'l' (long, 4 bytes) and specifies the size of the length and offset values. (In Java, a better choice of types would have been 'b' (byte), 's' (short), and 'i' (int)). (Calling a 4 byte integer "long" is a leftover from the days when int's were 2 bytes.)
There is one byte of padding after the type byte. (Originally a result of C struct padding.) Or the type can be treated as a 2 byte little endian short integer.
nfields is a short (2 byte) integer count of the number of fields in the record.
length is the overall number of bytes in the record. When calculating field sizes as the difference between two offsets, it is handy to be able to reference the length as offset[-1].
The fields data is variable length and stored in reverse order.
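Putting that together, getting at a field looks roughly like this (a sketch using the 's' (short) size and Java NIO for illustration; byte order details are glossed over):

import java.nio.ByteBuffer;

class RecordSketch {
    // type (1 byte), padding (1), nfields (2), then length and the offsets,
    // each 2 bytes in the 's' format
    private static final int HEADER = 4;

    static int nfields(ByteBuffer buf) {
        return buf.getShort(2);
    }
    // offset(-1) is the record length, offset(0 .. nfields-1) are the fields
    static int offset(ByteBuffer buf, int i) {
        return buf.getShort(HEADER + 2 * (i + 1)) & 0xffff;
    }
    // field i occupies [offset(i), offset(i - 1)) since the field data
    // is stored in reverse order, growing down from the end of the record
    static int fieldSize(ByteBuffer buf, int i) {
        return offset(buf, i - 1) - offset(buf, i);
    }
}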
In cSuneido records are used as mutable data structures. As fields are added, the offsets and fields grow towards each other. Before storing in the database file they are copied to the exact size required. As a mutable data structure, one drawback of the design is that when the record is reallocated to a larger size, all the offsets change. If the field data was at the beginning and the offsets at the end, then reallocating would not change the offsets and they could simply be copied to the new record. The tradeoff would be that accessing the offsets would be more awkward since they would not start at a fixed location in the record.
In jSuneido records are stored in NIO ByteBuffer's. They are only used as immutable data structures (at least in the new storage engine). A RecordBuilder is used to collect the data so that the record can then be created the exact size required. (i.e. with no unused space between the offsets and the fields)
Allowing different sizes for the length and offsets makes the overhead low for small records (an empty record is only 5 bytes) while still allowing for large records (up to 32k fields and 2gb length).
Having an array of offsets allows random access to the fields. For example, a search on the Nth field can access that field directly, without having to scan through the record. Of course, this is only an advantage because Suneido does not convert records to an internal format to search them.
Offsets are used rather than pointers because records are intended to be stored in the database where pointers would not be usable.
In this design empty fields still take up space in the offsets array although not in the field data. (e.g. in the common short type, empty fields take 2 bytes each) This seems like a reasonable tradeoff for the ability to randomly access fields.
The C++ source code is in record.h and record.cpp and the Java code is in Record.java
Tuesday, April 26, 2011
Semi-Immutable Data Structures
In the new append-only database storage engine for jSuneido I've ended up with several data structures that are "semi-immutable" i.e. they are immutable some of the time and mutable some of the time.
(These are in-memory data structures, the database file itself is only ever appended to, existing data is always immutable i.e. never updated.)
I've flip flopped back and forth a few times from completely immutable to partially immutable, but I seem to have settled on partially immutable.
The overall "state" of the database is kept immutably. When a database transaction starts, it "copies" this state. (The "copy" consisting of just a few pointers to persistent immutable data structures.)
"Inside" a transaction, the data structures can become mutable. The first update a transaction does will path copy nodes up to the root. This "disconnects" the data structure from the overall databases state. Nodes within the data structure can now be either shared and immutable or not shared and mutable.
(Allowing mutability within transactions is "safe" because transactions are thread-contained i.e. only one thread at a time works on a transaction.)
The main advantage of this approach is performance since nodes can be updated with less copying.
At the end of a transaction commit, the new state is stored (persisted) in the database file. This converts all the nodes back to immutable.
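The nodes end up with a small amount of bookkeeping to support this - something like the following sketch (not the actual classes):

import java.util.ArrayList;

// a node is mutable until it is "frozen" (shared or persisted at commit);
// updating a frozen node copies it instead of modifying it in place
class Node {
    private boolean frozen = false;
    private final ArrayList<Object> values;

    Node(ArrayList<Object> values) {
        this.values = values;
    }
    void freeze() {
        frozen = true;
    }
    // add a value, returning the node that now holds it - this node if it
    // is still mutable, otherwise a fresh unshared copy
    Node with(Object value) {
        Node node = frozen ? new Node(new ArrayList<Object>(values)) : this;
        node.values.add(value);
        return node;
    }
}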
This semi-immutable approach is used both for the table index btrees and the hash tries used to store redirects and database info (e.g. btree roots).
Sunday, April 24, 2011
jSuneido append-only database format
The following is as much for my own review as anything.
The database file layout is quite simple - it consists of a sequence of commits.
Each commit has a header (size and timestamp) and a trailer (checksum and size). Given the leading and trailing sizes, the commits can be traversed forwards and backwards.
The body of a commit consists of the data records (table rows), btree nodes, and hash trie nodes followed by pointers to the roots and redirects hash tries.
The roots hash trie points to the btree indexes as well as other database information. As I've talked about before, the redirects are used to avoid path copying in the btrees.
When the database is opened, the roots and redirects are accessed at a fixed offset from the end of the file. Together they provide access to the database at a certain point in time.
Read-only transactions, operating as-of a certain point in time simply use the roots and redirects as of when they started. They can operate completely independently because the data they are looking at is immutable. There are no locks or concurrency issues.
When the database is opened, the checksums of the last few commits are verified to confirm that the database was previously closed properly. (This uses the trailing size to traverse the commits from the end of the file backwards.)
If this checking fails, then crash recovery is done. The commits are traversed from the beginning of the file (using the leading sizes) and the checksums are verified. The file is then truncated after the last good commit. (This is much simpler and faster than the current system.)
Checksums are of the raw data - they do not care about the logical contents of the commits. They use Adler32 checksums for speed.
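Verifying the tail of the file then comes down to something like this (a rough sketch assuming java.util.zip.Adler32 and a ByteBuffer mapped over the file; the actual header/trailer field sizes and exactly what the checksum covers are simplified here):

// walk backwards from the end of the file using each commit's trailing size,
// verifying the Adler32 checksum stored in its trailer
static boolean checkEnd(ByteBuffer db, int ncommits) {
    int end = db.limit();
    for (int i = 0; i < ncommits && end > 0; ++i) {
        int size = db.getInt(end - 4);      // trailing size of the commit
        int stored = db.getInt(end - 8);    // stored checksum
        byte[] data = new byte[size - 8];   // the commit, minus its trailer
        ByteBuffer slice = db.duplicate();
        slice.position(end - size);
        slice.get(data);
        Adler32 cksum = new Adler32();
        cksum.update(data);
        if ((int) cksum.getValue() != stored)
            return false;                   // corrupt - run crash recovery
        end -= size;
    }
    return true;
}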
Note: Although it's not shown in the diagram, there may be "padding" between elements in the file. This is because the memory mapped file is mapped and extended in "chunks" of 64 mb. Elements are not allowed to straddle chunks, so padding may be required at the end of a chunk if an element won't fit. Since there is no sequential access to elements within commits, this padding has no effect, other than a very small amount of wasted space.
All of this is implemented and working well. But I've still got a fair bit of work left to integrate this new storage engine into the rest of the database code. Sadly, I don't have a nice clean storage engine interface :-(
As usual, the code is available in the Suneido project Mercurial repository on SourceForge. The new storage engine code is in the immudb package.
Thursday, April 21, 2011
NativeJ Java Launcher
If you're looking for a way to run Java programs as Windows services, or launch them from an exe, I would highly recommend Dobysoft's NativeJ.
The software works well, but more importantly, they have bent over backwards to accommodate our needs. Granted we paid for it, but $150 doesn't cover a lot of support, let alone programming.
Previously we used JSL to run as a service and Launch4J for a launcher. Both are free and worked ok, but we ran into problems with Launch4J not returning the exit code. (a known issue) It's open source so I guess we could have dug into it and tried to fix it, but I was looking for something that would handle both services and launcher together and NativeJ looks like a good solution.
Wednesday, April 20, 2011
PDP-11 Emulator running V6 Unix - in Javascript!
http://pdp11.aiju.de/
Amazing!
I was in high school when I got my first Unix account on a computer science system at the university. It might have been a PDP-11. I remember struggling to teach myself C from Kernighan and Ritchie when all I knew was Basic. And of course way too shy to actually ask anyone for help.
Sunday, April 17, 2011
A Fractal Tree Implementation
I couldn't resist whipping up a quick implementation of a fractal tree, even though I don't really need it for anything. I'm a sucker for a good algorithm :-)
The code can be amazingly simple. Here's my first cut at insertion:
public void add(T x) {
    Object[] tmp = new Object[] { x };
    int i = 0;
    for (; nodes[i] != null; ++i) {
        tmp = merge(nodes[i], tmp);
        nodes[i] = null;
    }
    nodes[i] = tmp;
}
merge simply merges two sorted arrays into a new larger array.
This works fine, but it will chew through a lot of memory allocating new arrays. The space will be reclaimed by the garbage collector, but it seems a little wasteful.
So I modified it to pre-allocate the first few levels of small arrays since these are the ones that get the heavy activity. (e.g. the first four levels of size 1, 2, 4, and 8 will handle 15/16 of adds) This complicates the code a bit, but not too badly.
Iteration is quite simple as well - just merging the sorted nodes.
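For example, something like this fragment of the same class (reusing merge, and assuming java.util.Arrays and java.util.Iterator are imported; the real code merges lazily rather than building one big array up front):

// merge all the non-empty levels into one sorted array and iterate over it
@SuppressWarnings("unchecked")
public Iterator<T> iterator() {
    Object[] all = new Object[0];
    for (Object[] node : nodes)
        if (node != null)
            all = merge(node, all);
    return (Iterator<T>) Arrays.asList(all).iterator();
}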
The full code is in SourceForge
Friday, April 15, 2011
Fractal Trees
Another interesting data structure is the Fractal Tree used by Tokutek's TokuDB (another MySQL variant). I think this is related to the stratified data structure (at least some of the description is similar).
B-trees are at one point in the range of insert speed versus lookup speed. Fractal trees are much faster at inserts, but slower at queries.
It's easy to see how inserts work, and that most of them would be very fast. But part of the reason it's fast is that the cost is amortized - there are occasional big operations (a little like having to resize a hash table). It sounds like they're doing these in a background thread to avoid delays, but that has to make querying tricky.
And querying seems tricky regardless. A naive binary search in each level is going to be slow (and potentially many disk accesses or at least cache misses). To speed this up they add pointers from each value to its position in the next level. They don't go into detail, but I have to think maintaining these pointers is going to make inserts more complex.
I'm not sure how this approach would work for an append-only index. Again, the average insert wouldn't write much, but the occasional insert that caused a lot of merging would write a lot. Maybe the amortized cost would still be reasonable.
It seems like it would make an interesting sort algorithm, but I have no idea how the performance would compare. It's a little like a merge sort, which has good performance.
Doesn't look like this is something I can use directly, but it's always good to learn new approaches.
Thursday, April 14, 2011
B-tree Arcana
I recently ran across a new article on Stratified B-trees and versioning dictionaries. I have to admit I didn't really follow their data structure or algorithm. I looked at some of the references, for example to Cache-Oblivious Streaming B-trees but the ideas seem too complicated to be directly useful to me.
However, there were some bits that were interesting to me. They talked about the space blowups and inefficient caching in path copying CoW (copy-on-write) B-trees.
each update may cause an entire new path to be written – to update a 16-byte key/value pair in a tree of depth 3 with 256K block size, one must first do 3x256K random reads and then write 768K of data (see http://bit.ly/gL7AQ1 which reported 2GB of log data per 50MB of index data)
I'm hoping to avoid this using redirection (based on ideas from RethinkDB). By redirecting instead of path copying, you only need to copy the node you're updating, not all the nodes on the path to the root. (And the nodes can be orders of magnitude smaller - see below.)
Many systems use indirect access through a mapping table all the time. Instead, I use direct access whenever possible, and only indirect to updated data. That feels like it should be better, but without a lot of analysis it's hard to be sure. It does reduce space overhead somewhat.
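The core of the redirection idea is just this kind of lookup (a much simplified sketch - the real version keys an immutable hash trie by file offset; BtreeNode and Storage here are stand-ins):

import java.util.HashMap;
import java.util.Map;

class BtreeNode {
}

interface Storage {
    BtreeNode readNode(long offset);
}

// instead of path copying the parents when a node is rewritten, record
// old offset -> new node, and consult the redirects when following offsets
class Redirects {
    private final Map<Long, BtreeNode> redirs = new HashMap<Long, BtreeNode>();

    void redirect(long oldOffset, BtreeNode newNode) {
        redirs.put(oldOffset, newNode);
    }
    BtreeNode follow(long offset, Storage stor) {
        BtreeNode updated = redirs.get(offset);
        return updated != null ? updated : stor.readNode(offset);
    }
}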
One of the big issues with CoW B-trees is garbage collecting the old nodes. I have to admit I punt on this, requiring occasional database compaction instead. (This also removes all the redirection.) Not so good if you want to run 24x7 but it's a tricky problem and for our customers occasional maintenance downtime isn't a big deal.
Another interesting point was on SSD's (solid state drives).
SSDs handle small random reads very effectively, but perform poorly at random writes (the recent Intel X25M can perform 35,000 4KB random reads/s, but an order of magnitude fewer writes). However, they can perform very efficiently for large sequential writes.
That fits really well with CoW append-only B-trees since they do exactly that - random reads but sequential writes. And I recently got a new computer at work with an SSD :-) so I'll be able to do some testing.
Another point that caught my eye was that their arrays doubled in size between levels. Not understanding their system, I'm not completely sure what this means. But it rang a bell because I've been considering doing something similar with my B-tree nodes. Often, B-trees use large node sizes to reduce the number of random disk IO's. But large nodes don't work well with copy-on-write. And random IO's aren't as bad with SSD or just large disk caches.
I still need to do some testing to determine the best size, but I think it will be quite small. But I wonder if it would make sense to make nodes larger as you go "up" the tree towards the root. The leaf nodes are updated most often so it makes sense to make them small. But as you go up the tree the nodes are read more and updated less. Increasing (e.g. doubling) the size of the nodes at each level would decrease the number of tree levels and reduce the number of random IO's. And since larger nodes would split less often, higher tree levels would get updated even less frequently.
If I was in the academic world I'd need to figure out the theoretical performance of my approach under DAM (direct access machine) or CO (cache oblivious) models. Instead, I'll do some quick and dirty benchmarks and if there's not a big performance difference go with the simplest code.
Wednesday, April 13, 2011
Dead End
I've been plugging away at the new append-only storage engine for jSuneido. I have a decent proof of concept, but it's still a long way from production.
Recently I was working on btree iteration. Usually btree's link the leaf nodes to simplify iteration. But this linking is problematic with an append-only btree. So the iteration has to work a little differently. One of the problems is how to handle changes made during the iteration. Being in an immutable mindset, my immediate thought was to iterate through a "snapshot" of the btree. (Since snapshots are just a pointer in immutable persistent data structures.)
But then I realized I was only immutable on disk. In memory, within a transaction, I was using mutable data structures. So I jumped in and converted my btree's and the redirection hash trie map to fully immutable. After I'd pretty much finished this, I realized I should have looked closer before I leaped. But when you have a hammer, everything looks like a nail :-)
Iterating through a snapshot won't work. For example, if you update a record during iteration, the iteration would see an out of date version of the record, and might even try to update this "stale" data. Back to the drawing board.
On the positive side, I improved the code quite a bit. I've lost track of how many times I've refactored, reshuffled, rewritten this new storage engine code. It's not that big - currently about 2500 lines. You wouldn't think there'd be that many ways to arrange it!
Since I ended up not needing full immutability I changed it back to being mutable in memory (better performance). But I couldn't just revert the code because I wanted to keep some of the improvements I'd made.
About three days work on a dead end path. Oh well, if you're trying to come up with something new you have to expect a few dead ends. Not the first time, and I'm sure it won't be the last.
Wednesday, March 30, 2011
Interview with Uncle Bob
I enjoyed this podcast of an interview with Robert Martin aka Uncle Bob.
It's interesting to hear how he still likes to spend his time programming after 40 years (as opposed to getting sucked into management or other activities).
Sunday, March 27, 2011
More on Java assert
Java's assert is a mixed blessing.
It's nice syntactic sugar. I'd rather write:
assert x == 0 : "expected x = 0, got " + x;
than:
if (x != 0)
throw new AssertionError("expected x = 0, got " + x);
You could make your own assert so you could write:
verify(x == 0, "expected x = 0, got " + x);
But there are subtle but important differences. With your own function, the error message is always constructed, whether the error is thrown or not. Whereas with assert, as with writing out the if, the error message will only be constructed if the assert fails. That doesn't matter if the message is just a literal string (or there's no message) but if the message requires, for example, calling toString on a complex object you could take a significant performance hit. There's also the lesser overhead of calling a function. It's possible that the JVM JIT optimizer will in-line the function so it becomes equivalent to writing out the "if" and therefore eliminate the overhead, but I'm not sure if it's "safe" to count on this.
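One way to get lazy message construction out of your own function is to pass something that builds the message on demand - pre-lambda Java makes it wordy, but it works (a sketch, not something I'm actually using):

interface Message {
    String get();
}

class Verify {
    static void verify(boolean condition, Message message) {
        if (!condition)
            throw new AssertionError(message.get());
    }

    static void example(final int x) {
        // the string is only built if the check actually fails
        verify(x == 0, new Message() {
            public String get() {
                return "expected x = 0, got " + x;
            }
        });
    }
}

Of course the anonymous class is its own little allocation, so it's debatable whether that's a win over just using assert.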
The downside of assert is that you have to remember to enable it. (My blog post on this is one of my most visited. Google for "enable java assert" and it's currently the third result.) One way to avoid this is to write a test to confirm that asserts are enabled. I did this after I got burnt by disabled assertions a second time.
I just ran into another problem with this. We were looking at using NativeJ but it doesn't appear to have any way to pass the -ea option since it uses the java dll rather than the actual java executable.
I discovered that there is a way to enable assertions from within the code:
ClassLoader.getSystemClassLoader()
.setPackageAssertionStatus("suneido", true);
It only applies to classes loaded after this, but I have it as the first statement of my main method which should be the first class loaded.
Another wrinkle to assert is that the exception it throws is AssertionError which is an Error. The Java exception hierarchy is:
Throwable
    Error
    Exception
        RuntimeException
Only RuntimeException's and Error's can be thrown without declaring them. "Normal" code is only supposed to catch Exception, not Error. But in Suneido, I don't want an assertion failure to "crash" the system, I want to be able to catch them. But to do that, I have to catch Throwable, which is not the recommended practice. I could use two catch's - one for Exception and one for AssertionError. But apart from the extra code, I wonder if there aren't other Error's that I would want to catch.
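For comparison, the two-catch version isn't much extra code (pre Java 7, so no multi-catch); it just doesn't answer the question of what other Error's might matter (a sketch with made-up names):

class Runner {
    void run(Runnable userCode) {
        try {
            userCode.run();
        } catch (AssertionError e) {
            handle(e);      // an assertion failure shouldn't crash the server
        } catch (Exception e) {
            handle(e);
        }
    }

    void handle(Throwable e) {
        e.printStackTrace();
    }
}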
One way around the AssertionError issue would again be to write my own assert function so I could throw whatever exception I wanted (e.g. a RuntimeException). But as I explained above, there are issues with that.
Yet another option is to use a javac (the Java compiler) annotation processor (like fa.jar) that converts assert to if-throw. But that seems a little scary.
Of course, a lot of this is due to wanting to keep the assertions enabled, even in production. I'm a believer in this since so many problems don't show up till the software gets used for real. However, it does mean you have to be a little careful to make the asserts fast enough for production.
The People Behind The Software
I like how this video is about the people and the emotions behind the software.
Sunday, March 20, 2011
A JVM Does That?
Cliff Click's recent blog post led me to the slides for his talk "A JVM Does That" - interesting stuff.
A quote from his post:
Java appears to have plateaued (although Projects Lambda and Coin may give it a serious boost), but the JVM is on a roll. Where people used to target C as their “assembly” of choice years ago (to avoid having to do code-generation), they now target the JVM – which avoids the code-gen AND supplies a GC, AND a well-understood threading model, plus true type-safety.
Wednesday, March 16, 2011
jSuneido Goes Live
Almost three years after starting the project, the Java version of Suneido is now live in production. Last night one of my programmers switched our in-house accounting and customer support system over to jSuneido (thanks Kim). We have about 40 fairly heavy users so it's a good test.
We had a few minor problems today but for the most part it went quite smoothly. When I brought it up with several people they said they didn't notice anything different, which is what I want to hear. Several people thought it was faster. My own impression was mostly that it felt "smoother" - no annoying pauses like we've been running into lately.
(Honestly, I'm amazed that cSuneido manages to handle as many users as it does. When I designed it I never thought it would. It's definitely benefited from Moore's law.)
Our current server is only a dual core machine with 6 GB of memory. It doesn't really have enough cores or memory for jSuneido to take advantage of. We're in the process of getting a new i7 server with 8 cores (counting hyperthreading) and 24 GB of memory (thanks Dorlan). That should be a better platform for jSuneido.
One of the main goals of jSuneido was to scale better to handle larger systems with more users. For smaller systems cSuneido uses less resources. But it can't take advantage of 64 bit systems with large amounts of memory and multiple cores. jSuneido can.
Three years of work to reach this point. Not full time, but close to half my time. That's a lot of hours. It feels good to reach this point. No doubt there will be issues to deal with, but that's to be expected. In many ways it's just a transition between phases of the project. With climbing, reaching the top is only half the battle, the other half is getting down. With software, writing it is only half the battle, the other half is running, fixing, maintaining, and improving it.
This kind of project definitely requires being able to handle deferred gratification. But just as important is to take pleasure in the work itself, as well as in the end result. If I didn't enjoy the day to day programming (most days anyway!) there's no way I would do it for the relatively small amount of final gratification. It's not like anyone really understands what you've done or what it took to get there. The external rewards just aren't enough to justify something like this. The process itself has to be sufficient reward. Any pats on the back at the end are just icing on the cake.
I'm deliberately making a point of marking this milestone. It's good to celebrate small victories along the way. Too often they slide by and with no punctuation it can seem like just one long slog. It must have been an even bigger milestone when the original version of Suneido first went into production, but I can't even remember it. I remember thinking that I would reward myself with one of the brand new flat screen monitors (that dates it!). But I never did, because it never seemed like I was "finished" - there was always as much road ahead of me as there was behind. Here's to the road ahead.
Wednesday, March 09, 2011
ThreadPoolExecutor
I've been working with ThreadPoolExecutor and it's a little confusing so I thought I'd share my experience.
Some background - Executor is a simple interface:
An object that executes submitted Runnable tasks. This interface provides a way of decoupling task submission from the mechanics of how each task will be run, including details of thread use, scheduling, etc. An Executor is normally used instead of explicitly creating threads.
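That's describing java.util.concurrent.Executor, which is just:

public interface Executor {
    void execute(Runnable command);
}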
A ThreadPoolExecutor uses a pool of threads to execute the tasks. The trick is in configuring it. I wanted:
- no threads when it's idle
- a limited number of threads when it's busy
- a bounded queue
You have to read the documentation for ThreadPoolExecutor very carefully. Forget about your preconceptions of how it "probably" works.
You can specify "core pool size" and "max pool size". Normally "core" threads do not time out. You also specify the type of queue (bounded, unbounded, or synchronous), and the rejection policy.
There are a couple of gotchas.
- Even if activity is very low and one thread could handle it, the full number of core threads will be created.
- Even if activity is high and all the core threads are busy, additional threads will not be created until the queue is full.
Both of these are because it's very hard to know if a thread is idle.
They combine to lead to a common mistake - setting core size to 0 because you want all the threads to time out, and using an unbounded queue, which will never be full. Using a large queue is almost as bad because additional threads won't be created until that many requests are queued up - ouch!
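That mistaken configuration would look something like this (not code from jSuneido, just an illustration):

import java.util.concurrent.*;

class WrongPool {
    // an unbounded queue is never "full", so the pool never grows
    // toward the maximum no matter how far behind it gets
    static final ExecutorService wrong = new ThreadPoolExecutor(
            0, 8,                                  // core 0, max 8
            1, TimeUnit.MINUTES,                   // idle timeout
            new LinkedBlockingQueue<Runnable>());  // unbounded - never "full"
}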
Here's what I ended up with:
private static final int CORE_THREADS = 0;
private static final int MAX_THREADS = 8;
private static final int KEEP_ALIVE = 1;
private static final ThreadPoolExecutor executor =
    new ThreadPoolExecutor(CORE_THREADS, MAX_THREADS,
        KEEP_ALIVE, TimeUnit.MINUTES,
        new SynchronousQueue<Runnable>(),
        threadFactory,
        new ThreadPoolExecutor.CallerRunsPolicy());
A SynchronousQueue is a zero-size queue - it hands off items directly. Because its remainingCapacity is always zero, it is always "full", so up to MAX_THREADS will be created as needed.
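The threadFactory isn't shown above; Executors.defaultThreadFactory() would do, or something simple that just names the threads, along these lines:

import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class NamedThreadFactory implements ThreadFactory {
    private final AtomicInteger count = new AtomicInteger();

    @Override
    public Thread newThread(Runnable r) {
        // name the pool threads so they're recognizable in thread dumps
        return new Thread(r, "worker-" + count.incrementAndGet());
    }
}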
Without a queue and with a maximum number of threads, it's possible that tasks will need to be rejected. By specifying the CallerRunsPolicy rejected tasks will be executed by the calling thread. This is a simple feedback mechanism that will throttle requests since the caller won't be able to accept additional requests while it's busy executing a task.
I chose MAX_THREADS of 8 to be a bit bigger than the expected number of cpu cores (to allow for some threads being blocked). In our actual usage one or two threads seem to be sufficient.
There is a standard Executors.newCachedThreadPool but the documentation isn't specific about what its configuration is. From other sources it appears quite similar to my configuration - zero core threads and a SynchronousQueue. However, it has unlimited MAX_THREADS instead of CallerRunsPolicy. In my case, I wanted to limit the number of threads because each one holds some limited resources.
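For comparison, the JDK source (at least the versions I've looked at) builds it essentially like this:

public static ExecutorService newCachedThreadPool() {
    return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
            60L, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>());
}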
I'm not an expert on this stuff, so suggestions or criticisms are welcome.
Thursday, March 03, 2011
Mockito
I've been reading Growing Object-Oriented Software, Guided by Tests. I write automated tests for my code, but I'm not especially good at it. Partly because I'm not very good at writing testable code. Which is partly because I don't usually code test first. My problem is that I know enough to know that I don't know enough!
Anyway, I decided I should try to clean up my act a little. I wanted to try using a mocking library. I did a few minutes of research and looked at a comparison between EasyMock and jMock. Then I ran across Mockito and it looked simple and easy to use.
I added the jar to my project and wrote my first test with it. No problems and very easy to use.
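To give the flavor of the API (this isn't my actual test, just a minimal example):

import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.*;

import java.util.List;
import org.junit.Test;

public class FirstMockitoTest {
    @Test
    public void stubAndVerify() {
        @SuppressWarnings("unchecked")
        List<String> list = mock(List.class);    // create the mock

        when(list.get(0)).thenReturn("hello");   // stub a call

        assertEquals("hello", list.get(0));      // exercise it
        verify(list).get(0);                     // verify the interaction happened
    }
}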
But it's never that easy. I ran my build, which uses Proguard, and got a ton of errors and warnings :-(
I probably could have fixed them, but I took the easy way out and just configured my build to exclude the tests. I only ever run them from Eclipse anyway. It's not the ideal solution but it'll do for now.
In the process I cleaned up my Ant build file. When I first created it I was even more ignorant about Ant and Java and Proguard than I am now.
I've revised a few more tests to use Mockito and I'm quite happy with it. If you're programming in Java and aren't using a mocking library, give it a try.