Tuesday, May 15, 2012

Immudb Concurrent Performance Puzzle

One of the tests I hadn't run on immudb (my new append-only database engine for Suneido) was a concurrency test I'd used to debug the previous database engine.

Here are some rough results. As usual, this is not a scientific benchmark. The operations it's doing aren't necessarily representative, and I'm not running it long enough or enough times to get accurate figures. But it still gives me some information.

I'm pretty happy with the results relative to the previous version. Up to four client threads it's about four times as fast - can't complain about that. And another good result that's not shown by this chart is that immudb has a lot fewer transaction conflicts.

Also on the positive side, I only found a couple of bugs, and none of them took more than a few minutes to fix. This is a sharp contrast to when I was debugging the previous version. I give credit to the immutability and the resulting reduction in locking.

But I'm puzzled by the drop in performance beyond 4 threads. And I didn't even include 8 threads because the results were all over the place - anywhere from 3000 to 6000. There's some variation with fewer threads, but nowhere near this much.

If the performance just levelled off, that would be one thing, but I'm not happy with the performance dropping under heavy load - that's not what you want to see.

The previous version may not perform as well, but it doesn't show the puzzling drop off with more threads. Although, even with the drop, the immudb performance is still much better than the previous version.

At first I was only running 1, 2, 4, and 8 threads. My computer has 4 cores with hyper-threading for more or less 8 hardware threads so I thought maybe it was because I was using up all the hardware threads and not leaving any for other things like garbage collection and the OS. But that wouldn't explain the drop off at 5 threads.

I looked at hprof output, but it didn't give me any clues. It did highlight some areas that I could probably improve, but nothing related to concurrency. (They were related to one of my pet peeves - ByteBuffer.)

More threads could mean more memory usage, but JConsole and VisualVM don't show a lot of time in garbage collection.

The obvious issue is some kind of contention. Immudb doesn't do much locking (because most things are immutable) but I could see contention over the commit lock. However, JConsole and VisualVM don't show much waiting for locks. Thread dumps show all the threads as RUNNABLE. And I don't see underutilization of the CPUs, as I would expect with heavy contention (since threads would be waiting for locks, not executing).

Another possibility is that more concurrent transactions mean more read validation work. And since each commit has to check against all other concurrent transactions, this is O(N^2), which could be bad. But again, I don't see any signs of it in my monitoring.

And I can't think of any explanation for the excessive variation. Strangely, 16 threads shows much less variation than 8.

I'm more puzzled than worried at this point. In actual usage I don't think it will be an issue because our systems, even the largest ones, won't be applying a continuous load this heavy.

Anyone have any ideas or suggestions of things to look at or try?

Saturday, May 05, 2012

No Service

Every trip I make to the US, I end up wishing I had cell phone internet access.

I have a Canadian data plan on my iPhone. I don't use it a lot, since I can usually find wifi, but occasionally it is very handy, e.g. to use Google Maps to find an address while wandering around. I could use this plan in the US but the roaming charges are ridiculous and I refuse to pay them.

When I got my iPad, I bought the 3G model thinking that at some point I could get a US SIM card and data plan. In the winter when I was in Savannah I stopped at an AT&T store to try to set it up. The two young guys working there were friendly enough, but did not seem at all inclined to help. The first told me I couldn't do it because I had a Canadian SIM card. I told them I didn't have a SIM card. The second guy then said it wouldn't work because I didn't have a SIM card. Couldn't they give me a SIM card? Yeah, they could, but it wouldn't work. Could we try it? No, because I wouldn't have an account. Could they set up an account for me? No, because I'm Canadian. At this point I gave up.

So this trip, I thought I'd try an Apple store and see if they would be more helpful. I explained what I wanted to the greeter. He said "no problem" and gave me an AT&T SIM card and even installed it for me (between greeting each person that came in the store). All I had to do was phone AT&T and set up a plan.

Of course, that wasn't as easy as it sounded. I knew phoning would be a hassle, so first I tried to sign up on the iPad itself. That wouldn't work because it insisted on a US billing address, so my credit card wouldn't work. Then I tried on the AT&T web site, but same problem. So I gave in and phoned. It took me multiple tries to get through the computerized phone system, wait on hold, get the wrong department and get transferred, wait on hold again. Finally I got someone in the correct department who informed me it was impossible. I had to have a credit card with a US billing address. Considering this was for a pre-paid plan, I'm not sure why they care where the money comes from. But it's not like you can argue logic with a call center.

I did a little searching on the web and the suggestion I found was to use a prepaid gift credit card. I've never used one of these but it was worth a try. I picked up a Visa card from a local convenience store. But the instructions with the card said if you wanted to use it on the web you should register it with your address. I needed a US address so I used our hotel address. Finally, I could set up my AT&T plan.

It all seemed to work properly, but I could barely get service. AT&T must not have very good coverage in our area of Las Vegas. When we got to Fresno I tried again. Good service - five bars - but still no data connection. I checked all the settings and checked my account but everything looked correct.

So we stopped at the Apple store on our way out of Fresno. I talked to the greeter, but he had no idea what the problem would be and said I'd need to make an appointment for the Genius Bar. Unfortunately, the next free appointment wasn't for two hours :-(  I wandered through the store thinking about what to do next. I could just buy a US iPad (the new model would be nice), but I hated to do that when I didn't know what the problem was. There was a guy behind the Genius Bar who didn't appear to be busy so I asked if he had a couple of minutes to help me. He was a little reluctant (probably not supposed to take walk-ups) but agreed to take a look. He checked all the same things I had already checked. Eventually he decided it was because it was a Canadian iPad. (It always strikes me as funny how these support people come up with theories, but then present them as "fact".) But it's a US SIM, I said. He told me to wait a minute and disappeared into the back with my iPad. He returned after 10 or 15 minutes and handed me my iPad saying "here you go". I looked at the screen and there was the little "3G" icon at the top! And I was connected to the internet!

I asked what the fix was. He hesitated and then said they turned it off and back on. Shelley was standing next to me and laughed, "isn't that what you would have told me to do?" I kicked myself for not trying this. The reason I didn't is that iOS tries hard to hide the whole idea of "turning off" your device. They try to abstract away the issue of on or off, but as usual, it's a "leaky" abstraction. Of course, the other question is why it needed to be turned off and on - what did that reset?

So finally, I have my US data. Has it been useful? Not very, since we're mostly out in the boonies and there is poor coverage. It should be more useful when I'm traveling in cities. In any case, it had become a quest and the usefulness wasn't really the motivation any more!

I only signed up for the $15 for 250mb prepaid plan since I wasn't sure how it would work. I'm not sure if that'll be sufficient. If I needed more I could get 3gb for $30. Also, in theory, I think I could swap the SIM card into my iPhone since that's a little easier to carry around and use on the street. Now I just have to remember to cancel my plan when I get home, although the most they could bill was the $50 of the prepaid card.


Thursday, March 29, 2012

More positive immudb results

I'm now at the point where I can successfully run our entire application test suite, not just the stdlib ones. Here's how much the database grows in each version:

cSuneido:  20 mb

immudb1:  400 mb   :-(

immudb2:  19 mb   :-)

I'm happy that my decision to back up and try a different approach turned out to be a good one.

It's not too surprising that immudb2 gives slightly better results than cSuneido for running tests, since it will often avoid persisting indexes for temporary tables, whereas cSuneido will always write them.

NOTE: This is still not a realistic workload. I suspect immudb will end up having more database growth than cSuneido. But as long as it's not excessive (as these results would tend to indicate) then it's ok.

Tuesday, March 27, 2012

Preliminary Immudb Results

I ran a few tests for database growth on the new version of my immudb append-only database storage engine for jSuneido.

First, I tested with persisting indexes after every update transaction. This is more or less what it did previously. Then I tested with persisting only every so many update transactions.

every 1 = growth 5,882,408
every 10 = growth 1,322,712
every 100 = growth 551,080
every 1000 = growth 354,168

These are the kind of results I was counting on. Deferring persisting the indexes results in much less growth.

Coincidentally, the improvement shown above is about 17 times, not far from the 20 times difference between cSuneido and the first version of immudb. This is just coincidence since I'm running different tests, but it's very nice to see that the improvement may be enough to bring immudb into the same ballpark as cSuneido, which is pretty good considering cSuneido is a mutable database.

Is persisting every 1000 update transactions reasonable? I think so. These tests only take about 5 seconds to run. Remember, the actual data is being written right away, log style, so persisting less often does not increase the risk of losing data. It only increases the amount of work to be done by crash recovery. Making crash recovery re-process 5 seconds (or more) of transactions seems fine. Especially considering that cSuneido rebuilds the entire indexes after a crash, which could take a long time for a big database.

These numbers are from running the stdlib (standard library) tests. This isn't a very typical work load, so the numbers aren't going to be very accurate. But I'm just looking for some very rough feedback at this point. If I'd seen little or no improvement, I'd be seriously depressed right now and considering taking up golf instead :-)

Ideally, I need a good statistical model of a typical work load, or at least representative, since there's probably no such thing as typical. Another project for my "spare" time!

Hmmm... thinking about what the stdlib tests do versus what a typical workload might be, I realized that one atypical thing the stdlib tests do is to create, use, and then drop temporary database tables. But in that case, if you defer persisting till after the table is dropped, then you never need to save the indexes for that table at all. (Although the data will still be written immediately, log style.)

It turned out I wasn't handling this, which actually was good because it would have seriously skewed the above results. It was easy enough to fix.

Rerunning the tests, I got the results I expected - much the same growth when persisting every transaction, less growth than before as the persist interval increased. Persisting every 1000 transactions resulted in a growth of only 62,424 - roughly 5 times better.

This is nice, but not really relevant to production, since a typical workload does not include creating and dropping a lot of temporary tables. It will be nice for development because running the tests won't cause the database to grow so much.  (At least until I implement background compaction.)

Persisting every so many update transactions is not necessarily the best approach; I'm using it because it's the easiest. Alternatively, you could persist every so many seconds or minutes, or try to persist during "idle" times. Currently, I'm persisting synchronously, i.e. while holding the same lock that commits use. So update transactions can't commit till the persist finishes. (But read-only transactions are not affected, and even update transactions can read and write; they just have to wait to commit.) Depending on how much index information has to be persisted, this could introduce delays in server response. To alleviate this, you could persist in a background thread. Because of the immutability, this should be relatively straightforward. The exception is updating the in-memory index information to show it has been saved - this would require some synchronization.
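For illustration, here's a minimal sketch of the persist-every-N-commits idea, done synchronously under the commit lock as described above. The names (CommitManager, writeDataRecords, and so on) are made up for the example - this isn't the actual jSuneido code.

// Sketch only - hypothetical names, not the real jSuneido classes.
public class CommitManager {
    private static final int PERSIST_INTERVAL = 1000; // persist indexes every 1000 commits
    private final Object commitLock = new Object();
    private int commitsSincePersist = 0;

    /** Commits write the data records immediately (log style);
     *  index changes stay in memory until the next persist. */
    public void commit(UpdateTransaction t) {
        synchronized (commitLock) {
            t.writeDataRecords();           // data goes to disk right away
            t.applyIndexChangesInMemory();  // btrees are only updated in memory
            if (++commitsSincePersist >= PERSIST_INTERVAL) {
                persistIndexes();           // done while holding the lock,
                commitsSincePersist = 0;    // so other commits wait for it
            }
        }
    }

    private void persistIndexes() {
        // write the accumulated btree and db info changes to the index file
    }

    public interface UpdateTransaction {
        void writeDataRecords();
        void applyIndexChangesInMemory();
    }
}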

Another issue I haven't dealt with is that I may need to "flush" some of the index information out of memory. Once it's been stored, a btree node in memory consists mainly of a ByteBuffer that points to part of the memory mapped index file. Since the memory mapped space is "virtual", it will get evicted if the OS runs short on memory. But for a really big database, even the small objects referencing the virtual memory may add up to too much memory usage.

It would be easy enough for the persist process to discard the btree nodes once they have been saved. But that would have an effect on performance since they would have to be recreated on demand. You'd want to evict selectively, presumably on some kind of LRU or NFU basis, which would require tracking some kind of usage data. Presumably, you'd also only want to do this when necessary, i.e. when memory is getting full. Hmmm... perhaps one approach would be to use WeakReferences and let Java worry about when and what to evict. Anyway, that's a challenge for another day.
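As a rough sketch of that idea (made-up names, not the real jSuneido classes): once a node has been persisted, keep only its file offset strongly referenced and let the garbage collector decide when to drop the decoded in-memory form. A SoftReference, which the JVM only clears under memory pressure, might actually be a better fit for a cache than a WeakReference.

import java.lang.ref.SoftReference;
import java.nio.ByteBuffer;

// Sketch only - hypothetical names. Once a btree node has been persisted,
// keep just its file offset strongly and let the GC reclaim the decoded node.
class StoredNodeRef {
    private final long fileOffset;               // where the node lives in the index file
    private SoftReference<BtreeNode> cached =    // decoded node, reclaimable by the GC
            new SoftReference<BtreeNode>(null);

    StoredNodeRef(long fileOffset) {
        this.fileOffset = fileOffset;
    }

    synchronized BtreeNode node(Storage stor) {
        BtreeNode node = cached.get();
        if (node == null) {                      // evicted (or never loaded) - re-read it
            node = new BtreeNode(stor.buffer(fileOffset));
            cached = new SoftReference<BtreeNode>(node);
        }
        return node;
    }

    static class BtreeNode {
        final ByteBuffer buf;
        BtreeNode(ByteBuffer buf) { this.buf = buf; }
    }

    interface Storage {
        ByteBuffer buffer(long offset);          // e.g. a slice of the memory mapped file
    }
}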

And now back to testing :-)

Sunday, March 25, 2012

jSuneido immudb milestone

The improved version of the jSuneido append-only storage engine (immudb) now passes all the standard library tests.

Which is great, except it just means I'm right back where I was two months ago with my previous jSuneido immudb milestone. As usual, it feels like I only started on this a couple of weeks ago.

Except that hopefully I've fixed the database growth and transaction conflict issues.

I haven't implemented the idea of passing the table select predicate down to the transaction so it can use it for read validation. But even without this, the conflict detection should be at least as fine grained as cSuneido (and much better than previous versions of jSuneido).

I'm almost to the point where my strangler fig approach has taken over the old immudb code and I should go back and clean it all up so I have a single version. I can't see needing the old immudb version again (and it'll be in version control if I do).

The last problem I ran into was a tricky one. In order to defer btree updates till commit, I use transaction-local btrees. Which means that to access an index you have to look at both the local and the global btrees. To do this I use an iterator that merges two other iterators. Iterating through two sorted sequences in parallel is a standard, easy problem. But Suneido allows you to step forwards or backwards in any order with the same iterator, and it turns out that this makes iterating in parallel a lot tougher. The trick is when you change directions. The final code is not that large (look at next and prev in MergeIndexIter) but I went through about six different versions before I finally got one that appears to work. Of course, most of my previous versions also appeared to work, until I added more test cases.
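For comparison, here's roughly what the easy, forward-only case looks like - merging two sorted iterators (say, a transaction-local index and the master index) using a one-element lookahead into each. The names are hypothetical, not the actual MergeIndexIter code, and it's exactly the direction switching that this simple version doesn't handle.

import java.util.Iterator;
import java.util.NoSuchElementException;

// Forward-only merge of two sorted iterators - the "standard easy problem".
// Hypothetical sketch; it does not handle changing direction part way through.
class MergeIter<T extends Comparable<T>> implements Iterator<T> {
    private final Iterator<T> a, b;
    private T nextA, nextB;                     // one-element lookahead into each source

    MergeIter(Iterator<T> a, Iterator<T> b) {
        this.a = a;
        this.b = b;
        nextA = a.hasNext() ? a.next() : null;
        nextB = b.hasNext() ? b.next() : null;
    }

    public boolean hasNext() {
        return nextA != null || nextB != null;
    }

    public T next() {
        if (!hasNext())
            throw new NoSuchElementException();
        T result;
        if (nextB == null || (nextA != null && nextA.compareTo(nextB) <= 0)) {
            result = nextA;
            nextA = a.hasNext() ? a.next() : null;  // advance the source we took from
        } else {
            result = nextB;
            nextB = b.hasNext() ? b.next() : null;
        }
        return result;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}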

A big part of finally figuring this out was coming up with the right visualization for it. Here's what I ended up with:


Notice it's paper and pencil. (It should have been on graph paper but I so seldom do stuff on paper anymore that I didn't have any handy.) I would have preferred to do it digitally, but I couldn't think of an easy way to do it. I could have done it in a drawing program or even a spreadsheet or word processor, but it would have been a hassle. Drawing programs make it possible to do just about anything, but they don't tend to make it easy and natural (at least, if you only use them occasionally like me). Maybe there's a market niche for someone to go after. Or maybe there already is a good program for doing this - anyone know one?

Given the extra complexity, I started to question whether Suneido needed to allow you to switch directions with an iterator. I can only think of a couple of places where we use this. But I'll leave that question for another day.

And now back to testing. I imagine I've still got a few bugs lurking in there. I also need to do some testing and see whether I've actually solved the database growth problem. It may be harder to know whether the transaction issue is solved until we actually get it in production.

Friday, March 23, 2012

I'm a little slow sometimes

A roundabout detour of bit twiddling optimizations that turn out to be ultimately unnecessary.

Originally, Suneido used 32 bit database file offsets. Remember when disks (let alone files) were limited to 4 gb? When we needed to support databases larger than 4 gb, I didn't want to switch to 64 bit offsets because of the extra space required to store them, so I "compressed" larger offsets into 32 bits. (a bit like the Java -XX:+UseCompressedOops option)

The approach I used was to align and shift. If you align the data in the file on, for example, 8 byte boundaries, then the lower 3 bits of an offset will always be zero. So you can shift a 35 bit offset to the right 3 bits so it fits into 32 bits. (Who says bit twiddling is dead?) This allows a maximum file size of 32 gb instead of 4 gb. (Actually, for obscure historical reasons, even though the alignment is 8 bytes, the shift is only 2 bits, so the maximum size is 16 gb.)
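In code, the align-and-shift compression is just a couple of shifts. Something along these lines (an illustration of the idea, not Suneido's actual implementation):

// Illustration of align-and-shift offset compression (not Suneido's actual code).
// With 8 byte alignment the low 3 bits of every offset are zero, so a 35 bit
// offset fits in a 32 bit int, giving a 32 gb maximum file size.
class OffsetCompression {
    static final int SHIFT = 3;                    // 8 byte alignment

    static int compress(long offset) {
        assert (offset & ((1 << SHIFT) - 1)) == 0; // must be aligned
        assert (offset >>> SHIFT) <= 0xffffffffL;  // must fit in 32 bits
        return (int) (offset >>> SHIFT);
    }

    static long decompress(int stored) {
        return (stored & 0xffffffffL) << SHIFT;    // treat the stored int as unsigned
    }
}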

The tradeoff is some wasted space. Each block of data will waste an average of 4 bytes.

The problem is, it's starting to look like we need to allow even larger databases. I could just increase the align/shift to 16 bytes/4 bits to allow 64 gb. But that would increase the average waste to 8 bytes, even though most databases aren't that big.

Ideally, you'd want small databases to use a small align/shift, and larger databases to use a larger align/shift. I could make the align/shift a command line option, but you'd have to rebuild the database to change it. And I'm not a big fan of cryptic options that most people won't understand.

My next thought was to use a variable align/shift. There are different ways you could do this, but a simple one would be to reserve two bits to specify the align/shift. This would leave 30 bits (1 gb range) for the unshifted offset. For example:

00 = 4 byte alignment = 0 to 4 gb
01 = 8 byte alignment = 4 to 12 gb
10 = 16 byte alignment = 12 to 28 gb
11 = 32 byte alignment = 28 to 60 gb
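Here's a rough sketch of how encoding and decoding with the two-bit band selector might look (hypothetical code, just to make the scheme concrete - the top two bits pick the band, the low 30 bits are the offset within that band, shifted by the band's alignment):

// Sketch of the two-bit variable align/shift scheme described above - hypothetical.
class VariableShift {
    private static final long GB = 1L << 30;
    private static final long[] BASE  = { 0, 4 * GB, 12 * GB, 28 * GB };
    private static final int[]  SHIFT = { 2, 3, 4, 5 };      // 4, 8, 16, 32 byte alignment

    static long decode(int stored) {
        int band = stored >>> 30;                             // top two bits
        long within = (stored & 0x3fffffffL) << SHIFT[band];  // low 30 bits, unshifted
        return BASE[band] + within;
    }

    static int encode(long offset) {
        for (int band = 0; band < 4; band++) {
            long end = BASE[band] + (GB << SHIFT[band]);      // bands are 4, 8, 16, 32 gb wide
            if (offset < end) {
                assert (offset - BASE[band]) % (1L << SHIFT[band]) == 0;
                return (band << 30) | (int) ((offset - BASE[band]) >>> SHIFT[band]);
            }
        }
        throw new IllegalArgumentException("offset too large");
    }
}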

For databases up to 4 gb, this would use a smaller alignment than currently and therefore less wasted space, while still allowing a much higher maximum size than now. Only large databases would pay the cost of higher wastage due to larger alignment.

This scheme would be slightly more complex to decode, but not too bad.

At first I was excited about this approach, but it doesn't scale. You can increase alignment to allow bigger files, but then you waste more and more space. At some point, you'd be further ahead to drop the alignment and just store 64 bit offsets. Except you don't want the overhead of large offsets on small databases.

So why not just use a variable length encoding like Google Protocol Buffer varint?

A variable length encoding would not be the simplest thing to manipulate in the code, but we don't have large numbers of offsets in memory, so you could use 64 bit unencoded offsets in memory, and just use the variable length encoding on disk.

At this point I whack myself on the forehead. D'oh! Sometimes I am such an idiot! (A lot of my "aha!" moments are spoiled by thinking that if I was just a little bit smarter and had thought of it a little sooner, I could have saved myself a lot of work.)

Suneido already uses a variable length encoding when storing file offsets!

The offsets are stored in the btree key records. They are getting encoded the same way other numeric values are packed into records, which is already a variable length encoding.

So all the messing around with align/shift is mostly wasted effort. It does reduce the magnitude of the offsets, which does mean they are encoded slightly smaller, but this is minor.

I'd be further ahead to drop the align/shift altogether, use 64 bit offsets in memory, and let the variable length encoding handle making small offsets small. This would eliminate the whole maximum database file size issue!

If I want to save space, I could use a different variable length encoding for offsets. The current encoding is designed for Suneido's decimal floating point numbers and is not ideal for simple integers.

I can't use the Protocol Buffer varint encoding because I need encoded integers to sort in the same order as the original values. Varints don't, because they are stored least-significant first. One simple approach would be to reserve the first two bits to specify the length of the value, for example:

00 = 4 bytes
01 = 5 bytes
10 = 6 bytes
11 = 7 bytes

This would give a maximum of 54 bits (7 * 8 - 2) or about 16,000 terabytes. By the time anyone is trying to make a Suneido database bigger than that, it should be someone else's problem :-)
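A sketch of what that encoding might look like (illustrative only, not something Suneido implements). Values are written big-endian using the minimal length, so comparing the encoded bytes as unsigned values gives the same ordering as comparing the original offsets.

// Order preserving variable length encoding for file offsets - hypothetical sketch.
// The top two bits of the first byte give the total length: 00 = 4 bytes,
// 01 = 5, 10 = 6, 11 = 7, leaving a maximum of 54 bits for the offset itself.
class OffsetEncoding {
    static byte[] encode(long offset) {
        int len;
        if (offset < (1L << 30)) len = 4;
        else if (offset < (1L << 38)) len = 5;
        else if (offset < (1L << 46)) len = 6;
        else if (offset < (1L << 54)) len = 7;
        else throw new IllegalArgumentException("offset too large");
        byte[] buf = new byte[len];
        long rest = offset;
        for (int i = len - 1; i >= 0; i--) {    // big-endian, least significant byte last
            buf[i] = (byte) (rest & 0xff);
            rest >>>= 8;
        }
        buf[0] |= (byte) ((len - 4) << 6);      // length tag in the top two bits
        return buf;
    }

    static long decode(byte[] buf) {
        int len = ((buf[0] >>> 6) & 3) + 4;
        long offset = buf[0] & 0x3f;            // strip the tag from the first byte
        for (int i = 1; i < len; i++)
            offset = (offset << 8) | (buf[i] & 0xff);
        return offset;
    }
}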

Of course, I'm itching to implement this, but I'm in the middle of debugging my immudb improvements so it'll have to wait.

Monday, March 05, 2012

Piracy?

I've been a fan of Pragmatic Programmers and their books for a long time. I buy their books directly, both to maximize their revenue, and to get eBooks without DRM.

Recently I noticed this at the bottom of one of my order emails. (I'm not sure how long they've been including it.)
Please don't put these eBooks on the 'net or on file-sharing networks. Sometimes folks do this accidentally by downloading into a shared directory.
I don't have any problem with the "don't post these eBooks on the 'net" part. But I really question the file-sharing part - not the legality of it, since I'm no expert on that, but the "morality" of it. Even the 'net part needs to be more specific - I put my books on the "net" - into Dropbox so I can access them from various devices and locations, but they're not public. Of course, maybe that's not "legal" either.

If I bought a paper copy of the book, and put it on the bookshelf in our company coffee room, I don't think I'd be breaking any laws. But I'm not supposed to do the same thing with a digital copy.

Potentially, multiple people could simultaneously read the digital copy. But I have trouble getting anyone to read any books. The chances of multiple people reading the same book at the same time are slim to nil.

Note: If my staff needs regular access to a book, I have no problem with buying extra copies.

I always thought Prag Prog were more "enlightened" than this. But obviously someone there couldn't resist the urge to "wag their finger" at us.

In some ways, Amazon, despite their DRM, is more flexible. I could buy Kindle books on a company account, and set up multiple Kindle readers (either devices or software) associated with that account. I don't know if Amazon would regard that as entirely legitimate, but they allow it. (Albeit, intended for "families".)

Publishers want to have their cake and eat it too - they want the benefits of digital - no printing, no inventory, no shipping, and at the same time, they want to take away all the traditional rights we had with paper books to share and lend. Some eBooks are now more expensive than the hardcover - what's the story with that? Who are the pirates now?

Sunday, March 04, 2012

jSuneido Immudb Progress

I've made good progress on the changes to the append-only database storage engine for jSuneido. The new version now passes all the built-in tests, although I have a few things left to do to improve performance.

I'm pretty happy with how it went. No big surprises, and the code turned out quite simple for the most part. It's nice that there's less shared mutable state and therefore fewer potential locking issues.

Previously, updating the shared state and persisting to disk were a combined process. It was a little tricky teasing this apart so they were independent. For example, the semi-immutable data structures were flipping to immutable when they were stored. Now they have to be flipped to immutable before they go into the shared state, separately from being stored.

Of course, the reason I did all these changes was to try to address some "production scale" issues, and I still haven't got to the point where I can really tell if I've solved them or not. So far, the signs are good.

As usual, the code is in Mercurial on SourceForge.

Related: Immudb Progress

Sunday, February 19, 2012

Immudb Progress

I've been plugging away at improvements to the immudb append-only storage engine for jSuneido.

To reduce database growth: (see Houston, we have a problem and Problem or Opportunity?)

- Change to persist btree index changes to disks periodically, instead of after every transaction.
- This makes redirection less important so I'm removing it. (Funny how this great idea is suddenly more trouble than it's worth.)
- This also allows persisting btrees in a separate concurrent thread. To make this as independent as possible I plan to split the database into two files. I have mixed feelings about this because I really like having everything in one file. With two files you have to make sure you always backup / restore / copy / move / rename the two files together because if they get out of sync you're in trouble.

To reduce transaction conflicts: (see Suneido Database Transaction Puzzle)

- Instead of tracking reads and writes by btree node, track writes by record and reads by index range and predicate.
- This makes it harder to use the previous approach of merging btree versions. Instead I'm "overlaying" transaction local indexes over the "master" indexes, and then applying the index changes to the master index during commit. (Easy because commits are single-threaded.)


None of these changes is that big, but unfortunately, they are fairly invasive. Rather than "break" everything while the changes are in progress I've been taking a "strangler fig" approach, keeping the existing code intact and functional while at the same time building up a new version that leans heavily on the old one. Once I have the new version functional on its own I can remove the vestiges of the old code.

I should probably be doing these two sets of changes separately, but they keep getting mixed together in my head. I just know where I want to get to, not necessarily which problem each particular change is aimed at. The changes all affect the same code, so in some respects it's easier to do them together. Time will tell whether this was the right approach :-)

Another question is whether the changes I'm working on will actually succeed in reducing database growth sufficiently. I'm actually fairly confident about this because it's not a "yes or no" question. If you persist the indexes after every commit you get the excessive growth I've already seen. At the opposite extreme, if you only persisted at shutdown, you'd get minimal growth. Presumably, the answer lies somewhere between these two extremes. Looking at it another way, if you updated the same btree node in each transaction but only saved it once per 10 transactions, then you'd get 1/10th the database growth. If 1/10th is still too big, then you only save every 100 transactions, giving 1/100th the database growth. Since the actual data is still being written to disk at each commit, the only cost of persisting the indexes less often is increased work by crash recovery, which (a) shouldn't happen that often, and (b) processing e.g. 100 commits worth of data to recover from a crash doesn't seem that bad.

Another nice feature of deferring persisting indexes is that it offloads work from commits which are a potential bottleneck since they're single-threaded. There are some issues with persisting indexes concurrently with commits, but I think I have an approach figured out. On the other hand, I'm now updating the master indexes during the commit so it's not all gain.

So far so good. I have some of the pieces working and it's gradually coming together. I'm anxious to see how it turns out.

Wednesday, January 25, 2012

On-line Database Compaction

One of the weaknesses (at least in my mind) of Suneido's database is that it requires off-line compaction, i.e. you have to periodically shut down the server and run a standalone compaction process.

Practically speaking, that hasn't been a big issue. With the mutable storage engine, database growth is relatively slow and you don't need to compact that often. Few of our customers run 24 x 7 so it's not a problem to shut down at night occasionally. Also, some of the updates we deploy have to be run single user anyway.

But with the immutable storage engine, even with my improved design, the database will grow faster, and require more frequent compaction.

It would sure be nice if you could compact on-line, in the background, while the server was running.

Thinking about my planned improvements, I realized this might now be relatively easy. (Actually, I think I could have used a similar approach before, it just became more obvious now.)

The basic idea is to use a read-only transaction, and its snapshot of the database as of a certain point in time, to compact the database up to that point. (Just like the current off-line compaction.)

But since the database is active, you will probably have activity after that point. So then you reprocess any updates after your transaction and apply them to the new database. (This is similar to how crash recovery would reprocess updates that happened after the last good checkpoint.) When you "catch up", then you switch over to the new database (with some synchronization required).

There is a potential problem if the server is so busy that the compaction never catches up. In practice, the compaction could be scheduled at a slow time of the day, and most of our systems aren't continuously busy. In any case, this wouldn't "hurt" anything, other than performance. You could easily detect this situation and abort the compaction.
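The overall shape of the catch-up loop might look something like this (the Database interface and its methods are made up for illustration; the real thing would obviously be more involved):

// Rough sketch with made-up names - not Suneido's actual code. Copy the live
// data as of a snapshot, then repeatedly reprocess whatever was committed
// after that point, and finally switch over once we've caught up.
class OnlineCompaction {
    interface Database {
        long lastCommitNumber();
        long copyLiveDataAsOf(long commitNumber, Database dest); // like offline compact
        long reapplyCommits(long from, long to, Database dest);  // like crash recovery
        void switchTo(Database dest);    // brief, done with commits blocked
        void discard();
    }

    static final int MAX_PASSES = 10;    // give up if we never catch up

    static boolean compact(Database db, Database newDb) {
        long upTo = db.copyLiveDataAsOf(db.lastCommitNumber(), newDb);
        for (int pass = 0; pass < MAX_PASSES; pass++) {
            long latest = db.lastCommitNumber();
            if (latest == upTo) {
                db.switchTo(newDb);      // caught up - swap in the compacted database
                return true;
            }
            upTo = db.reapplyCommits(upTo, latest, newDb);
        }
        newDb.discard();                 // too busy to catch up - abort, nothing lost
        return false;
    }
}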

Although the server can continue to operate during the compaction, performance may be affected. (As it can be with our current on-line backups.) CPU time is usually not the problem these days. A bigger issue is that reading all the data can cause other working set data to be evicted from memory. One option may be to not use virtual memory to read the data. Then you're only competing for disk cache and not virtual memory space. What we've found with our on-line backups is that as long as you have lots of memory it's not a problem.

One benefit of building a new compacted database file (rather than compact in place) is that it is inherently crash proof, at least in the sense that if the server crashes, the current database won't be affected. The new database being created can simply be discarded.

In garbage collector terms this is equivalent to a "copying collector" i.e. it just copies the live data, rather than scanning all the data (live and dead) as, for example, a "mark-sweep" collector does.

Suneido does support on-line backups, taking advantage of snapshot isolation to make a consistent backup as of a certain point in time. With the append only immudb storage engine, another option would be to simply copy the bytes of the database file up to a certain point without paying any attention to interpreting the structure. If the data and indexes are in the same file, this would copy both. If they were in separate files you'd have the option of only copying the data (like the current on-line backup). However, this would still be somewhat larger than the current backup because it would include deleted records. On the other hand, it should be faster since it's just a bulk copy.

Note: These ideas, like my ideas for limiting database growth, are just ideas. I'm sure there are unseen complexities waiting to trip me up.

Saturday, January 21, 2012

Problem or Opportunity?

I started to get a little worried about the database growth problem with my immudb append-only storage engine for jSuneido. I could think of ways to reduce the growth a little bit, but not by much, and only at the cost of adding a bunch of complexity.

Was I going to have to throw out a year's work, when it seemed so promising? Having to throw out a few days' work is one thing, but a year? How could I have gone so far without seeing this problem? Especially since I knew perfectly well it was an issue with this approach.

I'm as much an eternal optimist as the next programmer, with a big enough ego to think I can solve most problems. But I'm also human with the standard intermittent feelings of inadequacy and stupidity. In other words, I was starting to worry. But I knew enough to give myself time to mull it over.

I was sitting eating lunch, reading a computer book when I started to think about the problem.

What if I wrote out the redirections as just a list of values? That would be a lot more compact than saving the changes to the hash trie. That would be fine while you were running because you'd still have the hash trie in memory. But what about restarting? How would you get the in-memory hash trie back? And how would you handle crash recovery?

It seemed like the problem wasn't so much the way I was saving the information, it was saving it so often, for every transaction. I did this so the on-disk data structure matched the in-memory data structure after every transaction commit.

But was that strictly necessary? What if you relaxed that constraint? What if you only wrote out all the extra information (btree nodes, redirections, db info) periodically? That would reduce the database growth a lot. You could then consider a continuum, from saving after every commit, to the opposite extreme of only saving when you shut down the server.

I realized that what I was thinking about was basically the idea of "checkpoints" - a standard database implementation technique. You could also think of it as simply deferring index writes to disk.

Crash recovery becomes a little harder - you'd have to start from the last good checkpoint and reprocess the data from transactions after that. Similar to how other database crash recovery reprocesses logs from the last checkpoint. This isn't quite as elegant or simple as being able to simply pick up from the last good commit, but crashes don't happen that often. And it was still a lot better than having to rebuild all the indexes after a crash as Suneido requires with the current mutable storage engine.

The beauty of this solution is that it doesn't require a lot of changes. The immutable persistent in-memory data structures and the code that uses them stays exactly the same. You just don't save it as often.

Of course, TANSTAAFL. The cost of this approach would be higher memory usage. Once the index changes are saved to disk you don't have to keep the in-memory versions. Obviously, if you don't save to disk, you do have to keep the in-memory versions. However, the memory cost is much less than the disk growth because the memory will be garbage collected. Memory is cheap and jSuneido assumes you have lots. (For small systems with limited memory, cSuneido is a better choice.)

Another benefit from saving after multiple transactions is that there can be considerable overlap. For example, if there are "hotspots" in the data that are frequently updated, you only save them once per checkpoint instead of once per commit.

So the commits would simply save the data records (inserts, updates, and deletes) to the database file, with no other information. The btrees would be updated in-memory only. Then periodically you would save a checkpoint with the btree and db info changes.

Thinking about it, I wonder whether I even need the redirections with this approach. The point of the redirections is to save space by not path copying to the root. But periodic checkpoints may save enough space that redirections aren't as important. Eliminating them would definitely simplify the code - always a good thing!

I also realized that I could even write the checkpoints in a separate concurrent thread since they could use the persistent immutable data structures to view the state as of a certain point in time (just like transactions do) without any concurrency issues and without blocking ongoing database activity. The only trick would be coordinating writing to the database file. Even that coordination could be avoided if you wrote the checkpoints (i.e. the index btrees) to a separate file from the data. Although I like having everything in one file, it's certainly not essential.
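A background checkpoint thread could be as simple as the following sketch (hypothetical names, and the coordination details are glossed over). Because the in-memory index state is an immutable persistent data structure, the thread can capture a reference to it and write it out without blocking ongoing commits; only recording that it has been saved needs synchronization.

// Sketch only - hypothetical names, not actual jSuneido code.
class CheckpointThread extends Thread {
    interface IndexState { void writeTo(String filename); }  // immutable snapshot
    interface Database {
        IndexState currentIndexState();           // just grabs a reference - cheap
        void indexStateSaved(IndexState state);   // brief, done under the commit lock
    }

    private final Database db;
    private final long intervalMs;

    CheckpointThread(Database db, long intervalMs) {
        super("checkpoint");
        setDaemon(true);
        this.db = db;
        this.intervalMs = intervalMs;
    }

    @Override
    public void run() {
        try {
            while (true) {
                Thread.sleep(intervalMs);
                IndexState snapshot = db.currentIndexState();
                snapshot.writeTo("suneido.indexes");  // the slow part - no locks held
                db.indexStateSaved(snapshot);         // record that it's been persisted
            }
        } catch (InterruptedException e) {
            // shutting down
        }
    }
}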

They say that creative problem solving is a lot about combining known, existing pieces into new combinations. I had all the pieces I needed to solve this problem, I just had to put them together. (Ironically, I talked about a lot of the same pieces almost two years ago in A Faster, Cleaner Database Design for Suneido)

And so I go from thoughts of a depressing major failure, to feeling like a kid with a new Lego kit :-)

Thursday, January 19, 2012

Houston, we have a problem

I thought things were going well with my new immudb append-only storage engine. It finally passed all our application tests, and performance was looking good.

Then I noticed the amount of database growth when I ran our full application test suite:

mutable storage engine = 20mb
immutable storage engine = 400mb

Ouch. I always expected it to use more space - that's the cost of immutable persistent data structures. But I didn't expect it to be 20 times as much.

With immutable persistent data structures in memory, the space is reclaimed by garbage collection. But so far the only form of garbage collection for database space that Suneido has had is offline compaction.

It's a little surprising that the performance is good, even on hard disk rather than SSD. I assume that's because sequential writes are actually quite fast. And also because it's using memory mapped file access so it's not blocking.

Given that the performance is ok, the database growth might even be acceptable. It might just mean compacting the database nightly instead of weekly as we currently do. However, presumably the performance would be even better if it wasn't writing so much.

I think it's worth spending some time to see if I can improve it. The first step was to see what the breakdown of the space was. (Note: these figures are growth, not total size.)

96820 update transactions
data space 11 mb
btree node space 140 mb
redirection space 89 mb
db info space 148 mb

If you look at the growth per transaction it's only about 4 kb. That doesn't sound too bad, until you have 100,000 transactions.

The btree node space could likely be decreased a lot (maybe half?) by compression of the nodes, perhaps something as simple as prefix compression.
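For example, prefix compression of the sorted keys in a node could be as simple as storing, for each key, the number of leading bytes it shares with the previous key plus the remaining suffix. A rough sketch of the idea (not Suneido's actual node format):

import java.util.ArrayList;
import java.util.List;

// Illustration of simple prefix compression for sorted keys - hypothetical,
// not Suneido's btree node format. Each key is stored as the number of
// leading bytes it shares with the previous key, plus the remaining suffix.
class PrefixCompression {
    static List<byte[]> compress(List<byte[]> sortedKeys) {
        List<byte[]> out = new ArrayList<byte[]>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = 0;
            while (shared < 255 && shared < prev.length && shared < key.length
                    && prev[shared] == key[shared])
                shared++;
            byte[] entry = new byte[1 + key.length - shared];
            entry[0] = (byte) shared;                   // shared prefix length (0..255)
            System.arraycopy(key, shared, entry, 1, key.length - shared);
            out.add(entry);
            prev = key;
        }
        return out;
    }

    static List<byte[]> decompress(List<byte[]> entries) {
        List<byte[]> out = new ArrayList<byte[]>();
        byte[] prev = new byte[0];
        for (byte[] entry : entries) {
            int shared = entry[0] & 0xff;
            byte[] key = new byte[shared + entry.length - 1];
            System.arraycopy(prev, 0, key, 0, shared);  // reuse the shared prefix
            System.arraycopy(entry, 1, key, shared, entry.length - 1);
            out.add(key);
            prev = key;
        }
        return out;
    }
}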

Redirection is used to avoid path copying all the way to the root in the btrees. It's stored as an immutable persistent hash trie. Unfortunately, it does have to do path copying itself and that may be part of the problem.

I also noticed that each subsequent run used more redirection space. That's a little curious because it's doing the same set of updates each time. However, as the total number of redirections grows, then path copying will grow.

The "db info" is another hash trie containing aggregate data about each table and index. The information is used for query optimization. It could use redirection instead of path copying but currently it doesn't because it's using the same hash trie implementation as the redirections. It currently stores a "blob" of information for each table.

One improvement would be to delete redirection and db info when a table or index is destroyed. However, this would help the tests more than actual application usage, since the tests create and destroy lots of temporary tables. This might mean actual application usage would not show as high database growth as the tests.

I'm not sure what to try first. It's too bad the space is coming from three similarly sized sources. If the majority had come from one source then that would be the obvious place to start. There's always a chance that there are bugs or design flaws causing part of the problem. Another variable to try adjusting is btree node size.

* You could argue that our application test suite ideally shouldn't be hitting the database at all. But that's not relevant to this post.

** The title of this post comes from Apollo 13

Thursday, January 12, 2012

jSuneido immudb milestone

As of today, my new append-only storage engine for jSuneido (immudb) successfully runs all our application tests. Of course, there are probably still bugs, but it's a big milestone.

And it runs the standard library portion of the tests roughly twice as fast as with the previous storage engine. It also loads databases from a dump roughly twice as fast. And crash recovery is hugely faster.

Overall, I'm pleased with how it's worked out, other than the transactions issues I posted about recently.

As usual, I was surprised when I looked at how long it's been since I started on this - almost exactly a year! Of course, I wasn't working on it full time, and I had to do quite a lot of refactoring on jSuneido to allow pluggable storage engines, but still a lot of work.

Next, I'm planning to work on the transaction issues. Hopefully that won't take a year!

See also: all the posts related to immudb

Sunday, January 08, 2012

Suneido Database Transaction Puzzle

When we converted the first large customer to jSuneido we ran into a lot of transaction conflicts. A lot more than we had with cSuneido. The question was why? (Note: jSuneido is working well, without many conflicts, both in-house and at a smaller client - this problem only showed up under heavy use.)

[Disclaimer - this post is largely to clarify and review my own thinking. My apologies if it's somewhat cryptic.]

A lot of the conflicts could be traced back to application issues that we could fix e.g. having an alert inside a transaction (so the transaction was kept open too long) or not retrying.

But there were still too many conflicts. I ended up adding an option for snapshot isolation rather than serializable snapshot isolation. That eliminated most of the conflicts, but it opens up the potential for inconsistencies and I'm not really happy with it.

My initial assumption was that jSuneido was more "strict" than cSuneido and that was why there were more conflicts. jSuneido implements the algorithm from Serializable Isolation for Snapshot Databases by Cahill, Rohm, and Fekete [CRF], whereas cSuneido does "read validation" i.e. when a transaction commits it verifies that all of its reads are still valid.

But when I thought about the two approaches, I realized that cSuneido's read validation is actually more restrictive than jSuneido's CRF algorithm. Read validation requires transactions to serialize as of their commit time, whereas CRF allows re-ordering (as long as there are no dependency cycles that would prevent it). Otherwise, they are basically doing the same thing.

So I was back to trying to figure out why jSuneido was getting more conflicts.

The CRF approach does admit some "false positives" i.e. spurious conflicts that are not really conflicts. Because it's an "incremental" approach, the conflict flags can get set by transactions that don't end up overlapping or that later abort. As the rate of conflicts rises, this effect will get worse. It's possible this accounts for some of the jSuneido conflicts but I don't think it's the main reason. (Note: cSuneido's read validation only looks at committed transactions so it doesn't degrade in this fashion.)

One major difference is that jSuneido tracks reads and writes by B-tree index node rather than by individual record. This handles phantoms and simplifies merging transaction updates, but it can lead to more conflicts. For example, if a table has only a few rows, then all updates will be in the same B-tree node and overlapping transactions will always conflict. Similarly, appending keys at the end of an index is likely to conflict due to updating the same node.

cSuneido handles phantoms by tracking the index ranges that a transaction reads. Read validation then checks whether any concurrent writes from other transactions fell within those ranges.
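A minimal sketch of that style of read validation, with made-up names: the transaction records the index ranges it read, and at commit it checks whether any key written by an overlapping committed transaction falls inside one of those ranges. (Checking against every overlapping transaction is also where the extra commit-time work comes from.)

import java.util.ArrayList;
import java.util.List;

// Sketch of read validation by index range - hypothetical names.
class ReadValidation {
    static class IndexRange {
        final String index;            // e.g. "customers:name"
        final String lo, hi;           // inclusive key range that was read
        IndexRange(String index, String lo, String hi) {
            this.index = index; this.lo = lo; this.hi = hi;
        }
        boolean contains(String index, String key) {
            return this.index.equals(index)
                    && lo.compareTo(key) <= 0 && key.compareTo(hi) <= 0;
        }
    }

    static class Write {
        final String index, key;
        Write(String index, String key) { this.index = index; this.key = key; }
    }

    final List<IndexRange> reads = new ArrayList<IndexRange>();

    /** false means another transaction wrote into a range we read - conflict */
    boolean validateReads(List<List<Write>> overlappingCommittedWrites) {
        for (List<Write> writes : overlappingCommittedWrites)
            for (Write w : writes)
                for (IndexRange r : reads)
                    if (r.contains(w.index, w.key))
                        return false;
        return true;
    }
}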

My current thinking is that tracking reads and writes by B-tree node is what accounts for jSuneido getting more conflicts, although I don't have an easy way to "prove" this. If I'm correct, then the solution is to track index ranges rather than B-tree nodes. But, that's not so easy with CRF since it uses read locks on items - harder to do with ranges. And you have to deal with concurrent changes to B-tree nodes, which would require a complex tree merge.

So I think I'll probably go back to read validation instead of the CRF approach. It does mean more work during commits, and since commits are single-threaded, that will reduce the concurrency of commits. On the other hand, CRF required access to shared locks which reduces the concurrency of reads and writes. And if commits did turn out to be a bottle-neck, I think it would be relatively easy to multi-thread the commit process e.g. using one task per index. (Since indexes are independent, this should parallelize well.)

But it still leaves the question of how to handle concurrent B-tree updates if I'm no longer "locking" nodes. One approach I've been considering is to delay updating the actual "master" B-tree indexes until a transaction commits. Then the updates are single-threaded and there are no concurrent updates to deal with. Again, this puts more work into the commit, but I think this can be addressed as above.

(cSuneido handles this issue by keeping multiple versions of keys in the B-tree nodes.  But this is tricky and doesn't fit well with the append-only approach that I'm concentrating on.)

However, transactions still need to "see" their own updates, so you'd have to have some kind of additional per-transaction indexes that you would transparently "merge" with the master ones. These could be B-tree indexes, but since they are in-memory, there are probably better data structures. One possibility is a "merge tree". I called this a "fractal tree" in a couple of blog posts (here and here) but the simple structure I'm talking about is not really the same as Tokutek's fractal tree. Since those blog posts I wrote an improved version (MergeTree.java) which I'm now using for temporary indexes required by queries.

A merge tree will be fast for inserts (faster than a B-tree) but slower for lookups. Accumulating updates in a per-transaction merge tree and then applying them to the master B-tree during commit will also have the side benefit of applying the updates to the B-tree in sorted order, which is optimal for B-trees.

Thinking about this, I have an idea for another potentially large optimization. Serializability really only cares about the records that the application sees. This is not the same as the records that the low level query execution reads. For example, if you have an index on fields a,b,c and you query on b, the low level code will scan the entire index, perhaps only returning a few records (possibly none). If you just track the index range that was read by the low level code, then it will appear that all the records were read. This will mean a much greater chance of conflicts than if you tracked the records that the application actually received.

However, I don't want to mix levels and have the query execution code worry about transaction read tracking. I think what I can do is have the query code pass a "black box" predicate to the low level database code. Then the transaction read validation can use the predicate when validating reads. i.e. writes by other transactions that do not match the predicate cannot have been seen and can be ignored.
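A sketch of what passing that predicate down might look like (hypothetical interfaces, not actual Suneido code): the low level read is tracked as an index range plus an opaque predicate, and validation ignores writes the predicate would have filtered out.

// Sketch of the "black box predicate" idea - hypothetical names.
class PredicateValidation {
    interface Record { }                    // placeholder for a database record

    interface RecordPredicate {             // opaque to the storage layer
        boolean matches(Record rec);
    }

    static class TrackedRead {
        final String index;
        final String lo, hi;                // index range scanned at the low level
        final RecordPredicate predicate;    // what the query was actually selecting

        TrackedRead(String index, String lo, String hi, RecordPredicate predicate) {
            this.index = index;
            this.lo = lo;
            this.hi = hi;
            this.predicate = predicate;
        }

        /** does a write by another transaction invalidate this read? */
        boolean invalidatedBy(String index, String key, Record rec) {
            return this.index.equals(index)
                    && lo.compareTo(key) <= 0 && key.compareTo(hi) <= 0
                    && predicate.matches(rec);  // ignore writes the query couldn't have seen
        }
    }
}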

One remaining issue is "bulk" updates since none of these approaches are great for updating a huge number of records. It may be reasonable to allow the application code to "lock" a table. This would prevent concurrent updates, and would allow updating the master B-trees directly, without accumulating updates in memory.

The problem with all of this is that it relies on an embarrassing amount of guess work and seat of the pants intuition. (Based on a fair bit of experience, but even experienced intuition is probably wrong as often as it's right.) Ideally, you'd test out these ideas before jumping into implementing them. But it's just not feasible. Any one of these issues would make a good research project. And even then, you'd only have analyzed a single issue - not how it relates to all the other issues.

The solution space is just too large. Which, of course, is also what makes it fun. It would be pretty boring if it was simple enough to analyze exhaustively, and have no questions about the best approach. The chances of me stumbling on a solution that is better than what is out there is slim, but at least it exists.

The timing actually isn't too bad. I'm just finishing up debugging the append-only storage engine. (So far, the performance looks promising, perhaps as much as twice as fast.) Once I finish that I can take a look at these transaction ideas and see what I can do.

Monday, January 02, 2012

Installing the Energy Detective

It's been an adventure installing the TED 5000 home energy monitor that I got to complement my new solar PV system. (See my Sustainable Adventure blog posts: Solar Powered and Rock Paper Sun.) I'm putting this post on my computer blog because it's more on the technical side. Sorry, it's a long post, I won't be offended if you don't read it. If nothing else, I'm hoping it might help other people with their installations.

I was nervous about installing it in the first place, because I'm not really comfortable messing around in the house breaker box. It didn't help that my last encounter with the house electrical system was when I tried to drill a hole through the back of a closet to run a network cable and managed to hit the electrical wiring inside the wall and short it out. (Annoying Shelley since it left her office without power.) But it's supposed to be straightforward, and I'm supposed to be a technical person. (If I'd been smarter I would have had the electrician install it when they installed the solar panels.)

You have to shut off the main breaker to the house, which means no lights. A headlamp is very handy.

I had to install two MTU's (Measuring and Transmission Units) each with two CT (Current Transformer)  sensors that clamp over the power wires, plus a Gateway unit that connects to your network.

My breaker box has two sections - an upper part with the main breaker, and a lower part with all the house breakers. I assume the idea is that you shut off the main breaker and then you can work in the lower section with no danger. Except I couldn't fit the main CT clamps in the lower section - the only place I could fit them was in the upper section. (where the power is still live on the external side of the main breaker)

I initially only installed the main (usage) MTU because the solar instructions showed two different configurations and I wasn't sure which one I had, which meant I wasn't sure which wires to put the solar CT clamps on, or even if I was supposed to install it in the house breaker box, or outside in the solar breaker box.

Next I had to find an empty power outlet (ideally not too distant from the breaker box) for the Gateway unit. It also had to be close enough to connect to my router. I used the same outlet that my solar panel monitoring unit was plugged into. Both it and the TED gateway use power line communications. (Filling up this outlet didn't make me popular because it was the one that Shelley had been using to plug in chargers for gadgets.)

Of course, my Apple Time Capsule router did not have any more free ports. (I used up the last one for the solar system.) So I had to run out and pick up a switch to get more ports.

Thankfully, I didn't have any trouble with the power line communications, as many people seem to. It probably helps that the office is at the same end of the house as the breaker box. I also remembered reading Jon Udell's blog post about installing a TED. The problem is that some of the TED instructions say to connect the black and the red wires, and some of the instructions say to connect just the black wire. (Not the only issue with their instructions.) Apparently, you should only connect the black one. In which case, why do they have the red one? Are there circumstances when you'd want to connect the red one?

The other thing the printed instructions don't mention is that if you only connect the black wire, then you have to specify that in the configuration. i.e. the default in the configuration doesn't match the instructions, sigh. This issue is fairly obvious because the voltage reads 60 instead of 120 V. Still, it took me a while to discover the setting. At first I assumed I had installed something wrong.

The final step is to connect to the system from your computer. This was a little confusing. It seemed to be a browser based interface, but the instructions also told you to download and run their "installer". I'm always reluctant to install software like this unless I really have to, especially since a lot of the time it's totally unnecessary (like the software that comes with your camera). The instructions said to connect to http://ted5000 with your browser but that didn't work. So I gave in and ran their "installer", which turned out to NOT be an installer at all - it's just a utility that locates the IP address that your Gateway is on.

After all that, it seemed to be functional!

At first, I didn't think it was that critical to monitor the solar system, since it had its own monitor. But then I realized that I still wasn't really measuring the household energy usage. Instead, I was measuring the "net" i.e. the usage minus the solar generation. And to make it worse, with the default settings for the main MTU, the system assumes that usage will always be positive. But with the solar system, that's plain wrong - when the sun's shining, my usage should be negative, but it was still showing as positive. The reason they do this (absolute value) is so it still works when people install the CT's backwards. (Of course, the instructions don't tell you any of this. I gleaned it from their forum.)

I eventually figured out that I had the first configuration in their solar instructions. I shut off the power and opened up the breaker box again. Eventually I managed to find the right wires in the rat's nest, get the clamps on them, and squeeze them into the box enough that I could get the cover back on. (If it was a suitcase, I'd be sitting on it to get it closed!)

In this configuration my main MTU is configured as "Adjusted Load". What they don't make clear is that this means it handles negative values, which means that suddenly it's critical to have the CT's the right way around. Of course, I had mine backwards. And of course, you don't discover this till you close everything back up and turn the power back on. The instructions tell you to leave the cover off till you verify the system is working, but it's such a tight squeeze getting my cover on, that I feel better doing it with the power off.

Part of the blame for getting them backwards lies in the instructions. The printed instructions that come in the box say to orient the red dots "towards the source of the power". As far as I can tell from my install, that's just plain wrong. The Installation Pictorial on the web site is better - it shows the red dots oriented the opposite way. But the text says to orient the red dots "towards the breaker". That's correct for their picture, but if you ended up putting the clamps on the other side of the main breaker (as I would have if there had been room), then the dots would not be facing the main breaker and the instructions would again be wrong. The video shows the dots the wrong way (AFAIK) and the voice over simply says "make sure you get them the right way", not very helpful.

So I had to reverse the main CT's, which meant opening both sections of the breaker box (you can't take the cover off the top part without first taking the cover off the bottom part). It's easy to put the clamps on the wires, but the main power wires are really heavy and stiff. Although there's a fair bit of room in the top section, it isn't easy to get the clamps turned so the cover will fit back on.

I'm probably overly paranoid, but I did most of this work with rubber soled shoes, gloves, and one hand in my pocket. I'm a belt and suspenders kind of a person, if you hadn't figured that out by now!

It's pretty obvious that getting the CT's backwards has been a common problem - not surprising given their instructions. You'd think by now they would at least have corrected and improved their instructions. They attempted to handle the issue by having the software ignore the polarity in the default configuration. But that just makes it worse in a more complex setup like mine. Why not just add a configuration option to reverse the polarity? I can't see why that would be difficult software-wise. And for customers it would sure be a lot simpler than opening everything back up to physically reverse them. Rather than ignore polarity, the default setup could simply detect the polarity and adjust the setting automatically. Then you wouldn't need confusing instructions about red dots at all.
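Just to show how simple it could be, here's a rough sketch (in Java, and purely hypothetical - I have no idea what their software actually looks like) of detecting a reversed CT in the default, non-solar setup, where the load should always be positive:

// Hypothetical sketch only - not TED's actual software.
// In the default (non-solar) setup the load should always be positive,
// so a run of negative readings means the CT is probably on backwards.
public class PolarityCheck {
    private boolean reversed = false;

    // call this with the first several raw readings after installation
    public void calibrate(double[] rawWatts) {
        int negatives = 0;
        for (double w : rawWatts)
            if (w < 0)
                negatives++;
        // if most readings are negative, assume the CT is reversed
        reversed = negatives > rawWatts.length / 2;
    }

    // apply the correction instead of blindly taking the absolute value
    public double corrected(double rawWatts) {
        return reversed ? -rawWatts : rawWatts;
    }
}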

Finally, I think it's all working properly! Usage shows positive, solar generation shows negative, and the net is either positive or negative depending on whether I'm generating more than I'm using.

In the end I think I turned off the power and opened up the breaker box four times. And every time I turned the power off I had to reset all the clocks in the house.

I didn't bother buying the TED display unit since I knew there were iPhone apps. I've installed the free TED-O-Meter iPhone app which is basic but functional. I bought iTED ($.99) because it had a graph, but the graph doesn't handle negative kW values. I may try some of the others.

One thing I really like about the solar panel system is that I can monitor it from anywhere through the internet. That's because the solar system sends the data to an outside server. The TED system doesn't work that way; instead, the data is collected on the Gateway device, which runs a web server so you can access it. That means I can't access it from outside my local network. (Unless I want to punch holes in my firewall and deal with my dynamic IP, etc.) It also means all my data is on the Gateway, so I'm probably going to lose it at some point, e.g. if I update the firmware.

To get around this, the TED system can post the data to an external service (commonly Google PowerMeter, until it was discontinued). Unfortunately, this didn't go smoothly. Every time I tried to activate it, it would time out connecting to the server. I tried several different services, so I don't think it was a server problem. Eventually I got it to work, but unfortunately I wasn't scientific enough to isolate what the issue was. I tried various combinations of: switching browsers, trying different services, resetting the Gateway, entering the URL with and without "http://", and turning the OS X firewall on and off. Presumably it's the Gateway sending the data, so my computer and its firewall shouldn't be relevant, but it's possible they have client-side JavaScript running in the browser that tests the connection during activation.

It's hard to tell which of these third-party services is the "best". TED will only post to a single service, so I can only try one at a time. I succeeded first with PlotWatt, so I'll give them a try. I had emailed them about my connection problem and they responded quickly and helpfully, so that's a good sign. The way their system works, they have to collect a certain amount of data before it's functional, so I'll have to be patient. Of course, my initial data is a mess because half of it has the wrong polarity!

It would be nice if there were standards for all these home monitoring and automation systems. As it is, every manufacturer and every type of system seems to do things differently, and none of them work together.

Overall, the TED 5000 system seems ok. I definitely think they could improve their instructions and documentation. And I think they could simplify installation if they added a polarity option to the software. Personally, I'd prefer it if they sent the data to an internet service, but I can see where being able to operate independently could be valuable (e.g. if you don't have an internet connection). I'm glad they use a browser-based interface rather than proprietary client software. The user interface is adequate. And it's great that there are third-party services and apps.

Overview of my rat's nest.

The main (usage) CT's

The solar CT's in the bottom of the box and the two MTU's mounted outside

Part of the TED display, showing usage of 178 W, solar generation of 426 W (cloudy), with the net result of -248 W, negative meaning I'm sending excess power to the grid.

Saturday, December 24, 2011

Approval Tests

I recently listened to a Herding Code podcast on Approval Tests.

The basic idea is that instead of writing:

    assert(x == <some-value>)

you write:

    approve(x)

and when you run the test it shows you the result and asks you to "approve" it. Once you've approved a value, the test will pass (without interaction) as long as the result doesn't change.

If the value is something simple like a number, there's not much point. But if the value is a more complex object like an invoice, then it's much simpler to approve the whole thing than to write a lot of asserts.
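Just to make the mechanism concrete, here's a rough sketch in Java of how an approve() could work under the hood. This is my own sketch of the idea, with made-up names - not the actual Approval Tests API.

import java.io.IOException;
import java.nio.file.*;

// Sketch of the approval idea (not the real Approval Tests library):
// compare the received value against a previously approved file; if there's
// no approved file yet, or the value has changed, save a *.received file
// and fail so a human can review it. (Files.readString/writeString need Java 11+.)
public class Approve {
    public static void approve(String testName, String received) throws IOException {
        Path approvedFile = Paths.get(testName + ".approved.txt");
        Path receivedFile = Paths.get(testName + ".received.txt");
        if (Files.exists(approvedFile) && Files.readString(approvedFile).equals(received)) {
            Files.deleteIfExists(receivedFile); // unchanged - the test passes without interaction
            return;
        }
        Files.writeString(receivedFile, received); // saved for human review
        throw new AssertionError(testName + ": result not approved (or has changed)");
    }
}

Approving a result is then just a matter of renaming (or copying) the .received file to .approved.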

I can't see it replacing conventional style asserts, but it seems like it could be useful.


Friday, December 16, 2011

Textastic

I was looking for some way to have my Suneido source code on my iPad - more for viewing than for actually writing code. I tried a few things and settled on Textastic.

I was able to retrieve my cSuneido code from the SourceForge Subversion repository, but I didn't manage to get the jSuneido code from Mercurial. (It may just have been really slow. Or maybe it was trying to download the history?) So I just copied my source directory to Dropbox and pulled it from there.

I used it a few times while I was traveling recently and it worked well. For example, I'd get an email from my staff asking what a particular error message meant and I was able to find it in the source code and see where it came from.

Textastic is somewhat compatible with TextMate. I have a copy of TextMate on my Mac but I don't actually use it much since I tend to work in Eclipse.

From my limited use, I'd recommend it. It's not free, but for $10 you can't go too far wrong.

Wednesday, October 19, 2011

Long Polling with Suneido

It started with our summer student deciding to build an "instant messenger" in Suneido. I'm not sure what prompted this, but it seemed like an interesting challenge so I let him run with it.

The trick is the "instant" part. His initial approach was to store the messages in a database table and poll the table for changes. To make it "instant" he wanted to poll very frequently, at least once a second, and preferably more. We have about 50 users; if they are all polling every second, that's an extra 50 queries per second. I wasn't too keen on this.

I suggested we set up a separate HttpServer for the messenger, and keep the "outstanding" messages in memory, so polling didn't have to hit the database. (New messages were still written to a history table in the database, but that's not as frequent as the polling.) This took the load off the main database server, and made the requests faster.
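The gist of it, sketched in Java (just an illustration of the idea - the real thing is Suneido code, and these names are made up):

import java.util.*;
import java.util.concurrent.*;

// Sketch: keep undelivered messages in memory, per user, so polling
// never has to touch the database.
public class MessageStore {
    private final ConcurrentMap<String, Queue<String>> pending = new ConcurrentHashMap<>();

    public void send(String toUser, String message) {
        pending.computeIfAbsent(toUser, u -> new ConcurrentLinkedQueue<>()).add(message);
        // (new messages also get written to a history table in the database,
        // but that happens far less often than polling)
    }

    // return and remove any messages waiting for this user (empty if none)
    public List<String> take(String user) {
        List<String> result = new ArrayList<>();
        Queue<String> q = pending.get(user);
        if (q != null)
            for (String m; (m = q.poll()) != null; )
                result.add(m);
        return result;
    }
}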

I explained that a better way to make it "instant" was to use Comet style long polling where the server simply doesn't respond to the request until something is available. This means a lot more connections to the server since every client always has an outstanding request, which can be an issue depending on the server. But we were using jSuneido for the messenger HTTP server, and 50 connections wasn't a problem.

But the client side is also an issue. If we're going to be waiting a potentially long time for a response, the client can't be blocked waiting; we need to make the request asynchronously. Suneido doesn't have a lot of support for asynchronous programming, but it does have a Thread function that allows you to execute code in a separate "fiber" - not a real thread, more like a coroutine. (On jSuneido, "Thread" uses a real thread.)

We've never used Thread much because it has always seemed a little flaky. But it is basically sound, because it's what the Suneido database server uses to handle multiple users - and that is solid. It's also how SocketServer is implemented, and that also works well (e.g. in HttpServer).

One issue is that you want to avoid updating the GUI from a thread. That is easy to do by using Delayed with a 0 delay to process the response. Delayed works via the Windows message queue, which is processed by the main fiber. The code running inside the thread looks like:

forever
     {
     // long poll - Http.Get doesn't return until the server responds,
     // or it times out and returns "" (in which case we just try again);
     // Delayed(0, ...) gets the response processed on the main fiber
     if "" isnt response = Http.Get(uri)
          Delayed(0, ResponseProcessor(response))
     }

(plus catching exceptions etc.)

However, we haven't eliminated polling; we've just moved it to the messenger HTTP server, which must poll for new messages (since Suneido doesn't have any way to "wait" for something). The code looks something like:

// poll for up to 40 seconds (400 x 100 ms), then give up and return ""
400.Times
     {
     if Suneido.MessagesAvailable
          return ...
     Sleep(100) // check again in 1/10 of a second
     }
return ""

400 times 100 ms is 40 seconds, less than the default timeout of 60 seconds. Checking once every 100 ms (1/10 of a second) is a light load on the server; most of the time it's sleeping. (Note: this wouldn't work well on a cSuneido HTTP server because Sleep would block all the other fibers.)
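For comparison, if this part were written directly in Java (where a handler thread can actually block), the bounded wait could look something like this - just a sketch, not what we actually did:

import java.util.concurrent.*;

// Sketch only: a 40 second bounded wait using a blocking queue,
// instead of sleeping in a loop and checking a flag.
public class LongPollWait {
    public static String waitForMessage(BlockingQueue<String> messages)
            throws InterruptedException {
        String msg = messages.poll(40, TimeUnit.SECONDS); // blocks until a message arrives or 40 seconds pass
        return msg == null ? "" : msg; // "" means we timed out and the client just polls again
    }
}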

This has ended up working remarkably well - the messenger is "instant" with little load on the server or client.

Tuesday, October 18, 2011

JUnit Max

I've been using JUnit Max for the last few days. It's an Eclipse extension that automatically runs your tests every time you save a change.

Normally I'd stick to open source options (which this is not), but I figure Kent Beck deserves some support.

It's taking a little getting used to - it makes me nervous not running the tests manually. It's designed to be unobtrusive, but part of me wants more blatant feedback that my tests passed.

One of my motivations for using it is to make sure I'm running all the tests, not just the one I'm working on. I've been caught a few times when I didn't discover I'd broken something elsewhere till some time afterwards.

Even with a fast machine, it still takes a while for the tests to run (although I'm the only one to blame for the speed of the tests), but at least they run in the background and I can continue working.

The web site is pretty minimal and there is no documentation as far as I can tell, but then again, it's pretty self-explanatory.

One of the things that prompted me to try it was listening to a podcast with Kent Beck on Software Engineering Radio (recommended).

JUnit Max has had a bit of a rocky history, pulled in 2009, relaunched in 2010. I think it's a worthwhile product and I hope it's successful.