Thursday, December 03, 2009

More jSuneido Slogging

Another long tedious day working on getting more of our application tests to run on jSuneido.

These are large scale functional tests, so when one fails it's not easy to figure out why. It could take 5 minutes, it could take 5 hours.

I've found a couple of small bugs in jSuneido. One was because BigDecimal differentiates 1 from 1.0 from 1.00, which makes sense from a scientific precision viewpoint, but not when you're dealing with money. And the problem was actually even more obscure - it was because it also differentiates 0 from .0 from .00.
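To illustrate the BigDecimal behavior described above, here is a minimal sketch. It is not jSuneido's actual code; the class and method names are made up for the example. It shows that equals() takes the scale into account while compareTo() does not.

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration only; not taken from jSuneido.
final class DecimalCompare {
    // equals() compares both value and scale, so 1 and 1.0 count as different.
    static boolean sameRepresentation(BigDecimal a, BigDecimal b) {
        return a.equals(b);
    }

    // compareTo() ignores scale, which is usually what you want for money.
    static boolean sameValue(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0;
    }

    public static void main(String[] args) {
        BigDecimal zero = new BigDecimal("0");
        BigDecimal zeroCents = new BigDecimal("0.00");
        System.out.println(sameRepresentation(zero, zeroCents)); // false
        System.out.println(sameValue(zero, zeroCents));          // true
    }
}
```

The same distinction affects hashCode(), so values that differ only in scale can also end up as separate keys in a hash map.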

But the rest of the bugs (the majority) have been in our application code, either in the tests or in the code itself. Nothing serious, most of them were inadvertent dependencies on the order of unordered things.
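As a rough example of that kind of order dependency (written in Java rather than Suneido, with invented names), a test that compares the iteration order of a hash-based collection can pass on one implementation and fail on another. Comparing the members regardless of order avoids the problem:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical test helper; the names are assumptions for this sketch.
final class OrderIndependentCheck {
    // Sorts copies of both collections so iteration order cannot affect the result.
    static boolean sameMembers(Collection<String> expected, Collection<String> actual) {
        List<String> a = new ArrayList<>(expected);
        List<String> b = new ArrayList<>(actual);
        Collections.sort(a);
        Collections.sort(b);
        return a.equals(b);
    }

    public static void main(String[] args) {
        Set<String> fields = new HashSet<>(Arrays.asList("name", "date", "amount"));
        // Fragile alternative: comparing new ArrayList<>(fields) to a fixed list
        // depends on the hash set's unspecified iteration order.
        System.out.println(sameMembers(Arrays.asList("amount", "date", "name"), fields)); // true
    }
}
```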

But it's frustrating. It would be tedious enough doing all this testing to find bugs in jSuneido. But when I'm doing it to find other people's bugs it's annoying. And of course, as with any large body of code, a lot of it is confusing, hard to understand, and could be improved. (Don't get me wrong, I tend to think the same about my own code.)

Oh well, it's got to be done. Hopefully it doesn't take me too much longer.

Wednesday, December 02, 2009

Systems that Never Stop

An interesting (and entertaining) talk by Joe Armstrong (the principal inventor of Erlang) about writing fault tolerant systems. Well worth watching.

InfoQ: Systems that Never Stop (and Erlang)

Tuesday, December 01, 2009

A Debugger's Life

Another day of debugging, although with a twist - I found as many bugs in our application code as I did in jSuneido. Just minor stuff - there's nothing like multiple implementations of a language to flush out the edge cases.

It seems like a slow process, but the jSuneido bugs do seem to be getting smaller and more obscure, which gives me a certain amount of confidence that the main stuff is ok. Most stuff just works, which is a vast improvement over not long ago.

Monday, November 30, 2009

Software That Fixes Itself

Technology Review: Software That Fixes Itself

Cool but a little scary - will the software start to evolve?

Tuesday, November 24, 2009

jSuneido Socket Server

Up till now I've been using Ron Hitchens' NIO socket server framework. It has worked pretty well, but it's primarily sample code. As far as I know it's not in production anywhere and not really maintained.

The first problem I ran into with it was that it didn't use gathering writes so it was susceptible to Nagle problems. I got around that with setTcpNoDelay, although that's not the ideal solution.

Another problem I ran into was that the input buffer was a fixed size. And worse, it would hang up in an infinite loop if it overflowed. To get around this I made the buffer big, but again, not an ideal solution.

And lastly, everything sent or received had to be copied in or out of buffers maintained by the framework, rather than being used directly.

So I decided to bite the bullet and write my own. It took me about half a day to write. It's roughly 180 lines of code. It's not as flexible as Ron's but it does what I need - gathering writes, unlimited input buffering, and the ability to use the buffers directly without copying. It's fairly easy to use - there's a simple echo server example at the end of the code. I wouldn't want to have to write it with just the Sun Java docs to go by, but with the examples in Ron's book, Java NIO, it's not too bad.
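For readers unfamiliar with gathering writes, here is a small sketch of the idea. It is not the jSuneido server code; the class and method names are invented, and it uses a java.nio Pipe instead of a socket so it can run without a network. The point is that a header and a body held in separate buffers go out through one write call, without first being copied into a combined buffer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; a real server would use a SocketChannel, which is also
// a GatheringByteChannel.
final class GatheringWriteSketch {
    // Writes both buffers with gathering writes until they are fully drained.
    // This loop assumes a blocking channel; a non-blocking server would
    // register for OP_WRITE instead of spinning.
    static void writeAll(GatheringByteChannel channel, ByteBuffer header, ByteBuffer body)
            throws IOException {
        ByteBuffer[] buffers = { header, body };
        while (header.hasRemaining() || body.hasRemaining()) {
            channel.write(buffers);
        }
    }

    // Sends header and body through a pipe and returns what the reader received.
    static String roundTrip(String header, String body) {
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                writeAll(sink,
                        ByteBuffer.wrap(header.getBytes(StandardCharsets.UTF_8)),
                        ByteBuffer.wrap(body.getBytes(StandardCharsets.UTF_8)));
                sink.close(); // signal end of stream to the reader
                ByteBuffer in = ByteBuffer.allocate(1024);
                while (in.hasRemaining() && source.read(in) != -1) {
                    // keep reading until end of stream or the buffer is full
                }
                in.flip();
                return StandardCharsets.UTF_8.decode(in).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("LEN 5\n", "hello"));
    }
}
```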

Of course, there may still be bugs in it, but it seems to work well so far.

Thursday, November 19, 2009

jSuneido Back on Track

After my last post I spent a full day chasing my bug with very little progress. Around 7pm, just as I was winding down for the day I found a small clue. It didn't seem like much, but it was nice to end the day on any sort of positive note.

This morning, using the clue, I was able to find the problem. It didn't turn out to be a low level synchronization issue, it was a higher level logical error, although still related to concurrency. That explained the consistency in the error. I had missed one type of transaction conflict, and that meant under certain circumstances one transaction would overwrite another. The fix was easy (two lines of code) once I figured it out.

Even with the clue, it wasn't exactly easy to track down. I ended up digging through a 100,000 line log file. Luckily I wasn't just looking through it, I was searching for particular things. It was a matter of finding the particular 50 lines where the error happened. After that it was fairly obvious.

Since fixing the bug I've run millions of iterations of a variety of scenarios for as long as 30 minutes with no problems. This evening I'll let it run for a couple of hours. I'll also think up some additional testing scenarios - there are still a few things that I'm not exercising.

Cleaning up the code before sending it to version control I found an entire data structure (a hash map of transactions) that wasn't being used! I was carefully adding and removing from it, but I never actually used it. I must have at some point. So I removed it and everything worked the same. Humorous.

I don't want to be overly optimistic, I'm sure there are still bugs (there always are), but it's starting to feel like I'm over the worst of it.

Wednesday, November 18, 2009

Offsite Sync and Backup

I have a large amount of music (~30gb) and photo files (~300gb). I back them up to my Time Capsule but that wouldn't protect me if my house burnt down. (Photo files from my Pentax K7 are 20mb each and I might take 10,000 in a year - that's 200gb added per year.)

So for an off-site backup, and so I can access them, I keep a "mirror" copy on my computer at work. Currently, I update this mirror manually from time to time by copying new files to a portable hard drive and carrying that to work. But this is an awkward solution, and I don't update as often as I should.

There are a variety of backup and sync products out there, but none of them seem to handle this scenario.

I have been using Dropbox to sync my jSuneido files between home and work and laptop and it works really well. But their biggest account is 100gb.

Google's storage is getting cheaper, but Picasa won't let me store my big DNG (raw) photo files.

Jungle Disk has unlimited storage, but at $.15 per gb that's roughly $50 per month, which isn't cheap.

Apart from the cost, the big problem with online storage is that uploading 300gb takes a long time. I signed up for Jungle Disk but it estimated 60 days to upload my files! Obviously, after that I'd only have to upload new files, but even a few thousand photos from a long holiday will take days or weeks to upload. Maybe I need a faster internet connection!

CrashPlan has a really interesting approach of letting you back up to other machines, either your own or your friends'. This avoids the cost of storage. The upload speed may be better since the machines are local and aren't servicing other users. But CrashPlan doesn't sync, so I'd have an off-site backup, but I couldn't access the files (without restoring them). Another problem with CrashPlan is it requires both machines to be turned on at the same time. But to be environmentally friendly, I try to turn off my computers when I'm not using them.

Note: Jungle Disk only recently added sync and from their forum it sounds like it has problems.

A Proposed Solution

Here is an idea for a new service.

I don't really need a copy of my files in the cloud. If I could sync between my home and work computers that would be sufficient. I don't really want to be paying $50 per month just to store my files in the cloud.

All I really need to store in the cloud is a "summary" of my files (e.g. file names, dates, sizes, maybe hashes) plus any new or modified files. Once the files have propagated to my computers they can be removed from the cloud. If you used a clever hash scheme you could even do partial updates of large files. (Although for music and photos this isn't that important since the files don't usually change.)
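A rough sketch of how such a summary could drive a sync, under assumptions of my own: the FileSummary fields, the class names, and the rule "same size and hash means same content" are all hypothetical, not part of any existing service. Each computer publishes a map from path to summary, and the other side works out which files it still needs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of the "summary" idea; not an existing product's format.
final class SyncManifest {
    static final class FileSummary {
        final long size;
        final long modifiedMillis;
        final String hash;

        FileSummary(long size, long modifiedMillis, String hash) {
            this.size = size;
            this.modifiedMillis = modifiedMillis;
            this.hash = hash;
        }

        // The timestamp is deliberately ignored: a file carried over on a
        // portable drive may have a different date but identical content.
        boolean sameContent(FileSummary other) {
            return size == other.size && hash.equals(other.hash);
        }
    }

    // Paths that 'target' must fetch to match 'source': missing or changed files.
    static SortedSet<String> pathsToFetch(Map<String, FileSummary> source,
                                          Map<String, FileSummary> target) {
        SortedSet<String> result = new TreeSet<>();
        for (Map.Entry<String, FileSummary> e : source.entrySet()) {
            FileSummary existing = target.get(e.getKey());
            if (existing == null || !existing.sameContent(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, FileSummary> home = new HashMap<>();
        home.put("photos/img1.dng", new FileSummary(20_000_000L, 1L, "aaa"));
        home.put("photos/img2.dng", new FileSummary(20_000_000L, 2L, "bbb"));
        Map<String, FileSummary> work = new HashMap<>();
        work.put("photos/img1.dng", new FileSummary(20_000_000L, 9L, "aaa"));
        System.out.println(pathsToFetch(home, work)); // [photos/img2.dng]
    }
}
```

Only the manifests and the files returned by pathsToFetch would ever need to pass through the cloud.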

This would require far less storage than keeping a complete copy in the cloud.

You'd still have the problem of the initial syncing. But that could either be done by a different method e.g. a portable hard drive like I've been using, or by requiring both computers to be running at the same time for the initial sync. This is similar to Amazon allowing you to send them physical media to load data into S3. And if you had a big addition of files (like the photos from a long holiday) you could use an alternate method to move them around, and the sync could recognize that you already had the same files on each computer.

The businesses that make money from selling storage probably wouldn't be crazy about this idea, but it seems like a natural addition to CrashPlan since they aren't charging for storage, and charging for the sync service would be additional revenue. And presumably it could be cheap since the storage and bandwidth needs are minimal. (The actual data would be transferred peer to peer.)

You could even borrow some ideas from Git - its "tree" of hash values would work well for this, and would also provide security and error checking.
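The tree-of-hashes idea can be sketched roughly as follows. This is not Git's actual object format (Git uses its own binary encoding and, historically, SHA-1); the class name, the text listing format, and the choice of SHA-256 are assumptions made for the example. A directory's hash is computed from the names and hashes of its children, so two computers can compare a single root hash and only descend into subtrees that differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch inspired by Git trees; not Git's real format.
final class TreeHashSketch {
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every JVM
        }
    }

    // Directory hash = hash of the sorted "childHash name" listing.
    // A SortedMap keeps the listing independent of insertion order.
    static String treeHash(SortedMap<String, String> childHashesByName) {
        StringBuilder listing = new StringBuilder();
        for (Map.Entry<String, String> e : childHashesByName.entrySet()) {
            listing.append(e.getValue()).append(' ').append(e.getKey()).append('\n');
        }
        return sha256Hex(listing.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        SortedMap<String, String> photos = new TreeMap<>();
        photos.put("img1.dng", sha256Hex("first photo".getBytes(StandardCharsets.UTF_8)));
        photos.put("img2.dng", sha256Hex("second photo".getBytes(StandardCharsets.UTF_8)));
        SortedMap<String, String> root = new TreeMap<>();
        root.put("photos", treeHash(photos));
        System.out.println(treeHash(root));
    }
}
```

Because each file's hash covers its content, the same values also serve as a check that a transferred file arrived intact.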

If I had some spare time it would be a fun project. If anyone out there wants to implement it, you can count me in as your first customer :-)

Immutable and Pure

More and more I find myself wanting a programming language where I could mark classes as immutable and functions as pure (no side-effects) and have this checked statically by the compiler. Being able to mark methods as read-only (like C++ const) would also be nice.

This is coming from a variety of sources:
- reading about functional languages like Haskell and Clojure
- working on concurrency in jSuneido (immutable classes and pure functions make concurrency easier)
- problems in my company's applications where side-effects have been added where they shouldn't have been

I have been using the javax annotation for Immutable, which in theory can be checked by programs like FindBugs and that's a step in the right direction.
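For comparison, here is roughly what immutability by convention looks like in plain Java today. The class is a made-up example, and I am assuming the annotation in question is javax.annotation.concurrent.Immutable from JSR-305; it is left out so the sketch compiles without extra jars. Nothing here is checked by the compiler, which is exactly the gap described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class. With the JSR-305 jar on the classpath it could
// also be marked @Immutable so that tools like FindBugs can check it.
final class ImmutableInvoice {
    private final String customer;
    private final List<String> lines;

    ImmutableInvoice(String customer, List<String> lines) {
        this.customer = customer;
        // Defensive copy plus unmodifiable wrapper: later changes to the
        // caller's list cannot leak in, and callers cannot mutate ours.
        this.lines = Collections.unmodifiableList(new ArrayList<>(lines));
    }

    String customer() {
        return customer;
    }

    List<String> lines() {
        return lines;
    }

    // "Pure" by convention only: no state is modified, a new object is returned.
    ImmutableInvoice withLine(String line) {
        List<String> extended = new ArrayList<>(lines);
        extended.add(line);
        return new ImmutableInvoice(customer, extended);
    }

    public static void main(String[] args) {
        ImmutableInvoice invoice = new ImmutableInvoice("Acme", new ArrayList<String>());
        ImmutableInvoice updated = invoice.withLine("1 widget");
        System.out.println(invoice.lines().size() + " " + updated.lines().size()); // 0 1
    }
}
```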

There are a lot of new languages around these days, but so far I haven't seen any with these simple features. Of course, in a "true" functional language like Haskell, "everything" is pure and immutable (except for monads), so this doesn't really apply. But I think for the foreseeable future most of us are going to be using a mixture.