Thursday, December 03, 2009

More jSuneido Slogging

Another long tedious day working on getting more of our application tests to run on jSuneido.

These are large scale functional tests, so when one fails it's not easy to figure out why. It could take 5 minutes, it could take 5 hours.

I've found a couple of small bugs in jSuneido. One was because BigDecimal differentiates 1 from 1.0 from 1.00, which makes sense from a scientific precision viewpoint, but not when you're dealing with money. And the problem was actually even more obscure - it was because it differentiates 0 from .0 from .00
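The difference is easy to demonstrate: BigDecimal.equals() considers scale as well as numeric value, while compareTo() ignores it. A minimal illustration (not jSuneido's actual code):

```java
import java.math.BigDecimal;

public class ScaleDemo {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.0");   // scale 1
        BigDecimal b = new BigDecimal("1.00");  // scale 2
        System.out.println(a.equals(b));        // false - equals compares scale too
        System.out.println(a.compareTo(b));     // 0 - numerically the same
        // the obscure zero case: 0, 0.0, and 0.00 are also unequal under equals
        System.out.println(new BigDecimal("0.0").equals(BigDecimal.ZERO)); // false
    }
}
```

For money, comparing with compareTo() (or normalizing the scale on the way in) sidesteps the problem.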

But the rest of the bugs (the majority) have been in our application code, either in the tests or in the code itself. Nothing serious, most of them were inadvertent dependencies on the order of unordered things.

But it's frustrating. It would be tedious enough doing all this testing to find bugs in jSuneido. But when I'm doing it to find other people's bugs it's annoying. And of course, as with any large body of code, a lot of it is confusing, hard to understand, and could be improved. (Don't get me wrong, I tend to think the same about my own code.)

Oh well, it's got to be done. Hopefully it doesn't take me too much longer.

Wednesday, December 02, 2009

Systems that Never Stop

An interesting (and entertaining) talk by Joe Armstrong (the principal inventor of Erlang) about writing fault tolerant systems. Well worth watching.

InfoQ: Systems that Never Stop (and Erlang)

Tuesday, December 01, 2009

A Debugger's Life

Another day of debugging, although with a twist - I found as many bugs in our application code as I did in jSuneido. Just minor stuff - there's nothing like multiple implementations of a language to flush out the edge cases.

It seems like a slow process, but the jSuneido bugs do seem to be getting smaller and more obscure, which gives me a certain amount of confidence that the main stuff is ok. Most stuff just works, which is a vast improvement over not long ago.

Tuesday, November 24, 2009

jSuneido Socket Server

Up till now I've been using Ron Hitchens' NIO socket server framework. It has worked pretty well, but it's primarily sample code. As far as I know it's not in production anywhere and not really maintained.

The first problem I ran into with it was that it didn't use gathering writes so it was susceptible to Nagle problems. I got around that with setTcpNoDelay, although that's not the ideal solution.
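The Nagle issue arises when a header and payload are sent as two separate small writes; a gathering write hands both buffers to the kernel in one call, so they can go out together. A sketch of the call (demonstrated with a Pipe so it's self-contained - a SocketChannel is also a GatheringByteChannel and works the same way):

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

public class GatherDemo {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        ByteBuffer header = ByteBuffer.wrap("HDR:".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer payload = ByteBuffer.wrap("hello".getBytes(StandardCharsets.US_ASCII));
        // one gathering write instead of two small writes;
        // on a real SocketChannel this avoids a Nagle-delayed second packet
        pipe.sink().write(new ByteBuffer[] { header, payload });

        ByteBuffer in = ByteBuffer.allocate(64);
        while (in.position() < 9)       // read until both pieces arrive
            pipe.source().read(in);
        in.flip();
        byte[] bytes = new byte[in.remaining()];
        in.get(bytes);
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // HDR:hello
    }
}
```

The workaround I used instead, socket.setTcpNoDelay(true), simply disables Nagle's algorithm on the socket.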

Another problem I ran into was that the input buffer was a fixed size. And worse, it would hang up in an infinite loop if it overflowed. To get around this I made the buffer big, but again, not an ideal solution.

And lastly, everything sent or received had to be copied into or out of buffers maintained by the framework, rather than used directly.

So I decided to bite the bullet and write my own. It took me about half a day to write. It's roughly 180 lines of code. It's not as flexible as Ron's but it does what I need - gathering writes, unlimited input buffering, and the ability to use the buffers directly without copying. It's fairly easy to use - there's a simple echo server example at the end of the code. I wouldn't want to have to write it with just the Sun Java docs to go by, but with the examples in Ron's book, Java NIO, it's not too bad.

Of course, there may still be bugs in it, but it seems to work well so far.
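The "unlimited input buffering" part can be as simple as doubling the buffer whenever it fills, instead of looping forever on overflow. A rough sketch with names of my own invention - this is not the actual jSuneido code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class GrowableInput {
    private ByteBuffer buf = ByteBuffer.allocate(1024);

    // read whatever is available; double the buffer instead of hanging when full
    public int readFrom(ReadableByteChannel channel) throws IOException {
        if (!buf.hasRemaining()) {
            ByteBuffer bigger = ByteBuffer.allocate(buf.capacity() * 2);
            buf.flip();
            bigger.put(buf);    // copy accumulated bytes into the larger buffer
            buf = bigger;
        }
        return channel.read(buf);
    }

    public ByteBuffer buffer() {
        return buf;
    }
}
```

A real server would also compact the buffer as complete requests are consumed, so it only grows for genuinely large requests.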

Thursday, November 19, 2009

jSuneido Back on Track

After my last post I spent a full day chasing my bug with very little progress. Around 7pm, just as I was winding down for the day, I found a small clue. It didn't seem like much, but it was nice to end the day on any sort of positive note.

This morning, using the clue, I was able to find the problem. It didn't turn out to be a low level synchronization issue, it was a higher level logical error, although still related to concurrency. That explained the consistency in the error. I had missed one type of transaction conflict, and that meant under certain circumstances one transaction would overwrite another. The fix was easy (two lines of code) once I figured it out.

Even with the clue, it wasn't exactly easy to track down. I ended up digging through a 100,000 line log file. Luckily I wasn't just looking through it, I was searching for particular things. It was a matter of finding the particular 50 lines where the error happened. After that it was fairly obvious.

Since fixing the bug I've run millions of iterations of a variety of scenarios for as long as 30 minutes with no problems. This evening I'll let it run for a couple of hours. I'll also think up some additional testing scenarios - there are still a few things that I'm not exercising.

Cleaning up the code before sending it to version control I found an entire data structure (a hash map of transactions) that wasn't being used! I was carefully adding and removing from it, but I never actually used it. I must have used it at some point. So I removed it and everything worked the same. Humorous.

I don't want to be overly optimistic, I'm sure there are still bugs (there always are), but it's starting to feel like I'm over the worst of it.

Wednesday, November 18, 2009

Offsite Sync and Backup

I have a large amount of music (~30gb) and photo files (~300gb). I back them up to my Time Capsule but that wouldn't protect me if my house burnt down. (Photo files from my Pentax K7 are 20mb each and I might take 10,000 in a year - that's 200gb added per year.)

So for an off-site backup, and so I can access them, I keep a "mirror" copy on my computer at work. Currently I update this mirror periodically by hand, copying new files to a portable hard drive and carrying it to work. But this is an awkward solution, and I don't update as often as I should.

There are a variety of backup and sync products out there, but none of them seem to handle this scenario.

I have been using Dropbox to sync my jSuneido files between home and work and laptop and it works really well. But their biggest account is 100gb.

Google's storage is getting cheaper, but Picasa won't let me store my big DNG (raw) photo files.

Jungle Disk has no storage limit, but at $.15 per gb that's roughly $50 per month, which isn't cheap.

Apart from the cost, the big problem with online storage is that uploading 300gb takes a long time. I signed up for Jungle Disk but it estimated 60 days to upload my files! Obviously, after that I'd only have to upload new files, but even a few thousand photos from a long holiday will take days or weeks to upload. Maybe I need a faster internet connection!

CrashPlan has a really interesting approach of letting you back up to other machines, either your own or your friends'. This avoids the cost of storage. The upload speed may be better since the machines are local and aren't servicing other users. But CrashPlan doesn't sync, so I'd have an off-site backup, but I couldn't access the files (without restoring them). Another problem with CrashPlan is it requires both machines to be turned on at the same time. But to be environmentally friendly, I try to turn off my computers when I'm not using them.

Note: Jungle Disk only recently added sync and from their forum it sounds like it has problems.

A Proposed Solution

Here is an idea for a new service.

I don't really need a copy of my files in the cloud. If I could sync between my home and work computers that would be sufficient. I don't really want to be paying $50 per month just to store my files in the cloud.

All I really need to store in the cloud is a "summary" of my files (e.g. file names, dates, sizes, maybe hashes) plus any new or modified files. Once the files have propagated to my computers they can be removed from the cloud. If you used a clever hash scheme you could even do partial updates of large files. (Although for music and photos this isn't that important since the files don't usually change.)
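As a sketch of what such a summary might look like (entirely hypothetical - this is not from any existing product; it uses modern Java for brevity): each file maps to its size and hash, and diffing two summaries tells you what needs to be transferred.

```java
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class FileSummary {
    // map of relative path -> "size:sha1" for every regular file under root
    // (reading whole files is fine for a sketch; real code would stream)
    public static Map<String, String> summarize(Path root) throws Exception {
        Map<String, String> summary = new TreeMap<>();
        try (var paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (!Files.isRegularFile(p))
                    continue;
                byte[] hash = MessageDigest.getInstance("SHA-1")
                        .digest(Files.readAllBytes(p));
                summary.put(root.relativize(p).toString(),
                        Files.size(p) + ":" + new BigInteger(1, hash).toString(16));
            }
        }
        return summary;
    }

    // which files the other side is missing or has a stale copy of
    public static Set<String> toUpload(Map<String, String> local,
            Map<String, String> remote) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, String> e : local.entrySet())
            if (!e.getValue().equals(remote.get(e.getKey())))
                changed.add(e.getKey());
        return changed;
    }
}
```

Only the summary (a few dozen bytes per file) plus the changed files themselves would ever need to sit in the cloud.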

This would require far less storage than keeping a complete copy in the cloud.

You'd still have the problem of the initial syncing. But that could either be done by a different method e.g. a portable hard drive like I've been using, or by requiring both computers to be running at the same time for the initial sync. This is similar to Amazon allowing you to send them physical media to load data into S3. And if you had a big addition of files (like the photos from a long holiday) you could use an alternate method to move them around, and the sync could recognize that you already had the same files on each computer.

The businesses that make money from selling storage probably wouldn't be crazy about this idea, but it seems like a natural addition to CrashPlan since they aren't charging for storage, and charging for the sync service would be additional revenue. And presumably it could be cheap since the storage and bandwidth needs are minimal. (The actual data would be transferred peer to peer.)

You could even borrow some ideas from Git - their "tree" of hash values would work well for this, and also provides security and error checking.

If I had some spare time it would be a fun project. If anyone out there wants to implement it, you can count me in as your first customer :-)

Immutable and Pure

More and more I find myself wanting a programming language where I could mark classes as immutable and functions as pure (no side-effects) and have this checked statically by the compiler. Being able to mark methods as read-only (like C++ const) would also be nice.

This is coming from a variety of sources:
- reading about functional languages like Haskell and Clojure
- working on concurrency in jSuneido (immutable classes and pure functions make concurrency easier)
- problems in my company's applications where side-effects have been added where they shouldn't

I have been using the javax @Immutable annotation, which in theory can be checked by programs like FindBugs, and that's a step in the right direction.
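The annotation itself enforces nothing; it documents a discipline like the following (a generic example, not from my code):

```java
// @javax.annotation.concurrent.Immutable  - the JSR-305 annotation is advisory;
// the actual guarantees come from the discipline below
public final class Money {                 // final: no mutable subclasses
    private final long cents;              // all fields final
    private final String currency;

    public Money(long cents, String currency) {
        this.cents = cents;
        this.currency = currency;
    }

    // no setters - "modification" returns a new instance
    public Money plus(Money other) {
        if (!currency.equals(other.currency))
            throw new IllegalArgumentException("currency mismatch");
        return new Money(cents + other.cents, currency);
    }

    public long cents() {
        return cents;
    }
}
```

An instance like this can be freely shared between threads without locking, which is exactly why it helps with concurrency.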

There are a lot of new languages around these days, but so far I haven't seen any with these simple features. Of course, in a "true" functional language like Haskell, "everything" is pure and immutable (except for monads), so this doesn't really apply. But I think for the foreseeable future most of us are going to be using a mixture.

Tuesday, November 17, 2009

To Laugh or To Cry?

I sat down this morning to write more concurrency tests for jSuneido, fully expecting to uncover more bugs. Amazingly, everything worked perfectly. I have to admit I was feeling pretty darn good, I was almost ready to claim victory. But as the saying goes, pride goes before a fall.

It was time for coffee so I figured I might as well let the tests run for a longer period. I came back to find ... the exact same error I've been fighting for the last week or more! I wouldn't have been surprised to uncover different bugs, but I could have sworn I had squashed this one.

It's bizarre that I keep coming back to this exact same error. I would expect concurrency errors to be more random. Even for a single bug I would expect it to show a variety of symptoms. I guess I shouldn't be complaining, consistency is often helpful to debugging.

I've obviously reduced the frequency of occurrence of the error. I just hope I can get the error to occur in less than 10 minutes of testing. Otherwise it's going to be a very slow debug cycle and I'll have lots of time to review the code!

So am I laughing or crying? Mostly laughing at the moment, but ask me again after I've spent a bunch more hours (or, heaven forbid, days) struggling to find the problem.

Monday, November 16, 2009

How Can This Be Acceptable?

I recently downloaded the latest version of the SciTE programming editor. And subsequently, every time I ran it I got Windows security warnings. There's a check box that implies it will let you stop these warnings, but as far as I can tell it has no effect. I have no idea why the previous version ran without any warnings.

I eventually got these instructions to work:
1. Right-click the file and select Properties.
2. Click on the Security tab.
3. Click Advanced in the lower right.
4. In the Advanced Security Settings window that pops up, click on the Owner tab.
5. Click Edit.
6. Click Other users or groups.
7. Click Advanced in the lower left corner.
8. Click Find Now.
9. Scroll through the results and double-click on your current user account.
10. Click OK to all of the remaining windows except the first Properties window.
11. Select your user account from the list up top and click Edit.
12. Select your user account from the list up top again and then in the pane below, check Full control under Allow, or as much control as you need.
13. You'll get a security warning, click Yes.
14. On some files that are essential to Windows, you'll get an "Unable to save permission changes. Access is denied" warning and there's nothing that you can do about it to the best of my knowledge.
15. Reconsider why you're using Windows.
By my count, that's 7 levels of nested dialogs. And my name didn't show up in the list for step 12, so I had to Add "APM\andrew" (obviously, users would know to type that). Who designs this stuff? Who reviews it? Microsoft is supposed to hire all these really smart people, but they still seem to produce a lot of stupid stuff.

Sunday, November 15, 2009

jSuneido Success

As I hoped, once I had a small failing test it didn't take too long to find the problem and fix it. It didn't make me feel too stupid (at least no more than the usual when you figure out a bug) since it was a fairly subtle synchronization issue. Have I ever mentioned that concurrency is hard?

The funny (in a sick way) part was that after all that, I still had the original problem. Ouch.  Obviously, the problem I isolated and fixed wasn't the only one.

Pondering it more I realized that the bugs I'd been chasing were all originating from a certain aspect of the design. And I realized that even if I managed to chase them down and squash them, that it was still going to end up fragile. Some future modification was likely to end up with the same problem.

So I reversed course, deleted most of the code I wrote in the last few days, and took a simpler approach. Not quite as fast, but simplicity is worth a lot. It only took a half hour or so to make the changes.

Amazingly, all the tests now pass! It took me a minute to grasp that fact. What does that mean when there are no error messages? Oh yeah, that must mean it's working - that's weird.

I'll have to write a bunch more tests before I feel at all confident that it's functional, but this is definitely a step in the right direction. I feel a certain amount of reluctance to start writing more tests - I'd like to savor the feeling of success before I uncover a bunch more problems!

The Joy of a Small Failing Test

Up till now I could only come up with small tests that succeeded and large scale tests that failed.

What I needed was a small test that failed. I finally have one. And even better, it actually fails in the debugger :-)

It's not so easy to come up with a small failing test because to do that you have to narrow down which part of the code is failing. Which is half the challenge, the other half is to figure out why it's failing.

At least now I feel like I have the beast cornered and it's only a matter of time before I kill it.

The test is simple enough that I look at it and think "this can't fail". But it is failing, so obviously I'm missing something. I just hope it's not something too obvious in hindsight because then I'll feel really stupid when I find it.

Saturday, November 14, 2009

jSuneido Progress

The good news is that I've fixed a number of bugs and come up with a reasonable (I think) solution for my design flaw. The solution involved the classic addition of indirection.[1] Of course, it's not the indirection that is the trick, it's how you use it.

The bad news is that after I'd done all this, I was still getting the original error! It only occurs about once every 200,000 transactions (with 2 threads). (Thank goodness for fast computers - 200,000 transactions only take about 5 seconds.) Frustratingly, it doesn't happen in the debugger. With this kind of problem it's not much use adding print statements because you get way too much irrelevant output. A technique I've been finding useful is to have each transaction keep a log of what it's doing. Then when I get the error I can print the log from the offending transaction. It's not perfect because with concurrency problems you really need to see what the other thread was doing, but it's better than nothing.
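The technique is simple enough to sketch in a few lines (hypothetical names, not the actual jSuneido classes):

```java
import java.util.ArrayList;
import java.util.List;

public class Tran {
    private final int id;
    private final List<String> log = new ArrayList<>();

    public Tran(int id) {
        this.id = id;
    }

    // cheap in-memory logging - produces no output at all
    // unless this particular transaction hits the error
    public void log(String event) {
        log.add(event);
    }

    // dump this transaction's history only when it fails
    public void dumpLog() {
        for (String event : log)
            System.out.println("T" + id + ": " + event);
    }
}
```

Because each transaction only prints its own history on failure, you get the 50 relevant lines instead of the 100,000 irrelevant ones.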

It was also annoying because it was the end of the day so I had to leave it with a known error :-(

Thinking about it, I realized I had rushed coding some of the changes, hadn't really reviewed them, and hadn't written any tests. Not good. When I went back to it this morning, sure enough I had made mistakes in my rush job. Obviously, that self imposed pressure to get things resolved by the end of the day is not always a good thing.

So now I'll go back and review the code and write some tests before I worry about whether I've fixed the original problem.
1. A famous aphorism of David Wheeler goes: "All problems in computer science can be solved by another level of indirection." Kevlin Henney's corollary to this is, "...except for the problem of too many layers of indirection." - from Wikipedia

Wednesday, November 11, 2009

jSuneido Multi-Threading Issues

It didn't take much testing to find something that worked single-threaded but failed multi-threaded.

I was expecting this - I figured there'd be issues to work out.

But I was expecting them to be hard to track down and easy to fix and it turned out to be the opposite - easy to track down but hard to fix.

The problem turned out to be more a design flaw than a bug. I've thought of a few solutions but I'm not really happy with any of them.

Oh well, I knew all along this wasn't going to be easy. It'll come.

Monday, November 02, 2009

IntelliJ IDEA Goes Open Source

I recently learned that IntelliJ has released a free, open source community edition of their IDE.

IntelliJ is one of the main IDEs, along with Eclipse and NetBeans. I hadn't looked at it much because the other two are free, but it does get some good reviews. (Apparently they did offer free licenses to open source projects but I wasn't aware of that.)

I tried downloading it and installing it and had no problems. It comes with Subversion support "out of the box" and I was easily able to check out my jSuneido project. That's more than I can say for Eclipse where it's still a painful experience to get Subversion working (at least on a Mac). IntelliJ proves that it is possible to do it smoothly.

I haven't had time to play with it much yet. My first impression was that the UI was a little "rougher" than Eclipse. I can probably tweak the fonts to get it a bit closer. Maybe it's due to Eclipse using SWT. (I'm not sure what IntelliJ is using.)

IntelliJ is known for their strong refactoring. To be honest, I only use a few basic refactorings in Eclipse (like rename and extract method) so I don't know if this would be a big benefit. I should probably use more...

IntelliJ is also supposed to have the best Scala plugin. I'll have to try it. I tried the Eclipse one but wasn't too impressed with where it's at so far.

Friday, October 30, 2009

jSuneido Progress

Over the last couple of weeks I've been working on database concurrency in jSuneido. I basically ripped out all the transaction handling code that I ported from cSuneido and replaced it with a more concurrency friendly design.

The tricky part was keeping the code running (tests passing) during the major surgery. It's always tempting to just start rewriting and hope when you finish that it'll all work. Of course, it never does and you're then faced with major debugging because you have no idea what's broken or which part of your changes broke it.

Evolving gradually isn't so easy either, but at least there's no big-bang integration nightmare to face at the end. As you evolve the code some of the intermediate stages aren't pretty since you have two designs cohabiting. And that can lead to some ugly bugs due to the ugly code. But if you're making the changes in small steps, you can always just back up to a working state.

On a few days it was touch and go whether I'd get that day's changes working by the end of the day, but thankfully I always did. I always hate leaving things broken at the end of the day!

There were a few uneasy moments when things weren't working and I started to wonder if there was some flaw in the design that would negate the whole approach. But the bugs always turned out to be relatively minor stuff. (Like transactions conflicting with themselves - oops!)

The new design doesn't write any changes to the database file till a transaction commits. This eliminated code that had to determine if data should be visible to a given transaction, and code that had to undo changes when the transaction aborted and rolled back. It does mean accumulating more data in memory, but memory is cheap these days. And simpler code is worth a lot.

Switching to serializable snapshot isolation, as described in Serializable Isolation for Snapshot Databases, turned out pretty well. You still need to track reads and writes, but the advantage is that conflicts are detected during operations, rather than having to do a slow read validation process during commit. (Especially since, at least in my design, commit is single-threaded.) It was also nice to see that this approach is a bit more permissive than my previous design i.e. allows more concurrency.

It was exciting (in a geek way) to finally get to the whole point of this exercise - making Suneido multi-threaded. It's taken so much work to get to this point that you almost forget why you're doing it.

I've also gradually been making things thread-safe where needed. My approach to concurrency has been:

  • keep as much data thread contained as possible, i.e. minimize shared data
  • immutable data where possible
  • persistent immutable data structures for multi-version (the equivalent of database snapshot isolation)
  • concurrent data structures where applicable e.g. ConcurrentHashMap
  • Java synchronized only for bottom level code that doesn't call any other application code (to avoid the possibility of deadlock and livelock)

The real test will be how scalable jSuneido is over multiple cores. My gut feeling is that the design should scale fairly well, but I'm not sure gut feelings are reliable in this area.

It's a funny feeling to be finally approaching the end of this project. There's still lots to do, but the big stuff is handled.

Tuesday, October 27, 2009

The First Mistake in Documentation

From the Java 6 docs for Buffer:


public final int capacity()
    Returns this buffer's capacity.
    Returns: The capacity of this buffer

And no, it's not an isolated case. It's followed by "position - Returns this buffer's position".

I don't know how many times I've seen stuff like this. Why do people write this kind of thing? They must know it's useless. Is it just because they are "required" to write documentation, so they fulfill the letter of their requirements? Maybe IDEs are too good at generating boilerplate, which then gets left as generated.

Comments like "x = x + 1 // add one to x" are a similar phenomenon.

Saturday, October 24, 2009

When Will I Learn

I had an annoying intermittent bug in jSuneido that I narrowed down to a problem with my PersistentMap. I eventually tracked it down by writing a random "stress" test.

The "funny" part is that I knew I should have done this right from the start. In my previous blog post I said:
I should probably do some random stress testing to exercise it more.
As is often the case, the fix was tiny - 3 characters!  And, of course, it was a situation that was relatively rare. It was a good example that 100% test coverage does not mean no bugs. This kind of thing is one of the downsides of rolling your own.

Here's the fixed source code. The bug was a missing "<< h" on line 202.

Friday, October 23, 2009

Give Them Something Better

This quote was talking about movie theaters, but it could easily be talking about software.
"Giving the people what they want is fundamentally and disastrously wrong. The people don't know what they want ... [Give] them something better" - Samuel "Roxy" Rothapfel, 1914 (quoted in Blue Ocean Strategy)
Of course, the danger is going too far to the other extreme, thinking you know better, and ignoring people's real needs.

Thursday, October 22, 2009

Eclipse Annoyance

Have I mentioned that the auto-indent/wrapping in Eclipse really sucks!

            insertExistingRecords(tran, columns, table, colnums,
                    fktable, fkcolumns, btreeIndex);

I know formatting is somewhat subjective, but I can't see how this kind of result can be called "correct" under any interpretation. The frustrating part is that you fix it, and next time you turn around, it's mangled again.

I cannot understand how the Eclipse developers tolerate this. Surely somebody would get annoyed enough to fix it. The only thing I can guess is that they don't use it. I wonder why. Maybe I'm the idiot for leaving it turned on!

To be fair, maybe it works for everyone else and it's something specific to my settings that messes it up. If so, I haven't been able to find the magic combination of settings that "works".

Scalability Isn't Easy

How We Made GitHub Fast - GitHub

Tuesday, October 20, 2009

Another Unavailable eBook Reader

Barnes & Noble has announced their eBook reader. It looks pretty nifty, but of course, it's not available in Canada. (Only US currently.)

Nook, eBook Reader, eBook Device - Barnes & Noble

There are some interesting features like being able to loan books to other people, and being able to read your ebooks on your computer.

So far, the Sony eBook Readers are the only mainstream reader available in Canada.

The Sony eBook store claims about 55,000 books versus Amazon's 350,000. Barnes & Noble claims "over one million" but they must be counting all the public domain stuff. I'd be surprised if they have as good a selection as Amazon.

I wonder if you'd get away with shipping a Nook to a friend's US address. The question is whether they'll let you use a Canadian credit card to pay for books (Amazon doesn't).

Monday, October 19, 2009

Back to No Kindle in Canada

I just tried to order the new international version of the Kindle and got a message that it isn't available in Canada.

When the announcement first came out I specifically went and checked Canada and I'd swear it said it was available. (my previous post)

It really annoys me to continually run into products and services that exclude Canada. I don't think Amazon has anything personal against Canada, so I have to assume it's our industry and government that are blocking it. Stupid and short sighted.

Canada snubbed as Kindle goes global - The Globe and Mail

Wednesday, October 14, 2009

A Java Immutable Persistent List

Something else I needed for jSuneido was an immutable PersistentList (source). I also improved my PersistentMap (source).

One of the things that was bothering me was that I thought there should be a more efficient way to initially build immutable persistent lists and maps. Creating a new version for every element added seemed like overkill.

When I started looking at the Google Collections Library I found the Builder classes they have for their immutable collections. This seemed like a good approach so I added builders to PersistentList and PersistentMap. (Guava: Google Core Libraries for Java 1.6 also looks interesting.)
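The builder idea is to accumulate cheaply in a mutable collection and freeze once at the end, instead of creating a new persistent version per element added. A toy version in the style of the Google Collections builders (not the actual PersistentList code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// a toy immutable list with a builder - illustrates the pattern only
public final class SimpleImmutableList<T> {
    private final List<T> elements;

    private SimpleImmutableList(List<T> elements) {
        this.elements = elements;
    }

    public T get(int i) {
        return elements.get(i);
    }

    public int size() {
        return elements.size();
    }

    public static class Builder<T> {
        private final List<T> pending = new ArrayList<>();

        public Builder<T> add(T x) {       // cheap mutation while building
            pending.add(x);
            return this;
        }

        public SimpleImmutableList<T> build() {   // freeze once, at the end
            return new SimpleImmutableList<>(
                    Collections.unmodifiableList(new ArrayList<>(pending)));
        }
    }
}
```

The builder is the only mutable object, and it is never shared, so the built result can be handed between threads safely.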

To be clear, if you just need a thread safe map or list, the java.util.concurrent ones are just fine. I'm using them in a few places.

And if you just need an immutable collection, the java.util collections offer unmodifiable wrappers and the Google Collections Library has immutable collections. However, they are not "persistent" (see Language Connoisseur). They don't offer any easy way to create new versions with added or removed elements.

Persistent immutable data structures really shine if you need to make modified versions but still retain full access to previous versions. That's what I need in the jSuneido database code because it implements multi-version concurrency control. Each transactions sees the database "as of" a certain point in time. Persistent immutable data structures are a great fit for this.
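The access pattern that matters is that an update produces a new version while old snapshots keep working. A naive copy-on-write stand-in shows the idea (a real persistent structure shares most of its nodes instead of copying everything, but the usage is the same):

```java
import java.util.HashMap;
import java.util.Map;

// naive copy-on-write stand-in for a real persistent map
public final class CowMap<K, V> {
    private final Map<K, V> contents;

    public CowMap() {
        this(new HashMap<>());
    }

    private CowMap(Map<K, V> contents) {
        this.contents = contents;
    }

    // returns a new version; the old version is untouched,
    // so a transaction holding it still sees its snapshot
    public CowMap<K, V> with(K key, V value) {
        Map<K, V> copy = new HashMap<>(contents);
        copy.put(key, value);
        return new CowMap<>(copy);
    }

    public V get(K key) {
        return contents.get(key);
    }
}
```

Each transaction just keeps a reference to the version that was current when it started - that reference is its "as of" snapshot.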

Java ByteBuffer Annoyances

jSuneido uses ByteBuffers to memory map the database file, but they're not ideal for this. Unfortunately, they're the only option.

Java NIO Buffers are meant to be used in a very particular way, with methods like mark, reset, flip, clear, and rewind. For the purpose they are designed for, I'm sure they work well.

But I'm not sure that shoehorning the memory mapping interface into ByteBuffer was the best design choice.

jSuneido memory maps the database file in big chunks (e.g. 4mb). When you want access to a particular record it returns a new ByteBuffer that is a "slice" of the memory-mapped one. This is the closest I can get to cSuneido returning a pointer. The problem is that slice doesn't take any arguments - you have to set the position and limit of the buffer first. Which means the operation isn't thread safe, because a concurrent operation could also be modifying the position and limit.

Most operations on buffers have both "relative" (using the current position) and "absolute" (passing the position as an argument) versions. For some reason slice doesn't have an absolute version.

Another operation that is missing an absolute version is get(byte[]) so again you have to mess with the buffer's current position, meaning concurrency issues.

It would have been much better (IMO) if ByteBuffer were immutable (i.e. all absolute operations). All the position/mark/limit reset/clear/flip stuff should have been in a separate class that wrapped the base immutable version. But not only does the current design force you to drag along a bunch of extra baggage, it also (for some unknown reason) omits key operations that would let you use it as immutable.

Considering that one of the goals of java.nio was concurrency, it seems strange that they picked a design that is so unfriendly to it.

Monday, October 12, 2009

jSuneido Performance

I was troubled by jSuneido appearing to be 10 times slower than cSuneido so I did a little digging.

The good news is that with a few minor adjustments it's now only about 50% slower. That takes a load off my mind.

The adjustments included:

- using the server VM (adding -server to the JRE options)
- disabling asserts (I had some slow ones for debugging)
- removing dynamic loading of classes (if a global name wasn't found it would try to class load it)
- recording when a library lookup fails to avoid doing it again

The C++ Suneido code makes a point of avoiding copying data as it moves from the database to the language. For example, a string that comes from a database field actually points to the memory mapped database - zero copying. Of course, I never knew how much benefit this was because I had nothing to compare it to. It just seemed like it had to be beneficial.

In Java it's a lot harder to avoid copying. To get a string from the database, not only do you have to copy it twice (from a ByteBuffer to a byte array and then to a String) but you also have to decode it using a character set. I could avoid the decoding by storing strings as Java chars (16 bit) but that would double the size in the database.

There may be better ways to implement this, I'll have to do some research. Another option would be to make the conversion lazy i.e. don't convert from ByteBuffer to String until the string is needed. Since a certain percentage of fields are never accessed, this would eliminate unnecessary work.
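A lazy conversion could look something like this (hypothetical, not the actual jSuneido code): hold the raw bytes, and only decode if and when the string is actually used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// wraps the raw database bytes; decoding happens only on first use
public class LazyString {
    private final ByteBuffer raw;
    private String decoded;    // null until first access

    public LazyString(ByteBuffer raw) {
        this.raw = raw;
    }

    @Override
    public String toString() {
        if (decoded == null) {
            byte[] bytes = new byte[raw.remaining()];
            raw.duplicate().get(bytes);   // duplicate so raw's position is untouched
            decoded = new String(bytes, StandardCharsets.US_ASCII);
        }
        return decoded;
    }
}
```

Fields that are never accessed never pay the copy-and-decode cost at all.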

One small benchmark I ran (admittedly not representative of real code) spent 45% of its time in java.nio.Bits.copyToByteArray!  Yikes, almost half the run time! Judging by this, my efforts to avoid copying in cSuneido were probably worthwhile.

A strange result from this same benchmark was that 25% of the time was in String.intern. I found this hard to believe but it seemed consistent. I even tried the YourKit profiler and got the same result. (Although if it is using the same low level data collection that would explain it.) But if I did a microbenchmark on String.intern alone it was very fast, as I would expect. (I checked the profiler output to make sure the intern wasn't getting optimized away.)

I don't know if this is an artifact of the profiling, or if intern slows down when there are a lot of interned strings, or if it slows down due to some interaction with the other code. I did find some reports of intern related performance problems but nothing definitive.

jSuneido currently interns method name strings so that it can dispatch using == (compare reference/pointer) instead of equals (compare contents). Most cases e.g. object.method() are interned at compile time. (Java always interns string literals.) But some cases e.g. object[method]() are still interned at run time.
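The dispatch trick looks roughly like this (a simplified illustration, not jSuneido's actual code):

```java
// Because string literals are interned by Java, an interned method name
// can be matched with == (reference compare) instead of equals.
public class Dispatch {
    static final String GET = "Get"; // literal, therefore interned

    static boolean isGet(String method) {
        return method == GET; // only correct if method was interned
    }

    public static void main(String[] args) {
        String m = new String("Get");          // a distinct, non-interned object
        System.out.println(isGet(m));          // false
        System.out.println(isGet(m.intern())); // true
    }
}
```

Which is exactly why the run-time object[method]() case has to call intern before dispatching.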

On larger, more realistic tests intern doesn't show up so I'm not going to worry about it for now.

Amazon Kindle International

It looks like we'll be able to get the Kindle in Canada soon. Kindle Wireless Reading Device (6" Display, U.S. & International Wireless, Latest Generation): Kindle Store

I've been getting close to buying a Sony Reader; now I'll have to choose.

The Sony has:

- touch screen
- more open formats e.g. ePub
- access to public domain Google Books
- access to some public libraries

The Kindle has:

- bigger selection of books
- free 3G wireless
- access to Wikipedia
- purchased books can also be accessed on iPhone or iPod Touch

So far it's only the smaller Kindle that's going to be available internationally.

One thing it would be good for is computer books. I hate buying them because I know they'll be out of date within a year or two. But you can't get most of them from the library, and even if you could, often I need to refer back to them for longer than I could borrow them from the library. And it would be great to not have to haul physical books back and forth between home and office. Environmentally, you save trees, but you consume another gadget, which wouldn't be so bad if it lasted longer, but you know it'll be out of date in a year or two.

Quick and Dirty Java Profiling

Search for "Java Eclipse profiler" and one of the first things you'll find is The Eclipse Test & Performance Tools Platform. I installed it, but when I went to use it, it told me my platform wasn't supported. I'm not sure what that meant - 64 bit? OS X? Java 1.6? Eclipse 3.4? I didn't pursue it, life's too short.

I looked at a few more tools but nothing stood out. There's the YourKit profiler but it's a commercial product. (Although maybe free for open source projects?)  Eventually I found an article on using the built-in hprof. That sounded simple enough.

To use hprof you add something like -agentlib:hprof=cpu=samples to your Java command line. In Eclipse you can do this from Eclipse > Preferences > Java > Installed JREs > Edit. It's crude to edit the preferences to turn this on and off, but ok for limited use. (There may be a better way to do this, I'm still far from an Eclipse expert.)

This will produce a java.hprof.txt file with the results. It's fairly self explanatory.

There are more options for hprof (you can get a list with -agentlib:hprof=help).

Sunday, October 11, 2009

jSuneido Passes Accounting Tests

Another good milestone - jSuneido now successfully runs all the tests from our accounting application.

The further I go, the tougher the bugs tend to get. One of them took me two days to track down and fix. The problems were in predictable areas like obscure details of database rules and math. Some of them were not so much bugs as just small incompatibilities with cSuneido. A few were errors in the tests themselves.

I've got more application tests I can run. Hopefully they won't uncover too many more problems.

On a less positive note, jSuneido is taking almost 10 times longer than cSuneido to run the accounting tests. That's a significant difference. I haven't done any optimizing yet, but I wouldn't expect optimizing to make a 10 times difference.  It may be time to find a profiler and see what's taking all the time.

The stdlib tests run in similar amounts of time on cSuneido and jSuneido so I suspect the difference is in the database - the accounting tests use the database a lot more than the stdlib tests. cSuneido has a thinner, more direct interface to the memory mapping, which may make a fair difference in speed. I should be able to write some tests to see if this is the problem. Of course, if it is, I'm not sure how I'm going to fix it ...

Thursday, October 08, 2009

A Java Immutable Persistent Map

I know this is going to seem like I'm reinventing the wheel again, but I honestly tried to find existing Java code for a persistent map. I also took a stab at extracting something usable from Clojure but it was not easy to untangle from the rest of the Clojure code. It does seem surprising that there isn't anything out there (at least easily findable). I guess Java programmers don't write functional code.

It took me most of a day and about 300 lines of code to implement, using the Ideal Hash Trees paper and the occasional glance at the Clojure code to see how it did things. I took a similar approach as Clojure but simplified somewhat. I also made mine compatible with the Java collections classes. And mine doesn't depend on anything other than the standard Java libraries, so it should be a useful starting point for anyone else.

I started implementing iteration but gave up for lack of time. (And YAGNI - I may not need it.) Iterating through trees is always a pain.

The code is not as clean as I'd like, e.g. the methods could be smaller, but I did write tests with pretty much 100% coverage. I should probably do some random stress testing to exercise it more. And there are some magic numbers in there like 0x1f and 5 and 32.

I didn't find it right away, so I'll point out that if you're looking for CTPOP (count population) in Java, it's Integer.bitCount.

I tried to restrain myself from premature optimization and took the simplest approach I could. I'm sure it could be made to use less memory and run faster. But the algorithm is good, so speed should be reasonable. And it will certainly use less memory than naive copy on write.
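For what it's worth, those magic numbers all fall out of the Ideal Hash Trees scheme: a 32-bit hash is consumed 5 bits at a time (32 = 2^5 slots per node, 0x1f masks off 5 bits), and Integer.bitCount on the node's bitmap maps a slot to an index in a compact child array. A minimal sketch of just those two calculations (not my actual implementation):

```java
// The two core index calculations in a hash array mapped trie (HAMT).
public class Hamt {
    static final int BITS = 5;    // 5 bits of hash per tree level
    static final int MASK = 0x1f; // low 5 bits => 32-way branching

    // which of the 32 logical slots this hash falls into at a given level
    static int slot(int hash, int level) {
        return (hash >>> (level * BITS)) & MASK;
    }

    // index into the compact child array: count how many occupied slots
    // (bits set in the bitmap) come before this one
    static int index(int bitmap, int slot) {
        return Integer.bitCount(bitmap & ((1 << slot) - 1));
    }
}
```

Storing only the occupied slots is what keeps the nodes small while still giving near-constant-time lookup.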

It was actually a nice break from slogging away getting our accounting application tests to run on jSuneido.

PS. I realize that Git uses a persistent data structure, which is why it is so "cheap" to make new versions. I started implementing a Git-like system in Suneido a while ago, but at that point I hadn't run into persistent data structures. But tree data structures are no strangers to me anyway, thanks to Suneido's btree database indexes.

Monday, October 05, 2009

Building the Perfect Beast

I just watched a good presentation by Rich Hickey on Persistent Data Structures and Managed References.

In it he mentions that Clojure's software transactional memory (STM) doesn't do read tracking.

That caught my attention. Read tracking (and validation) is one of the big hassles in multi-version concurrency like in Suneido's database.

I thought maybe there was a way to avoid it that I'd missed so I did some digging. Sadly, what I discovered is that Clojure's STM only implements snapshot isolation (SI).

This means you can still get "write skew" anomalies where multiple concurrent update transactions each write data that the other reads, leading to results that would not (could not) happen if the transactions were serialized.
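Here's a sketch of the classic write skew scenario, simulated sequentially (a hypothetical two-account example, not real transaction code): each transaction reads both balances from the same snapshot, sees the invariant x + y >= 0 holding, and withdraws from a different account. There's no write-write conflict, so snapshot isolation lets both commit.

```java
public class WriteSkew {
    // Simulates two snapshot-isolation transactions; returns the final x + y.
    static int run() {
        int x = 50, y = 50; // invariant: x + y >= 0

        // both transactions take their snapshot now
        int t1x = x, t1y = y;
        int t2x = x, t2y = y;

        // T1: its snapshot says there's enough, so withdraw 100 from x
        int newX = (t1x + t1y >= 100) ? t1x - 100 : t1x;
        // T2: its snapshot says the same, so withdraw 100 from y
        int newY = (t2x + t2y >= 100) ? t2y - 100 : t2y;

        // different rows, so no write-write conflict: SI lets both commit
        x = newX;
        y = newY;
        return x + y; // -100: the invariant is broken
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

No serial order of the two transactions could produce that result - run either one first and the second one's check fails.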

Suneido implements serializable transactions, not just snapshot isolation, to prevent these kinds of anomalies. (I like how they call it "anomalies", it doesn't sound as bad as "errors".)

Clojure implements snapshot isolation because it's simpler, easier, faster, etc.

Databases like Oracle and PostgreSQL supply snapshot isolation when you request serializable, again, for performance reasons. Amusingly, PostgreSQL says "Serializable mode does not guarantee serializable execution..."

It reminds me of the old saying "if it doesn't have to work correctly, you can make it as small/fast/easy as you want".

But while I was digging into this, I found that there is a way to make snapshot isolation serializable using commit ordering. Wow, if there is a way to avoid read tracking/validation and still be serializable that seems like the best of both worlds.

I found a paper on Serializable Isolation for Snapshot Databases but if I understand it correctly from a quick read, it simply replaces read tracking with a new kind of read lock. I have to study it a bit more to figure out the advantage. I think it may keep the special read locks for a shorter period of time than read tracking. And I think it will detect potential anomalies earlier, avoiding the read validation phase at commit time.

But this paper doesn't seem to have anything to do with commit ordering so I'm unsure if that's yet another approach or whether I'm just missing the connection.

Sometimes I think my father was right, that I should have become an academic so I could spend all my time on this kind of thing. But I think that's a fantasy. First there's all the politics in academia (that I would hate). Then there's the need to specialize so much (whereas I like to jump around). And finally, I like to have a practical end to what I'm doing - actual real users benefiting, not just a paper published in some journal to be read by other academics.

* Building the Perfect Beast is a Don Henley song

Snow Leopard Technology

Mac OS X 10.6 Snow Leopard: the Ars Technica review - Ars Technica

There's some good stuff in this article - 64 bit, LLVM, Clang, concurrency, GCD, etc.

Here's a quote I can relate to.
"The prospect of an automated way to discover bugs that may have existed for years in the depths of a huge codebase is almost pornographic to developers—platform owners in particular."

Saturday, October 03, 2009

Write Your Own Compiler

I recently listened to a podcast of Scott Hanselman talking to Joel Spolsky.

One of the things they talked about was how Fog Creek (Spolsky's company) wrote their own compiler because it was easier than rewriting their application, which was written in an ancient version of VBScript.

I've always been a little defensive about how my company writes their applications in our own language (Suneido). At best people look at me funny, at worst they think I'm crazy. Our salesmen have to skirt around the issue because it scares people to hear you're not using big name tools. As the old saying goes, "no one gets fired for buying IBM". Nowadays you can substitute Microsoft or Oracle or Java.

So it was heartening to hear Spolsky talk about (aka defend) why they wrote their own compiler. One of his points was that writing a compiler is not that hard. People tend to think it's a big deal, but for a smallish language it's not that big a job. For example, Suneido's compiler makes up a tiny fraction (less than 1%) of the total source code in my company's main application.

I'm the first to admit that part of the reason is simply that I like developing languages (and database servers, and IDE's, and frameworks) better than I like writing applications.

But there are other reasons. Having your own platform has its costs, but it also has its benefits. It's a good way to insulate yourself from the fast pace of change in the computer industry. My company still has customers running software that we originally wrote on MS-DOS. The exact same application code is now running on Vista. There are not many commercial platforms that can claim that. Even things like Java that are supposed to be stable platforms change a lot over the years. And Microsoft changes languages and frameworks faster than you want to keep up with. (Spolsky's problem.)

Of course, that can mean you're not using the latest bleeding edge technology, but that's not such a bad thing for a business application.

Friday, October 02, 2009

Thursday, October 01, 2009

Language Connoisseur

It took me a few tries to come up with the right word. Addict? Fanatic? Polyglot? Aficionado?

I think "connoisseur" fits best - "A person of informed and discriminating taste e.g. a connoisseur of fine wines."

I love to read about different programming languages, but I wouldn't say I actually "learn" them. I've only written significant amounts of code in three languages, C, C++, and now Java. It takes years for me to feel like I'm getting "good" at a language.

But I can "taste" other languages, and like a wine connoisseur, critique them (at least in my own mind). Ah yes, made from JVM grapes, definite functional nose, aged in strong typing, goes well with concurrent meals. (Sorry, I'm stretching, but you get the idea.)

It's interesting to note that 4 of the 6 languages that I've read about recently are JVM based. That's partly because I'm interested in JVM based languages because of jSuneido. But I think it also indicates a trend. There are languages based on .Net as well but for some reason they don't seem to be as popular, or at least have as many books.

One of my big interests these days is concurrency, again partly because of having to face it with jSuneido. But it's also something we all have to face as cpu's no longer get faster, and instead we get multiple cores.

It's becoming pretty obvious that threads, mutable data structures, and locking are not the answer. They work, and with a lot of effort it's possible to write programs using them, but it's hard and error prone. I think they will become the "machine language" of concurrency. Present under the covers but not dealt with directly by most people.

So the alternatives in other languages are pretty interesting. Erlang and Scala have "actors". Clojure has software transactional memory.

Pure functions and immutable objects seem to be part of the answer. Transactional memory is promising. All three of these go together.

When I first started reading about Clojure I was confused by the references to "persistent data structures". I was thinking of persistence as saving things in a database. But that's not what they're talking about. Persistent data structures in this sense are data structures that are optimized for making modified immutable "copies" that share representation. The classic and simplest example is a single linked list (like in Lisp). For example, when you add a new item to the front of a list the new list "shares" the old list.
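That sharing can be sketched in a few lines (a bare-bones cons list, just to show the idea):

```java
// An immutable cons list: "adding" to the front just wraps the old
// list, sharing all of it instead of copying.
public class Cons {
    final int head;
    final Cons tail; // shared, never copied

    Cons(int head, Cons tail) {
        this.head = head;
        this.tail = tail;
    }

    static Cons cons(int x, Cons list) {
        return new Cons(x, list);
    }

    static int length(Cons list) {
        int n = 0;
        for (Cons p = list; p != null; p = p.tail)
            n++;
        return n;
    }
}
```

Prepending is O(1) and the old list is still valid and unchanged - both versions coexist.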

Suneido strings are immutable and use persistent data structures. Unlike most languages, when you concatenate onto a string in Suneido it doesn't create a whole new string. Instead, behind the scenes, it creates a list of two strings. This is transparent - to the programmer it's just as if it created a new string.
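A rough sketch of that concatenation scheme (a simplified rope, not Suneido's actual implementation - real versions flatten lazily and rebalance):

```java
// Concatenation builds a small "concat" node pointing at the two halves
// instead of copying both strings into a new buffer.
public abstract class Str {
    abstract int length();
    abstract void build(StringBuilder sb);

    // only pay for copying when the flat text is actually needed
    String flatten() {
        StringBuilder sb = new StringBuilder(length());
        build(sb);
        return sb.toString();
    }

    static Str of(String s) {
        return new Flat(s);
    }

    static Str concat(Str a, Str b) {
        return new Concat(a, b);
    }

    static class Flat extends Str {
        final String s;
        Flat(String s) { this.s = s; }
        int length() { return s.length(); }
        void build(StringBuilder sb) { sb.append(s); }
    }

    static class Concat extends Str {
        final Str left, right; // both shared, neither copied
        Concat(Str left, Str right) { this.left = left; this.right = right; }
        int length() { return left.length() + right.length(); }
        void build(StringBuilder sb) { left.build(sb); right.build(sb); }
    }
}
```

Building a string by repeated concatenation is then linear instead of quadratic, since no intermediate copies are made.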

This answered a question that had been lurking in the back of my mind. Isn't it really slow to use large immutable objects in functional languages? Now I can see how it can be done efficiently using some clever algorithms like Clojure's persistent hash map.

I'd been trying to figure out how to implement an efficient concurrent version of the schema cache in jSuneido. Suneido stores the database schema in tables in the database. But for speed it caches it in memory. The schema doesn't tend to change very often so it is well suited to making it immutable (and therefore not requiring locking). But it does change occasionally, and when it does, transactions have to see the appropriate version. (Suneido uses multi-version database transactions.) A persistent data structure is perfect for this because each transaction can have its own immutable "copy" without physically having to copy the schema (which can be quite large).

However, there's a catch. cSuneido loads the schema lazily (on demand). This adds a twist - the schema cache is logically immutable, but physically mutable. That would require locking, which I'm trying to avoid.

Then I realized that lazy loading was to speed up start up. But jSuneido is primarily (at least initially) aimed at the server, where start up speed is not as important. So I can just load the whole schema at start up, eliminating the problem. Nice.

There is another catch. Java doesn't come with any persistent data structures :-(  and I don't particularly want to write my own.  Java doesn't even have a single linked list.

I did a quick search but I didn't find any third party libraries either.

One possibility is to use the Clojure data structures. (they're written in Java) I took a quick look at the source code and it seems like it might be feasible but I'm not sure how dependent they are on other parts of Clojure. If I'm lucky they'll be usable on their own.

Of course, adopting other code always has its price. If there are bugs I won't know how to fix them. If Clojure fixes, updates, or improves the code I'll have to decide whether to switch to the new version.

Fun stuff! :-)

Tuesday, September 29, 2009

Updated Blogger Editor

If you're not using the new Blogger post editor I'd recommend switching. It's a lot nicer than the old one. I've been using it for a while and haven't noticed any problems with it.

An overview of the new post editor - Blogger Help

Burnt by Java assert Again

I was debugging a problem in jSuneido and I put in an assert that I was pretty sure would fail. It didn't.

I changed it to "assert false" just to be sure. It still didn't fail. WTF

I went back to my last blog post about this (Don't Forget to enable Java assert) because I couldn't remember how I'd fixed this. (Half the reason I blog is so I have a record for my own use!)

Oh yeah, I'd set the Default VM Arguments (to -ea). Why did that get lost?

Because I installed Snow Leopard, which finally came with a 1.6 JRE, which I'd switched to using.

Easy enough to add the setting again, but how can I avoid getting burnt by this again?

Of course, the same way you avoid a bug recurring - by writing a test.

My first attempt was the standard way to test that something throws:

        try {
            assert false;
            fail("assert not enabled");
        } catch (AssertionError e) {
        }

But that succeeded with or without the -ea setting!?  It took me a few minutes of head scratching to realize that fail also throws AssertionError so I had to do:

        try {
            assert false;
        } catch (AssertionError e) {
            return; // assert is enabled - test passes
        }
        fail("assert not enabled");

At least now I'll know right away (or at least the first time I run the tests) if I lose assert again.

I still think the default should be to enable assert, at least for development. (The higher performance server VM could disable assert by default.) Even if Java itself didn't do this, Eclipse could.

Sunday, September 27, 2009

How to Suppress Blogger Image Borders

For some reason Blogger often adds a gray border around images. Usually I don't want it. You can get rid of it by adding:

style="border: none;"

To the img tag (not the a or div tags) using Edit HTML.

Nice UI Feature on Amazon

Up till recently, the Add to Wishlist button would always add to your default list. If you wanted the item in a different wish list you then had to move it with a second step.

Now they've made the obvious improvement: you can pick which list to add the item to right from the start, avoiding the second step.

Buttons with pull down options are becoming quite common. I think they're a good way to handle this kind of thing.

Of course, I'm never satisfied! Next I'd like to be able to add to wish lists directly from the search results rather than having to go to the item's page first.

Thursday, September 24, 2009

Tuesday, September 22, 2009

Sunday, September 20, 2009

When Will We Get Computers That Don't Crash?

I just had Mac OS X crash totally, twice in a row. I suspect it was downloading some video podcasts in iTunes that did it (at least that's what's in common to the two crashes).

I can understand applications crashing. But surely to goodness by now we should have operating systems that can handle misbehaving applications. Don't we have hardware process isolation? Do I have to run my applications in VM's just to get sufficient isolation? (not that that is complete protection either!)

Similarly, people continue to give (and accept) misbehaving hardware as an excuse for systems crashing. Again, surely to goodness the operating system could be written to survive misbehaving hardware/drivers. Especially when most hardware interfaces in a standard way through USB. I've crashed my Mac importing photos. People say, "oh yeah, there are problems reading SD cards via USB". Why is that "acceptable"? Surely enough time should have been spent on the USB driver to make it crash proof?

And even if a device driver has obscure bugs (they always will), why is this able to take down the whole operating system?

I don't understand why the state of the art hasn't progressed further than this.

Tuesday, September 15, 2009

jSuneido Milestone

As of this morning, all the standard library tests run successfully in jSuneido!
(Other than a few that are not portable e.g. because they call Win32 functions.)

In some respects that's a huge milestone. In other respects it's just another arbitrary point in the progress of the project.

I'm pretty sure (I'd bet money on it) that if (when) I run other tests, e.g. the test suite from our accounting package, that I'll uncover more bugs. That's the next step but I'm procrastinating so I can enjoy the feeling of accomplishment a little bit longer.

Hopefully getting the other tests to pass won't take too long. Then I can finally tackle one of the main points of this project - multi-threading. jSuneido is currently multi-threaded but with no protection against threads interfering with each other. So using it multi-user will crash it pretty quickly.

One of the steps is to identify shared data structures and replace/convert them to concurrent versions.

But doing that naively is not going to work well because there will be too many bottlenecks that will end up serializing execution. So it'll be safe, but it won't scale.

So I'm going to have to redesign a few areas but I think I know what to do and it shouldn't be too hard.

Windows 7 on Parallels on Mac

I just installed Windows 7 (Professional 32 bit) on Parallels (4.0.3846) on my iMac (OS X 10.6.1)

The install went smoothly and relatively quickly.

You still can't use Aero (fancy effects like transparency) under Parallels but that doesn't bother me too much.

My first challenge was that I couldn't see the task bar in Coherence mode (mixed Mac and Windows). A quick Google search found other people with the same problem. Eventually I discovered that there is a menu option to show it (Applications > Show Windows Task Bar). I wonder if it wouldn't be better to show it by default and let the people who want to hide it go hunting for the menu option.

My network icon in the taskbar shows a yellow exclamation mark warning, with a tooltip of "No internet access", but I seem to be able to access the internet fine through Internet Explorer and Windows Update worked fine. When I run the troubleshooter it tells me everything is fine and asks what my problem is. I guess I just ignore the warning. Hopefully Parallels will fix this at some point.

Windows 7 has dropped the Quick Launch bar. Instead you can "pin" things to the task bar. Except that I couldn't. I tried dragging and using the right click context menu. No errors or anything, it just didn't work. More Google searching showed other people with the same problem but no clear solution. Someone suggested setting up a new user account. This didn't really make sense, but I tried it and it worked.

I guess the default initial Administrator account doesn't let you pin anything to the task bar. It would be nice if it gave you some kind of message.

If the default initial Administrator account is not a regular account, why doesn't the Windows install process create a regular account for you? Maybe because of installing via Parallels "unattended" method? I'll have to ask someone who has installed Windows 7 on a PC.

One of the few features that the Windows task bar had that the Mac OS X dock didn't was the ability to toggle between showing and hiding windows by clicking on the task bar icon. (A Windows Feature I'd Like on the Mac) Sadly, this feature seems to be gone in Windows 7. (Unless there is a way to enable it somewhere.)

I've been using Vista on Parallels, but I probably would have been better off with XP because it needs less resources. (That's why Netbooks come with XP.) I'm hoping Windows 7 will be less demanding than Vista. (It's supposed to be.)

Thursday, September 10, 2009

Apple Updates

It's been a busy couple of days for me with Apple updates.

First, Apple released Snow Leopard (the new version of OS X) ahead of schedule, while I was on holidays. The day after I got back was a holiday but I headed for London Drugs who I knew would still be open. Unfortunately, they were all sold out. The next day I tried a few more places including Neural Net (our local Apple oriented computer store). They were all sold out!

Obviously the demand for Snow Leopard was higher than expected, even though there are no really big new features.

In a way the delay turned out to have a positive side. While I was waiting I decided I might as well get the "Boxed Set" which includes the latest iLife and iWork. I'd been thinking about buying iWork anyway and my Mini had an old version of iLife, so it seemed like a good deal. Even better, Neural Net had the Boxed Set Family Pack in stock :-)

Although there were some people recommending waiting to upgrade in case of problems, most people seemed to say it was ok. I updated my MacBook first and when that went smoothly, went ahead and updated my iMac and Mini. So far I haven't had any problems, but I haven't done too much.

OS X finally includes Java 6 :-) so I wondered if there'd be any glitches with Eclipse and jSuneido, but so far so good.

Coincidentally, iTunes 9 was released yesterday so I updated that on all my machines. iTunes finally has a Wish List :-) I always wondered why they didn't have this. Was it because they wanted people to buy right away? But then why would Amazon have a wish list?

The wish list is somewhat hidden. To add items you use the pull down menu attached to the Buy button. The annoying part about this design choice is that items that only have a Rent button (certain movies) can't be added to your wish list. To actually view the wish list, the only link I could find was at the very bottom of the iTunes home page under "Manage". The help describes a different location - under Quick Links on the right hand side - which seems like a better location. It's almost as if they still aren't sure about the feature so they're making it somewhat hidden.

Another major new feature in iTunes 9 is "Home Sharing" which lets you move your media between your different home computers. This should help me keep my living room Mini's music library up to date with purchases (which I mostly make on my main iMac).

You can only use Home Sharing between computers you have "authorized" for your iTunes account. (You're allowed to authorize up to 5 computers.) Originally authorization was for DRM protected music. Since I refused to buy any DRM protected music I never had to worry about authorization. Now I do. I find I have authorized 4 out of my allowance of 5 computers. At least one of those was a machine I no longer own (my old MacBook). I don't think there's any way to un-authorize a machine after the fact (you have to remember to do it before you get rid of the machine or reinstall the OS). As far as I know, the only solution Apple offers is that you can un-authorize all your machines, allowing you to re-authorize the ones you want. (But you can only do this once a year.)

After some searching I found the setting to automatically copy music purchased on other machines. I turned it on and waited. I knew I had purchased music since I last synced my library. Nothing seemed to be happening. I made sure iTunes was running on both machines. I left it for an hour in case it was a slow background task. Nope. I'm guessing that it only works for music you purchase after turning on this option. I guess it all depends how you interpret "automatic". No big deal, it was easy enough to view my iMac library, sort by date added, shift-select all the new stuff and drag it over. I'll have to wait till I purchase some new music to see if it actually syncs automatically then.

On top of all this, Apple released iPhone OS 3.1. I installed it, but there doesn't appear to be anything too exciting in it.

The other big announcement from Apple yesterday was the new version of the iPod Nano with video camera, microphone, speaker, FM radio, and pedometer (!?). I was surprised that the camera was video only, but according to comments by Steve Jobs, this was due to size/space limitations. The FM radio even has a "pause" feature like DVR's. It was nice to see Steve back up on the stage after all his health problems.

The iPod Touch (I keep wanting to call it the iTouch) is now being targeted as a game machine. I would never have predicted that, but then again, I very rarely play games so I tend to forget what a big market it is.

Monday, August 17, 2009

iPod Shuffle Won't Shuffle

I had an older model iPod Shuffle that I used for running. It started to get flaky and eventually died totally.

So I bought a new model, smaller and sleeker, and more memory.

But ... I listen to podcasts when I'm running, not music, and the new iPod Shuffle won't shuffle podcasts.

Even if it would sort by date I could live with it. But it sorts by source, and I don't want to listen to all the podcasts from one source all in a row.

And although I like the controls on the headphone wire, you have to double click to skip tracks and it is a frustratingly slow process to skip all the podcasts from one source just to get to the next source. Good luck if you want to find a particular podcast.

I started doing some research and I read that you could skip between playlists. Ok, I'll put each podcast source in a playlist and then I can skip through them. Except you can only put music in playlists, not podcasts, despite the fact that they're all just mp3 files.

Ok, I'll just move my podcasts over into my music section so I can put them in playlists. Except you can't. For some reason, iTunes goes to great lengths to prevent this. Even if you remove the file from iTunes and then try to import into the music section, it's too "smart" and puts them back in the podcast section. There are various work-arounds but I don't want to have to do this every time I get new podcasts.

Why stop you from shuffling podcasts? Sure, not everyone will want to shuffle, but that's no different than music. After all, the one and only control on the body of this iPod is whether to shuffle or not!

Why stop you from putting podcasts into playlists? Again, I can't think of any reason for blocking this.

It's probably a similar issue as with the K7 flaw - going overboard in trying to keep people on the correct path, refusing to accept that your (the designer's) idea of the "correct" path isn't necessarily the same as your users'.

Judging from all the stuff on the web about this, it obviously annoys a lot of people. Come on Apple - listen to your users!

Friday, August 14, 2009

Pentax K7 Flaw (IMO)

I just traded in my Pentax K10D camera for the new Pentax K7. Overall I'm pretty happy with the upgrade, but there's one thing that really annoys me.

Both the K10D and the K7 have a "Green" mode where everything is automatic and many settings are ignored (forced to safe, default settings).

But in the K7, Green mode now forces saving the images as JPEG - RAW is not allowed.

I shoot in RAW 100% of the time - it gives me a lot more control over the final images, and using Lightroom (or Picasa) it's just as easy to handle RAW as JPEG - there are no extra steps or things to deal with.

This means I can't use Green mode on the K7. It's not the end of the world because "Program" mode can also be used fully automatically. You just have to remember to put settings back to "normal" after changing them. On the K10D I'd use Program to do something different, but I could just flip back to Green mode without worrying about what settings I'd changed. I'd only have to worry about it when I went to Program mode. Now, staying in Program mode, I'll have to be more careful.

I'm sure Pentax had reasons for doing this, but I think they made the wrong decision. Beginners who can't deal with RAW are going to leave their camera set to JPEG. Anyone who is advanced enough to change their settings to RAW presumably did it deliberately (like me) and doesn't want it overridden by Green mode. Besides, given the cost of this camera, the market is not beginner newbies anyway.

It's a fine line between "protecting" users from shooting themselves in the foot, and being over-protective and stopping them from doing valid things. This time I think they went over the line.

Tuesday, August 11, 2009

Tethering iPhone to MacBook

It was so nice out today, after a less than stellar summer so far, that I decided to take my laptop and go sit outside somewhere for coffee. The spot I picked (Pacific Gallery & Cafe) doesn't have wireless so it seemed like a good time to figure out how to tether my MacBook (13" unibody) to my iPhone (3Gs) for internet access.

It didn't turn out to be so easy. First you have to enable bluetooth on both devices (I hadn't brought a cable or that might have been an easier approach). Then you pair the devices. This went ok other than a little searching to find the appropriate settings.

But after pairing successfully, you're still not connected. Pulling down the bluetooth menu from the menu bar showed the iPhone but Connect to Network was grayed out (disabled). My network preferences showed a Bluetooth PAN (Personal Area Network) but it said the "cable" (!?) was disconnected. Not very helpful. In the Bluetooth preferences the tools menu (the "gear" at the bottom) had a Connect to Network that wasn't grayed out, but it also didn't seem to do anything.

If I picked the MacBook on the iPhone the Bluetooth preferences on the MacBook would switch to connected and then immediately switch back to unconnected.

Of course, I googled for the problem. A lot of the results were about how to get around AT&T not allowing tethering. But I was on Rogers (in Canada) and they supposedly do allow tethering.

Apart from the AT&T results, there seemed to be quite a few people with similar problems, but no real consensus on a solution. Some people claimed if you simultaneously hit connect on both the MacBook and the iPhone then it would work. It didn't for me. Some people suggested removing the bluetooth devices from both the MacBook and the iPhone and re-pairing. That didn't seem to help either.

Finally, one person said to restart the MacBook. That worked! I had to laugh because when people ask me about computer problems one of the first things I always suggest is to restart. But I don't expect to have to do that on Mac OS X.

The sad part is that even after I got it working it was too slow to be usable. I couldn't even bring up Gmail because it would time out. Pinging the name server was giving a response time of 4 seconds (4000 ms)! The iPhone was showing 5 bars and a 3G connection, but obviously I wasn't getting a good connection. Browsing on the iPhone was also very slow so it wasn't just the tethering.

I'll have to try it again when I've got a better 3G connection. I'm not sure if it's going to work easily in the future or not. Some of the people reporting problems had it working for a while and then it quit so I'm not totally optimistic. Maybe using a cable will be simpler. I wonder if the dedicated USB cell "modems" work better. (I would hope I'd be able to use my existing data plan?)

Friday, August 07, 2009

New Camera with Projector

Nikon | Imaging Products | COOLPIX S1000pj

I'm not sure it's something I'd use a lot, but having a projector built in to a camera is a cool feature.

Reading the fine print, the projector is only VGA resolution which is not too impressive.

If you were using the camera to record your whiteboard, it might be handy to be able to redisplay it with the projector.

Thursday, August 06, 2009

Anatomy of a feature

Interesting description of all the little details behind even the simplest feature - details that most non-programmers have no idea exist.


iPhone Competition

The Zii Egg looks like a pretty cool gadget.
  • touch screen
  • gps
  • vga camera for video conferencing
  • hd camera
  • wifi and bluetooth
  • sd card slot
  • hd video output
  • runs open source Google Android OS
So far it's not a phone, but that's probably coming.


Saturday, July 11, 2009


Postino for iPhone - send real postcards with your photos :: AnguriaLab

One of the great things about the iPhone (and iPod Touch) is the diversity of apps. This is a cool one I just encountered. (via)

Thursday, July 09, 2009

Java Regular Expression Issue

I'm still grinding away on getting all the standard library tests to succeed on jSuneido.

I just ran into a problem because "^\s*$" doesn't match an empty string!?

Nor does "^$"

Nor does "^" (although just "$" does).

I find if I don't enable multi-line mode, then all of those match, as I'd expect.

Pattern.compile("^").matcher("").find() => true

Pattern.compile("^", MULTILINE).matcher("").find() => false

But I need multi-line mode to make it work the same as cSuneido.

I've tried to find anything in the documentation or on the web to explain this, but haven't had any luck. It doesn't make much sense to me. The documentation says:
By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence.
The only thing I can think of is that it's applying "except at the end of input" even when there is no line terminator. I guess it depends on whether you parse it as

(matches at the beginning of input) and (after any line terminator except at the end of input)

or

(matches at the beginning of input and after any line terminator) except at the end of input

To me, the first makes more sense, but it appears to be working like the second.
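Spelled out as a runnable check (this just reproduces the results observed above; the class name is arbitrary):

```java
import java.util.regex.Pattern;

public class CaretDemo {
    public static void main(String[] args) {
        // Without MULTILINE, ^ matches the start of the (empty) input.
        System.out.println(Pattern.compile("^").matcher("").find());  // true

        // With MULTILINE, ^ refuses to match at the very end of input,
        // and in an empty string the start IS the end.
        System.out.println(
            Pattern.compile("^", Pattern.MULTILINE).matcher("").find());  // false

        // $ in MULTILINE mode still matches at the end of input.
        System.out.println(
            Pattern.compile("$", Pattern.MULTILINE).matcher("").find());  // true
    }
}
```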

So far I've been able to handle the differences between Suneido regular expressions and Java regular expressions by translating and escaping the expressions. But that's tricky for this problem. I guess I could turn off multi-line mode if the string being matched doesn't have any newlines. Except I'm caching the compiled regular expressions so I'd have to cache two versions. And it also means an extra search of the string on every match. Yuck.
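A sketch of that two-cache workaround (the class and method names here are hypothetical, not jSuneido's actual code): keep both compiled versions of each expression, and only pay for multi-line mode when the target string actually contains a newline.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical sketch: cache two compiled versions of each regex,
// choosing multi-line mode only when the target has a line terminator.
public class RegexCache {
    private final Map<String, Pattern> plain = new ConcurrentHashMap<>();
    private final Map<String, Pattern> multiline = new ConcurrentHashMap<>();

    public Pattern forTarget(String regex, String target) {
        // indexOf is the extra per-match scan of the string noted above
        if (target.indexOf('\n') == -1)
            return plain.computeIfAbsent(regex, Pattern::compile);
        return multiline.computeIfAbsent(regex,
                r -> Pattern.compile(r, Pattern.MULTILINE));
    }
}
```

The extra scan and the doubled cache are exactly the "yuck" factors; porting cSuneido's own regex engine avoids both at the cost of more code to maintain.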

Of course, my other option is to port cSuneido's regular expression code, rather than using Java's. Ugh.

Backwards compatibility is really a pain!

Friday, July 03, 2009

Firefox 3.5 Early Feedback

Firefox 3.5 came out the other day and of course I immediately upgraded. I had been tempted to try the release candidates, but I depend on my browser so much these days that I didn't risk it. Ideally, I'd probably wait until the new version had been out for a while before upgrading.

For the most part, I don't really notice the difference. I don't doubt 3.5 is faster, but the speedup hasn't registered with me. It seems most people don't notice when something acceptable gets faster; it's when something gets slower that you notice.

One minor annoyance is that when you only have one tab open (and you have the option turned on to still show tabs when there's only one), there is no longer a close button. Probably the thinking was that there is no point in closing the last tab. But I actually used that feature quite a lot when I wanted to keep my browser running, but I wanted to close e.g. Gmail so I wouldn't be distracted.

On the Mac it's not so bad because I can just close the browser window (on OS X this leaves the program running) and then click on the dock to open a fresh window when I need it. But on Windows, if you close the window, you exit the program.

There are workarounds available - obviously other people found this annoying too.

It's good for me to run into this problem from the user perspective. I tend to ignore our customers when they complain about minor "improvements" I've made. I have to try to remember how annoying it can be when you're used to working a certain way and it's taken away from you for no apparent reason.

I shouldn't have been surprised, but I was a little shocked when I ran 3.5 for the first time and it told me about all the add-ons that weren't supported. Luckily, none of them were show-stoppers for me or I would have had to figure out how to go back to the previous version.

It would be nice if the installer could tell you which add-ons were incompatible before it started the install process so you could cancel if necessary. Otherwise it would be a painful process to go through each of your add-ons and try to find out if it runs on the new version.

I guess another option would be to use a portable version of Firefox to test add-ons. But even then, you'd be faced with installing them one at a time since there's no way to sync add-ons yet (that I'm aware of). Maybe I need to look at something like Add-on Collections.

One of the add-ons I was surprised wasn't supported on 3.5 yet was Google Gears. Which means I've lost the off-line support in Google mail, calendar, docs, reader, etc. I assume they're working on it.

I've also switched back to Weave to sync Firefox between my multiple computers. I used it for a while before, but switched to Foxmarks because it seemed better. But Foxmarks has turned into Xmarks and doesn't seem to be focusing on synchronization. And Weave has improved a lot. (I originally used Google Browser Sync but that was discontinued.)

One annoyance with these kinds of sync tools is that the obvious time to check for changes is when you start the browser. But if you have a master password, then every time you start the browser it asks for your password, which is annoying and also not very good for security.

Thursday, July 02, 2009

Gmail Labels

Official Gmail Blog: Labels: drag and drop, hiding, and more

Finally Gmail is improving the label facility. I like the idea of tagging my emails, but previously it was quite awkward when you got too many labels. There was no way to hide labels for old projects or to make commonly used labels more accessible.

I'm guessing they imagined people would have a handful of labels, similar to the handful of built-in ones. But look at any tagging system, like Delicious or Flickr, and you'll see large numbers of different tags, not just a few.

There were workarounds like renaming labels to move them up or down the alphabetical list. Or add-ons like Gmail folders (which tended to break when Gmail made changes).

The drag and drop is nice, but to me the big improvement will be the ability to hide old labels and to normally only show the frequently used ones.

Google Update

Google Open Source Blog: Google Update, regularly scheduled

It has always seemed ridiculous that so many programs run their updater as a background process, even though they only have to run periodically (e.g. once a day or week). I realize they probably don't use a lot of cpu or memory and they're probably swapped out most of the time, but if nothing else starting them all slows down the boot process.

As a programmer, I can understand the desire to keep control, but these are updaters, not critical operations. The software itself can always check for updates as well.

It's nice to see at least Google switching to running their updater as a scheduled task.

Wednesday, July 01, 2009

3D Video Stabilization

Content-Preserving Warps for 3D Video Stabilization
via John Nack

I thought the image stabilization in iMovie '09 was cool, but it looks crude next to this stuff.

I wonder how long it'll be till we see this technology in video editing software, or even in cameras themselves.

Too bad you can't do this kind of software image stabilization for still images. But recovering from a fuzzy image is a lot tougher problem. Maybe if the camera shot multiple images (almost like a brief video) then you'd have enough information.

Tuesday, June 30, 2009

A Windows Feature I'd Like on the Mac

Both Windows and Mac OS X let you "minimize" windows to the task bar / dock.

Both let you bring a window back by clicking on the task bar / dock.

But on Windows you can click on the task bar icon a second time to minimize the window again. I've got in the habit of using this to take a quick look at a window and then hide it again. I keep trying to do that on the Mac but it doesn't work.

One argument I can see against this feature is that people often get confused and double-click instead of single-clicking. If implemented naively, a double-click would show and then hide the window immediately, frustrating the user. But Windows solves this problem by treating a double-click the same as a single click.

If anyone knows a way to make this work on the Mac, leave me a comment and I'll owe you one.

One part of this that is nicer on the Mac is that "Hide" minimizes all of an application's windows, and clicking on the dock brings them all back, whereas on Windows it's one window at a time. I have a vague memory that Windows 7 might improve this.

Thursday, June 25, 2009

The iPhone Software Revolution

Coding Horror: The iPhone Software Revolution

Someone else who finally took the plunge and bought an iPhone.

And this rave review is from someone who isn't an Apple or Mac fan.

Monday, June 22, 2009

Apple Sells Over One Million iPhone 3GS Models

Apple Sells Over One Million iPhone 3GS Models in the first three days.

And one of those was me - I finally broke down and bought an iPhone, surprising some people, because although I love gadgets, I don't like cell phones. I could have bought an iPod Touch, but although I didn't really care about the phone, I wanted all the other features like 3G, GPS, compass, camera, etc. that don't come with the Touch. And I'm sure I'll end up using the phone occasionally now I have it.

I'm already loading up on iPhone apps. With over 50,000 available they're one of the best parts of the iPhone/Touch.

Friday, June 19, 2009

Ultra High Speed Photography - Kameras

1 million frames per second - amazing!

(I couldn't get the videos to play in Firefox but Internet Explorer worked.)

Wednesday, June 17, 2009

Mac OS X Hangs from Lightroom

More often than I'd like lately, when I import photos into Lightroom (from an SD card in a USB reader) it hangs my whole Mac.

I can understand how Lightroom could crash, but I'm a little baffled that it manages to freeze the whole operating system. You get the spinning beachball and you can't do anything - can't switch apps, can't pull down menus, can't do Ctrl + Eject to shut down.

At first I thought it was because I would start to view photos while it was still downloading, so I quit doing that, but it's still happening.

The strange thing is that Lightroom is normally very stable. It doesn't crash or hang when I'm working in it, no matter what I do. I suspect this is more of an OS bug, or at least a bad interaction between the app and the OS.

This seems to have become a problem recently, perhaps related to either Lightroom updates, or OS X updates, or both. (That's one of the downsides of all these automatic updates.)

I wonder whether it has something to do with importing directly from the SD card through USB. Not that that is an excuse for the OS to die, but I could see where there would be some low level device stuff going on. Maybe I should copy the files to the Mac and then import from there. Although that's quite a bit more hassle since Lightroom auto-detects memory cards and goes straight to Import. However, I think you can set up Lightroom to "watch" a directory, so maybe I could do that and copy to that directory.

Friday, June 12, 2009

Continuous Obsolescence

I see my 13" MacBook has already been replaced by a new model.

It's been moved to the "Pro" label, gained its Firewire connector back, and now has an SD slot (which I'd like for downloading photos).

Apple seems to be coming out with new models faster than ever. The model I have was only out for 7 months before being replaced! Most software doesn't get upgraded that quickly, let alone hardware.

I like the rapid improvement, but I hate the resulting feeling of being left behind! Too bad we can't get automatic updates like software :-)

Edward Bear and Software

I just started reading Java Power Tools and the opening quote on the preface was this:

Here is Edward Bear coming downstairs now, bump, bump, bump, on the back of his head, behind Christopher Robin. It is, as far as he knows, the only way of coming downstairs, but sometimes he feels that there really is another way, if only he could stop bumping for a moment and think of it.

-- "We are introduced to Winnie-the-Pooh and some bees, and the stories begin,"
Winnie-the-Pooh, A. A. Milne

What a great quote for software development!

Sunday, June 07, 2009

Too Good to be True

I should have known that it was too good to be true that all the tests up to the N's were succeeding. I was a little suspicious, but who likes to question positive results.

What I wasn't remembering was that TestRunner reports errors at the end, not after each test. And I was never getting to the end because I'd get an unhandled exception (i.e. crash).

When I'd hit an unhandled exception I'd run that individual test by itself so once I fixed the exception I'd see the errors caught by TestRunner and I'd fix those, generally by implementing missing methods.

So the tests weren't succeeding up to the N's, they just weren't crashing. When I realized this, and specified that TestRunner should stop on the first failing test, I didn't even make it past the A's :-(

Oh well, there was nothing wrong with fixing the crashes first. I'm just nowhere near as far along in the process as I over-optimistically thought.

Back to slogging :-)