The Software Life: March 2010

Thursday, March 18, 2010

Eclipse Keyboard Shortcuts

I don't tend to use a lot of keyboard shortcuts, but I use these Eclipse ones quite regularly:

F3 - Open Declaration

Shift + Command + G - References

Alt + Command + M - Refactor - Extract Method

Alt + Command + R - Refactor - Rename

Command + F11 - Run

Even if you're only an occasional Eclipse user, it's probably worth learning a few of the shortcuts.

Related posts:

Template Shortcuts in Eclipse

Eclipse Tip for Windows and Mac Users

Thursday, March 04, 2010

One of the features of the Eclipse IDE is that you can define "template" shortcuts. For example, you can expand "main" into the complete method definition. I don't use a lot of these, but they are handy once your fingers learn them.

Eclipse comes with a bunch of pre-defined templates and you can add your own. For example, I added a template that expands "sop" into "System.out.println".

Ctrl+Space brings up the choices. In theory if you select Auto Insert and there is only one matching choice it will automatically fill in the template. In practice there are usually multiple choices so you have to pick from the list. That's usually just a matter of hitting Enter (as long as your choice is the first one).

You can find the templates in Preferences - Java - Editor - Templates.

I also edited the "test" template to generate a JUnit 4 test (instead of the older style):

@Test
public void ${name}() {
${cursor}
}

Tuesday, March 02, 2010

A Faster, Cleaner Database Design for Suneido

Current design:

entire database is kept in a single file
the database file is accessed via memory mapping
data records are immutable, updates add new versions
indexes are mutable
data records are used like a "log" for crash recovery
indexes are not recovered, they are rebuilt

For concurrency, it would be really nice if the database was a persistent immutable data structure. But this is difficult. For example, you don't want to write a new copy of a btree index node to disk every time you update it. (At least, I'm assuming that the resulting database growth would be impractical.)

In theory Suneido happily accesses databases bigger than physical memory. In practice, performance drops off rapidly when this happens due to swapping (page faults). You would think there would be a "working set" of data that would be a small portion of the database. But users tend to run things like historical reports or searches that access large portions of the data.

The other issue is that memory mapping the database makes reads fast (once the file is mapped into memory). But writes still have to go to disk (albeit somewhat asynchronously). And the index update writes are essentially random which is the worst case for disk performance. (Maybe SSD drives would help with this?)

Which led me to think, why not just keep the database in memory. But...

Despite the performance issues, I'm not sure it's reasonable to limit database size to physical memory.
Loading the database into memory on startup and saving it to disk on shutdown would be slow.
You still need to write something to disk for crash recovery.

Next iteration, why not memory map the database, write data records to disk (so they can still be used as a "log" for recovery), but keep index changes in memory and only write them out when you close the database.

This has some nice features:

Similar start up speed (although slower shutdown)
Less disk writes (only data, not indexes) - faster
Disk writes are sequential (index updates were random) - much faster
Allows databases bigger than memory

Suneido's crash recovery rebuilds the indexes anyway, so it's not a problem that they're kept in memory. Theoretically, if indexes are now immutable you could take the last good state and roll forward from there for faster recovery.

You could also periodically update the on-disk indexes, presumably during idle time on a background thread.

This seems pretty good, but there's more. Now that index updates are not being written to disk immediately, it becomes feasible to make the indexes persistent immutable data structures.

jSuneido already sort of does this, but in a complicated round-about way. Each transaction maintains an "overlay" of updates to indexes. When the transaction commits, these are merged into the real indexes on disk and other concurrent transactions are given a copy of the old index nodes so they don't see the changes. It works, but it's ugly and error prone.

With this new scheme, each transaction has, in effect, it's own copy of the indexes. Since it's a persistent immutable data structure, updates by transactions have no effect on each other. When a transaction commits, you merge it's indexes with the master. I think this would make the code quite a bit simpler.

It's hard to say how much faster this would be. Updates would be considerably faster, but reading would be much the same. So the overall speedup would depend on the mix of updates and reads. However, in a multi-user scenario (the current target for jSuneido) if you can reduce the load, that benefits everyone, including reads.

I'm not going to drop everything and implement this, but it's an interesting idea. I'll have to mull it over some more - there may be flaws that I haven't thought of yet. It might be something to work on once jSuneido is "finished" and deployed.

As with any "bright idea" like this, I can't help wonder if I'm reinventing the wheel. I wouldn't be surprised if someone was already using this approach. Of course, that doesn't mean that there's any way to reuse their code, or even learn from them. The internet might give us access to the world's knowledge, but it's still hard to find this kind of stuff.

Related posts:
Language Connoisseur
A Java Immutable Persistent Map
A Java Immutable Persistent List
jSuneido Progress