Thursday, February 18, 2010
The checking part went quite smoothly and quickly. The rebuilding/recovery part has been slow. I've been working on it for several days and it felt like I was making no progress. The only code I've written has been tests and debugging code! But whether it felt like it or not, I guess I was making progress, because it "suddenly" started working today.
Recovery is hard for a number of reasons. For starters, you're assuming the database has been corrupted, so you have to code a lot more defensively. Second, because you're working outside the normal operation of the database, you can't use the normal functionality. For example, you're not running transactions.
One lesson I (re)learned was how important "visibility" is. By that I mean being able to "see" the data you're working on. That's a large part of why debuggers can be valuable: they let you inspect the data. Often, inserting a "print" statement in the right place is all it takes to figure out a bug. Of course, finding that right place is not so easy. In this case visibility meant writing a dump utility so I could see exactly what was inside the database. I have a pretty good general idea of how it's structured, but not the exact details of which types of data blocks occur in which sequence. And dumping to text files meant I could use tools like diff to compare the database before and after recovery.
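The post doesn't show the dump utility itself. As a rough illustration of the idea, here is a minimal sketch in Python, assuming a hypothetical layout where each block is a 1-byte type tag and a 4-byte big-endian payload length followed by the payload. The tag names and `dump_blocks` are made up for this example and are not the real database format. It emits one text line per block (offset, type, length), and it checks bounds instead of trusting each header, since the file being dumped may be corrupt.

```python
import struct

# Hypothetical block header: 1-byte type tag + 4-byte big-endian payload length.
# This layout is an assumption for illustration, not the real database format.
HEADER = struct.Struct(">BI")
BLOCK_TYPES = {1: "DATA", 2: "INDEX", 3: "COMMIT"}  # hypothetical tags


def dump_blocks(buf: bytes) -> list[str]:
    """Walk the buffer block by block, returning one text line per block."""
    lines = []
    offset = 0
    while offset < len(buf):
        # Defensive: the file may be corrupt, so never trust a header blindly.
        if offset + HEADER.size > len(buf):
            lines.append(f"{offset:08x} TRUNCATED header")
            break
        tag, length = HEADER.unpack_from(buf, offset)
        name = BLOCK_TYPES.get(tag, f"UNKNOWN({tag})")
        end = offset + HEADER.size + length
        if end > len(buf):
            # Length field points past the end of the data: report and stop.
            lines.append(f"{offset:08x} {name} len={length} TRUNCATED")
            break
        lines.append(f"{offset:08x} {name} len={length}")
        offset = end
    return lines
```

Writing this output to one text file before recovery and another after lets a plain `diff before.txt after.txt` show exactly which blocks were added, dropped, or reordered.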
P.S. In hindsight, yes, it would have made sense to write the database checking much earlier so I could verify that operations weren't corrupting databases. On the other hand, corruption was usually pretty obvious: it crashed!