Thursday, April 29, 2010

Dig Deep

In the last few days I've questioned a number of changes that I saw going through version control. None of them was "incorrect". The recurring theme was that they were fixing problems at too "shallow" a level.

For instance, a dialog was using a feature to save its size after the user resized it. This wasn't working properly, so the feature was disabled. There are a couple of problems with this. One is that we lose a feature from that dialog that we presumably wanted. But more importantly, that feature is used in many places, and if there's a bug in it, it's going to show up elsewhere. Are we going to disable it everywhere? How much time are we going to spend dealing with each new instance that turns up?

Of course, sometimes you have to fix a bug the most immediate way. That's fine as long as you go back and look for the real problem.

Here's another example: we got an error about a missing database table. This was after wiping out all the data, so the initial thought might be to ignore it. But this table is supposed to be created automatically. So we dug a bit deeper, found one place where the code didn't create the table when necessary, and fixed that. But access to this table is supposed to be encapsulated in one class, and that class handles creating the table if necessary. The spot we fixed was accessing the table directly. So we moved this access into the class to restore encapsulation. But then we realized that this code shouldn't even need to use the class/table. It turns out we had used it to work around yet another bug. So we removed the use of the class/table and fixed that earlier bug instead. The end result was less code, and simpler code.
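
To make the encapsulation point concrete, here is a minimal sketch in Python with sqlite3. It is not our actual code; the settings table, the SettingsTable class, and the read_directly function are all made up for illustration. The access class creates the table on demand, so anything that goes through it keeps working after the data is wiped, while code that touches the table directly does not.

```python
import sqlite3


class SettingsTable:
    """Hypothetical access class: meant to be the only code touching the table."""

    def __init__(self, conn):
        self.conn = conn

    def _ensure_table(self):
        # Create the table on demand, e.g. after all the data has been wiped.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS settings (name TEXT PRIMARY KEY, value TEXT)"
        )

    def put(self, name, value):
        self._ensure_table()
        self.conn.execute(
            "INSERT OR REPLACE INTO settings (name, value) VALUES (?, ?)",
            (name, value),
        )

    def get(self, name, default=None):
        self._ensure_table()
        row = self.conn.execute(
            "SELECT value FROM settings WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else default


def read_directly(conn, name):
    # Hypothetical offender: bypasses the access class, so nothing creates
    # the table first and a fresh database raises sqlite3.OperationalError.
    return conn.execute(
        "SELECT value FROM settings WHERE name = ?", (name,)
    ).fetchone()
```

On a fresh database, read_directly fails with a missing table error, whereas SettingsTable(conn).get("x") just returns the default. Adding a "create if missing" call to read_directly would be the shallow fix; routing it through the class is a level deeper, and asking why it needs the table at all is deeper still.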

This process of digging deeper into a problem can take time. It can take orders of magnitude more time than a quick fix to the immediate problem, which is why we often apply the quick fix and move on. But the long term result of that approach is technical debt - your code gets more complex, less understandable, less consistent, and harder to change. And in the long run the quick fix ends up costing you more time than if you had fixed it properly in the first place.

This is a bit like the "lean" idea of finding and fixing "root causes". One of the techniques is to ask five (more or less) "whys". Why did we get a missing table error? Because the code wasn't creating it on demand. Why? Because it was bypassing the access class. Why? Because it was doing something non-standard. Why? To work around another bug.

1 comment:

Jen said...

5 "whys"! Whatever happened to "because I said so"? :o)