Tuesday, December 12, 2006

Too Many Bugs!

Recently, I realized that we had too many outstanding bugs in our main application. And when I started pushing to work on this, I found there were a lot more than I had realized.

In our defense, many of these bugs are obscure and/or one-time occurrences that we can't recreate. Or they're things that are only bugs in the sense that they don't follow our standard ways of doing things. But thinking that it's ok to have bugs is a slippery slope - soon you're ignoring bigger and bigger problems. It's the same reason why it's important to have your automated unit tests always 100% succeeding (which ours do).

In theory, our "rule" has always been to fix bugs before writing new code. But there's also a lot of pressure to get new features written. And it's easy to let that pressure override the bugs-first rule, especially when the bugs don't seem that "critical".

For now, we've switched our priorities to cleaning up the outstanding bugs. That means new feature development has slowed dramatically. We're still trying to do urgent changes for customers but that's about it.

In theory, fixing bugs first should be self-adjusting. If you're spending all your time fixing bugs, you're not writing new code and therefore not creating new bugs. Of course, that assumes that when you fix bugs you aren't (or at least aren't always) introducing new bugs.

But even if it does self-adjust, it's deeply unsatisfying to be spending such a large percentage of our development time fixing bugs. The real question is how to reduce the number of bugs. A common reaction is "we'll have to be more careful". But people will always make mistakes. All you can do is have a process that catches the mistakes. Of course, there are things you can do to prevent some mistakes - automate manual processes, modularize your code to reduce unexpected interactions, etc.

In general, the sooner you catch a mistake the better. That's one of the reasons for pair programming - so the second person can catch mistakes right away. It's also one of the reasons for writing tests alongside code. A mistake that's caught sooner is usually easier to fix. It also provides more meaningful feedback. A mistake someone finds months after the code was written is unlikely to provide feedback that will improve your coding. (This is also one of the reasons we're moving towards continuous deployment - daily releases instead of quarterly.)

We already pair program and write tests (although we could do better on the tests). We've also started having a third programmer (in addition to the original pair) review and manually test changes, the same day the work is done.

One of the best ways to reduce bugs is to write good code. On the scale of lines of code and individual methods I think our code is pretty good. But our application is getting bigger and more complex and one of the major "causes" of bugs is complexity. Where I think we've fallen down is in the larger scale organization and architecture. When a program is small it doesn't need an elaborate architecture. (Arguably shouldn't have an elaborate architecture.) But if you just keep adding to an application, eventually it reaches the size where it does need better large scale organization. But in the routine of day to day programming, how do you recognize/decide when to work on this?

Extreme Programming recommends "incremental design".
Invest in the design of the system every day. Strive to make the design of the system an excellent fit for the needs of the system that day. When your understanding of the best possible design leaps forward, work gradually but persistently to bring the design back into alignment with your understanding.
-- Kent Beck, Extreme Programming Explained 2nd Ed.
Unfortunately, easier said than done. A person can only think of so many things at once. If you're fixing a particular bug or adding a specific feature, it's hard, if not impossible, to also be thinking about the overall design. You might refactor on a small scale, but you're unlikely to dream up a new organization for the whole application. And even if you did, you'd still have to communicate it to the whole team and somehow get everyone to work towards it. Maybe some teams can manage this, but it wouldn't be easy.

The best "solution" I've come up with so far is to enhance our framework so it provides some larger scale organization, and then to gradually migrate the code into the new facilities of the framework. Some of the things we need to do are things we should have known to do all along - isolating database access, keeping application code out of the user interface, separating data retrieval from formatting in reports, etc.
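
To make the last two of those concrete, here is a rough sketch in Python of what I mean by isolating database access and separating data retrieval from formatting in a report. It is only an illustration, not our actual framework: the invoices table, the InvoiceStore class and the function names are all made up for the example, and it uses an in-memory SQLite database just so it can run on its own.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Invoice:
    customer: str
    amount: float


class InvoiceStore:
    """Hypothetical class: the only place in the example that contains SQL."""

    def __init__(self, conn):
        self.conn = conn

    def unpaid(self):
        rows = self.conn.execute(
            "SELECT customer, amount FROM invoices "
            "WHERE paid = 0 ORDER BY customer"
        )
        return [Invoice(customer, amount) for customer, amount in rows]


def unpaid_report_data(store):
    """Data retrieval: returns plain values and does no formatting."""
    invoices = store.unpaid()
    return {"invoices": invoices, "total": sum(i.amount for i in invoices)}


def format_report(data):
    """Formatting: turns plain values into text and never touches the database."""
    lines = [f"{i.customer:<10}{i.amount:>10.2f}" for i in data["invoices"]]
    lines.append(f"{'Total':<10}{data['total']:>10.2f}")
    return "\n".join(lines)


def demo():
    # Made-up sample data in an in-memory database, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (customer TEXT, amount REAL, paid INTEGER)"
    )
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [("Acme", 120.0, 0), ("Bolt", 80.5, 1), ("Cog", 30.25, 0)],
    )
    return format_report(unpaid_report_data(InvoiceStore(conn)))
```

The point of the split is that format_report can be tested with hand-built data and no database at all, and a change to the query or the schema only touches InvoiceStore. Whether our framework ends up looking anything like this is a separate question.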

Disclaimer: None of this is especially new or original. It's more in the nature of thinking out loud, trying to figure out how to apply known ideas to our situation.