Saturday, March 03, 2007

Slow Code

One of the problems we run into with our code is that it ends up too slow.

Before I continue I want to assure you that I'm not suggesting "premature optimization". I fully agree that optimizing before it's necessary is the wrong thing to do.

What I am suggesting is that "too slow" is a bug, and like other bugs it should be avoided if possible, or at least caught as soon as possible.

One of the causes of slowness is code that is O(N²) (or worse!). Nested loops are a common cause. Programmers either don't recognize the nested loops (hidden in separate functions or in recursion), or else they don't realize the dangers of them. For example, if you have 10 items and you iterate over them twice, that's 20 loops. If you have one loop inside another, that's 100 loops. 5 times as many loops, but not a big issue. But what if you have 1000 items? Iterating twice is 2000 loops, but nested it's 1,000,000 loops, or 500 times as many. If each loop takes .001 seconds, then 2000 loops is 2 seconds, but 1,000,000 loops is 1000 seconds, or roughly 17 minutes. In an interactive application that's a big problem.
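
To make the arithmetic concrete, here's a minimal Ruby sketch (the function and variable names are mine, just for illustration):

    def two_passes(items)
      count = 0
      items.each { count += 1 }     # first pass: N iterations
      items.each { count += 1 }     # second pass: N more
      count                         # 2N total
    end

    def nested(items)
      count = 0
      items.each do
        items.each { count += 1 }   # N inner iterations for each of N items
      end
      count                         # N * N total
    end

    items = (1..1000).to_a
    two_passes(items)   # => 2000
    nested(items)       # => 1000000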

I think one of the reasons for this problem is that programmers unconsciously relate speed to number of lines of code. So:

    for each item
        for each item
            do stuff

can look shorter than:

    for each item
        do stuff
    for each item
        do stuff
Higher-level languages, libraries, toolkits, frameworks, etc. can make this even worse. Now one line of code can do a huge amount of work, but to the programmer it's just one little line of code. How bad can it be?
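
For instance, here's a hypothetical snippet (items and other_items are made-up arrays) where one innocent-looking line hides a nested loop:

    require 'set'

    items = (1..1000).to_a
    other_items = (500..1500).to_a

    # include? scans all of other_items for every element of items,
    # so this one line is a hidden nested loop: O(N * M).
    common = items.select { |item| other_items.include?(item) }

    # Same result with the inner scan replaced by a set lookup,
    # which brings it back to roughly O(N + M).
    other_set = Set.new(other_items)
    common = items.select { |item| other_set.include?(item) }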

Programmers normally test with small amounts of data. When you're only dealing with a few items, both 2N and N² are fast. You don't get the feedback about the slowness until much later, when the user has larger amounts of data. And programmers are inclined to ignore whining from users anyway!

Another related issue is the difference between dealing with things in memory and dealing with things in the database. The difference is huge, but again, in testing it may be barely noticeable. And, again, the actual number of lines of code may be comparable. (This can be a problem with tests. For one test it doesn't matter. For a whole test suite it can mean the difference between a suite that runs in a minute and one where you have to run it overnight.)
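
A hypothetical Rails-era sketch (Order and Customer are made-up ActiveRecord models) of how code that looks like an in-memory loop can quietly hit the database once per row:

    # One SELECT for the orders, then one more SELECT per order the
    # first time its customer association is touched: 1 + N queries.
    orders = Order.find(:all)
    orders.each { |order| puts order.customer.name }

    # :include tells ActiveRecord to eager-load the customers along
    # with the orders, so the loop really does run in memory.
    orders = Order.find(:all, :include => :customer)
    orders.each { |order| puts order.customer.name }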

Unnecessarily reading "everything" from the database into memory is another common mistake. If you actually need all the data in memory at once, fine, but if you just need to find a particular record or set of records then it's an inefficient way to do it. In our Ruby on Rails project I regularly come across code that does Model.find(:all). At first glance it might seem harmless. But what happens when that table contains 100,000 records? (Sadly, part of the reason for this is that Rails doesn't seem to provide any way to iterate through a query, so the alternative to find(:all) is to do your own SQL - an ugly choice.)
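
One workaround, sketched under assumptions (Record and process are hypothetical; the find options are standard ActiveRecord parameters of the time), is to page through the table by primary key so that only one batch is in memory at once:

    last_id = 0
    loop do
      # Fetch the next 1000 rows after the last id we've seen.
      batch = Record.find(:all,
                          :conditions => ['id > ?', last_id],
                          :order => 'id',
                          :limit => 1000)
      break if batch.empty?
      batch.each { |record| process(record) }
      last_id = batch.last.id
    end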

I don't have a solution to this problem, but I think we need to become more aware of it. I think it's one of the reasons why our software gets slower at the same time as our computers get faster!

1 comment:

Anonymous said...

Is the SQL issue w/Rails an issue w/Ruby's handling of databases?

I like Perl the best when it comes to database interfacing (via DBI). PHP, Python, and Ruby seem to lack the scope (database systems that can be accessed) and cohesiveness (one interface via DBI) that Perl offers when it comes to database access. Of course, you are talking about a framework, and I'm only pointing out one aspect of the situation.

Loops:

Back in the Turbo C/Turbo Assembler days, I would have Turbo C dump asm code for the looping portions and "hand tune" those sections. The glory days! Ha Ha...