Friday, January 04, 2013

The Evolution of a Feature

The application software that we sell has gotten bigger over the years. We now have roughly 750,000 lines of (Suneido) code. That's peanuts to someone like Microsoft, but it's a lot for a handful of programmers in a small company.

Mostly I work on the Suneido implementation itself i.e. the C++ and Java code. But recently, for a change, I've been doing some programming in Suneido. (My staff doesn't always appreciate this because I tend to be a little more "aggressive" than they would be, and as a result I've been know to break a few eggs in the process of making my omelettes.) I quite enjoy the chances I get to actually use the language I've designed and implemented. And unlike C++ and Java, if there's something I don't like, I have the option of fixing it!

As our code has grown navigating through it has become more of an issue. I might vaguely remember the name I'm looking for, but I have a hard time remembering if it's FooBar or Foo_Bar, or Foobar. We have naming conventions, but there's still variation.

A lot of the time you can follow references through the code using "go to definition", but sometimes you need to start from scratch.

We use a "Find" function in the library view, basically a "grep". That works, but it's not ideal. It's too awkward to write regular expressions like "Foo_?Bar" every time.

Several years ago I improved the Find function to search the names by converting both the actual names and the search to a canonical form, e.g. converting to lower case and stripping out underscores. It then "scored" the matches.

This was an improvement, but it didn't show you a list of matches, it just let you step through the matches. There were still lots of times when the first match wasn't what you wanted.

About the same time, I added auto-completion in the code editor. So you can start typing a name and it would display a list of matches. At least this way you could see a list and pick the one you wanted.

I noticed programmers go to the work space, start typing a name, use auto-completion to get the right one, and then go to the definition. This worked, but it's a little round-about!

My recent next step was to change the name search field in the "Find" dialog to do auto-completion. This avoided having to go the workspace route.

But because of the way Scintilla (the code editor component we use) works, auto-completion can't ignore underscores or do other kinds of matching.

Also, it seemed a little ugly to auto-complete the exact name, and then go searching for it, even though no searching is necessary in this case.

So my most recent improvement is to add a "locate" field (at the top-right of the library view). And instead of using the Scintilla auto-completion, I used a different auto-complete control we had which allows more flexible matching. (In the process finding and fixing a bunch of bugs in this control.)

I also got the idea to add what Eclipse calls "camel case searching", being able to search using just the capitalized letters in a camel case name. e.g. using "oic" to search for OpenImageControl. This is quite handy when names get long.

The simplest way to implement this would be to just perform a linear search of all the names. But we might have roughly 20,000 active names, and doing a complex search linearly is fairly slow. (Although perhaps usable.)

The auto-completion uses a sorted list of names and does a binary search in it. This is fast, partly because Suneido has a built-in lower bound binary search.

But how do you implement something like camel case searching with a binary search?

I realized that I could do it by putting multiple entries in the list. e.g. for OpenImageControl there would be two entries in the list, one for "openimagecontrol" and one for "oic". I also needed the original name so the entries in the list look like: "oic=OpenImageControl".

Suneido also lets you overload the same name in multiple libraries (a much abused feature!). So I added the library to the entries e.g. "oic=OpenImageControl:stdlib".

But when I sorted those entries, the library part sorted alphabetically, which isn't a useful order. I wanted to sort the library part in the same order they were being used. So I changed the entries to use a library index instead of name e.g. "oic=OpenImageControl:01"

The auto-complete does a binary search to find the matches, removes the part up to the equals sign, sorts and uniques it, then converts the library back to the name, resulting in e.g. "OpenImageControl (stdlib)"

The other question is when to build and update the list. It takes a noticeable amount of time to build the list (about 1/2 a second on a fast machine) so I didn't want to do too often. And yet I wanted the list to be up to date. One option is to update it when the GUI is "idle". But most of the time it wouldn't need updating. Ideally, I wanted changes to "invalidated" the list, and only update it when it was next required. Unfortunately, there's no easy way to be notified of library changes. I ended up using two things to determine if the cached list was still valid. First, if the libraries in use changed, then the list had to be updated. Second, if the maximum "num" id field in any of the libraries changed, that meant a record had been added and the list needed to be updated. But checking the num field meant database queries so I didn't want to do that too often so I throttled it to check at most every 15 seconds. That still leaves a slight lag occasionally when you use the locate feature (if an update is needed) but it's rare enough (and brief enough) not be be annoying.

I only just finished this feature so it's hard to say how it will work out, but so far it seems like a good (or at least better) solution.

No comments: