Tuesday, February 19, 2008

Suneido Build Frustrations

I made a minor improvement to the Suneido source code this morning, ran make, no problems, built-in tests ran successfully.

But when I tried to use the new executable I got an obscure database error. What's going on?

I had built with MinGW so I switched to Visual C++. Exact same error!?

This was at home, and my last builds had been at work, but I couldn't see why that would matter.

Remove all the object files and build from scratch. No good, same problem.

Check version control to see what I'd changed lately - very little, and nothing that seemed related to the error.

The error is from the database btree code. Maybe the database is corrupted. But all the executables, old and new, say the database is fine.

Try creating a new database with just the standard library. Now I get a different error related to the Scintilla source code editing component.

Build a MinGW debug version and run it under GDB. That gives me a clue as to which query is leading to the error. It had appeared to be writing to the database, which seemed odd at startup, but it was actually building a temporary index for a query it was reading. Yet the same query in the old, working executable doesn't require a temporary index.

Turn on the query tracing at the start of the standard library Init to see where the query is coming from. It's loading the plugins.

Aha, that's why the database with only stdlib gets a different error - because it only has to look for plugins in a single library and therefore no temporary index. Yeah, if I disable the plugin loading then I get the other error.

Two unresolved questions:
- why the temporary index in the new builds but not the older build?
- why the later UI error?

And how are these two questions related (assuming a single cause)? It seems like it would have to be something low level, like the garbage collector, to affect such unrelated areas.

Of course, it could be something like an uninitialized variable that happens to get a different value on my home system. But it seems too consistent for that. And something like that would likely have been encountered before now.

What is different between my work and home machines? I call the office and have them install LogMeIn on my work computer so I can access it. I try building with MinGW and it works fine. The exe is a different size though. Something is different.

Transfer the home exe to work to see if it's the environment. Same error message, so at least it's not because of my Vista on Parallels on Mac setup at home.

md5sum the files at work and at home and compare. The only real difference is the change I made this morning.
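Comparing two sets of checksums by eye is tedious, and it is easy to miss the one line that matters. As a minimal sketch, assuming the output of `md5sum` over the same set of files was saved on each machine, a few lines of Python can list exactly which files differ. The function names `parse_md5sum` and `compare` are hypothetical helpers for illustration, not part of Suneido or any tool, and the parser assumes plain file names (GNU md5sum escapes unusual names with a leading backslash, which this sketch ignores).

```python
def parse_md5sum(text):
    """Parse md5sum output into {filename: digest}.

    Each line looks like "<hash>  <file>" (text mode) or "<hash> *<file>" (binary mode).
    """
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, rest = line.split(" ", 1)
        # rest starts with ' ' (text mode) or '*' (binary mode); drop that marker.
        sums[rest[1:]] = digest.lower()
    return sums


def compare(home_text, work_text):
    """Report files present on only one machine, and files whose digests differ."""
    home, work = parse_md5sum(home_text), parse_md5sum(work_text)
    return {
        "only_home": sorted(home.keys() - work.keys()),
        "only_work": sorted(work.keys() - home.keys()),
        "differ": sorted(f for f in home.keys() & work.keys() if home[f] != work[f]),
    }
```

Passing the two saved outputs to `compare` gives a short list under `"differ"`; in this story, that list would have pointed straight at the file changed that morning.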

But ... that couldn't be the problem, could it? Revert that file and build.

Oh no, this is really embarrassing! It works. The problem was in the most obvious place, the first place I should have looked: the change I had just made.

I'm really tempted not to post this - it just makes me look stupid.

Why did I go off on a wild goose chase? I guess because the error seemed so totally unrelated to my change, and I hadn't built for a while, so it seemed likely that some other problem had crept in. And the change I made seemed trivial, so I didn't suspect it. And because it seemed trivial, I didn't write any tests. (The bug was also obvious, once I looked for it.)

Ouch. There go a few hours down the drain. Maybe I learned a lesson, but sadly it's one I should have learned a long time ago.
