Thursday, May 08, 2008

Suneido Broken & Fixed

When I was making the changes to handle thread specific storage on ACE I encountered some "ugly" code.

The first thing was a lot of instances of:
which got even worse when they became:
(and similar for push, pop, etc.)

In interp.cpp I had defined inline functions for these, but I hadn't extended this to elsewhere. I moved them to interp.h and converted all the instances to:
While doing this I noticed code like:
Value* sp = GETSP();
This seemed like a perfect application for an automatic destructor. So I added this to interp.h:
struct KeepSp
{ sp = GETSP(); }
{ SETSP(sp); }
Value* sp;
#define KEEPSP KeepSp keep_sp;
Using this, the code became simply:
This was even better where I'd had:
Value* sp = GETSP();
Value result = ...
return result;
because it could now be simplified to:
return ...
The next thing that caught my eye was code like:
throw ;
Another good application of automatic destructors. So I added this to std.h
template<class T> struct Closer
Closer(T t) : x(t)
{ }
{ x->close(); }
T x;
This let me rewrite the code to simply:
Closer<mytype> closer(x);
(without any try-catch)

I was feeling pretty good about this refactoring, until I found that one of my tests was broken. I reviewed all my changes to see if I'd made any inadvertant typos or other mistakes. It all looked good. I started tediously reversing my changes one at a time until I found where I'd broken the test. (Theoretically I should have compiled and run the tests after each little change, but unfortunately that takes a little too long to be tolerable.)

I found the spot, but it looked fine. It almost seemed like the KeepSp destructor wasn't getting run. At one point it seemed like it worked on Visual C++ so I started looking for bugs related to destructors with GCC 3.4.2 But I'd been mistaken, it was actually broken on Visual C++ as well.

I tried using GDB but it didn't really help. In the end I added some "prints" and that gave me the clue that there was an exception being thrown, but the problem wasn't that the destructor wasn't being run - the problem was that the destructor was being run. The old way the code had been written, the stack pointer wasn't being restored if there was an exception, now it was.

Why would that cause a problem? Looking at the interp.cpp code I saw a key comment - returns from a block (that actually return from the containing function) are implemented as exceptions and pass the return value on the stack. My new code that restored the stack pointer even when there was an exception was losing the block return value.

Hmmm... now what? Do I put the code back to the way it was? But it seems kind of ugly to depend on cleanup code not getting run when there's an exception. Instead I changed block returns to store the value in the proc rather than on the stack. Now all the tests pass.

Thank goodness for tests. Only one test (out of hundreds) failed. Without that test it could have been quite a while till I ran into this bug, and it would have been a lot harder to track down.

You could argue that if I'd just left well enough alone I wouldn't have run into the problem in the first place. After all, if it's not broken, don't fix it. I don't agree. Down that path lies chaos. You have to fight entropy. Otherwise each successive change makes the code slightly worse, until eventually it's too incomprehensible and fragile to dare touch.

The day was a bit of a roller coaster. I start off feeling pretty good about the cleanups. Then I got frustrated when I couldn't figure out why it was broken. Then when I thought I'd have to undo my changes I got depressed. I know, rationally it's pretty silly to get depressed about something like that. But I guess if I didn't care enough for it to be depressing, then I would never have gotten this far in the first place. Still, you'd think after 30 plus years of programming I'd be a little less emotional about it!

At this point I decided to take a break and go for a run. That usually improves my mood. And while I was running I figured out that I could make it work with the improvements. So between the endorphins and not having to undo my changes, suddenly I was in a good mood again. That's probably a good place to leave things for the day!

No comments: