The bad news is that after I'd done all this, I was still getting the original error! It only occurs about once every 200,000 transactions (with 2 threads). (Thank goodness for fast computers - 200,000 transactions only take about 5 seconds.) Frustratingly, it doesn't happen in the debugger. With this kind of problem it's not much use adding print statements because you get way too much irrelevant output. A technique I've been finding useful is to have each transaction keep a log of what it's doing. Then when I get the error I can print the log from the offending transaction. It's not perfect because with concurrency problems you really need to see what the other thread was doing, but it's better than nothing.
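To make the per-transaction log idea concrete, here is a minimal Python sketch. It is only an illustration, not the actual code: `TransactionLog`, `run_transaction`, and the `max_entries` limit are hypothetical names invented for this example. Each transaction records its steps in a small in-memory buffer, and the buffer is only turned into output when that transaction fails.

```python
from collections import deque
import threading


class TransactionLog:
    """Bounded in-memory trace of what one transaction did (hypothetical helper)."""

    def __init__(self, tran_id, max_entries=100):
        self.tran_id = tran_id
        # A deque with maxlen drops the oldest entries once it is full,
        # so a long-running transaction cannot grow the log without bound.
        self.entries = deque(maxlen=max_entries)

    def add(self, message):
        # Record the thread name alongside the message to help when
        # comparing logs from different threads.
        self.entries.append((threading.current_thread().name, message))

    def dump(self):
        lines = [f"transaction {self.tran_id} log:"]
        lines += [f"  [{thread}] {message}" for thread, message in self.entries]
        return "\n".join(lines)


def run_transaction(tran_id, steps):
    """Run each step (a callable taking the log).

    Returns the formatted log if a step raises, otherwise None,
    so successful transactions produce no output at all.
    """
    log = TransactionLog(tran_id)
    try:
        for step in steps:
            log.add(f"start {step.__name__}")
            step(log)
            log.add(f"end {step.__name__}")
    except Exception as e:
        log.add(f"FAILED: {e!r}")
        return log.dump()
    return None
```

Because nothing is printed for the transactions that succeed, the output stays readable even when the failure shows up only once in a couple of hundred thousand runs. As noted above, this still only shows one side of a concurrency problem.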
It was also annoying because it was the end of the day so I had to leave it with a known error :-(
Thinking about it, I realized I had rushed coding some of the changes, hadn't really reviewed them, and hadn't written any tests. Not good. When I went back to it this morning, sure enough, I had made mistakes in my rush job. Obviously, that self-imposed pressure to get things resolved by the end of the day is not always a good thing.
So now I'll go back and review the code and write some tests before I worry about whether I've fixed the original problem.
1. A famous aphorism of David Wheeler goes: "All problems in computer science can be solved by another level of indirection." Kevlin Henney's corollary to this is, "...except for the problem of too many layers of indirection." - from Wikipedia