Thursday, May 06, 2010

Perseverance

Another day working on my bug, but I've made some progress, at least as far as narrowing it down.

I managed to simplify my test to just repeating the same request over and over (instead of the semi-realistic mix of operations in the original test). This greatly reduces the amount of code where the bug could be.

Next I wrote a test client in Java to make that same request. But it succeeded. (Of course, you never know whether one more run would have hit the bug.)
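For anyone wanting to build a similar harness, here is a rough sketch of a "same request over and over" client. It sends one fixed request repeatedly over a single connection and counts bad replies. The real cSuneido protocol isn't shown here. So the line-based echo server, the class and method names, and the "reply must equal request" check are all hypothetical stand-ins. To point the client at a real server on another machine, you would change the host, port and reply check.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RepeatRequest {
    // Hypothetical stand-in server: accepts one connection and echoes each line back.
    static ServerSocket startEchoServer() throws IOException {
        ServerSocket server = new ServerSocket(0); // bind to any free port
        Thread t = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                String line;
                while ((line = in.readLine()) != null) {
                    out.println(line);
                }
            } catch (IOException e) {
                // connection closed; nothing more to do in this sketch
            }
        });
        t.setDaemon(true);
        t.start();
        return server;
    }

    // Sends the same request `count` times on one connection and counts replies that
    // don't match. The equality check is only meaningful for the echo stand-in.
    static long run(String host, int port, String request, long count) throws IOException {
        long failures = 0;
        try (Socket s = new Socket(host, port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            for (long i = 0; i < count; i++) {
                out.println(request);
                String reply = in.readLine();
                if (reply == null) { // server closed the connection: stop counting
                    System.err.println("request " + i + ": connection closed");
                    failures++;
                    break;
                }
                if (!request.equals(reply)) {
                    System.err.println("request " + i + ": unexpected reply " + reply);
                    failures++;
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        long count = args.length > 0 ? Long.parseLong(args[0]) : 100_000L;
        try (ServerSocket server = startEchoServer()) {
            long failures = run("127.0.0.1", server.getLocalPort(), "same request every time", count);
            System.out.println("failures: " + failures + " of " + count);
        }
    }
}
```

Reusing one connection keeps the test focused on the request-handling path rather than connection setup. A variant that reconnects for every request would exercise a different part of the framework.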

Doing more testing with the cSuneido client, I realized that the bug occurred more often (or only) when the client and the server were on separate machines. So I tried running my Java client on a separate machine. And now it failed too.

So now I know the problem isn't with cSuneido, or Parallels, or Windows.

I narrowed it down even further by commenting out all the real work of the request (the database querying stuff), leaving only the framework of request handling. It still fails. This reduces the amount of code where the bug could be even more.

But I'm still puzzled. My simple test client plus test server has yet to fail, even running on separate machines. But the test client plus the real server does fail, even though the real server is cut down to where it's doing little more than the test server. There must be some difference. Or else the probability of the error is just lower and I haven't run it enough times.

Just to give an idea, when it's failing, it fails about once every million requests. Gotta love that kind of bug!
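To put "I haven't run it enough times" into numbers, assume each request fails independently with probability p = 1/1,000,000 (the rate above; independence is an assumption). Then n requests all succeed with probability (1 - p)^n, which is roughly e^(-np). A clean run of one million requests is weak evidence: it would still happen about 37% of the time even with the bug present. It takes roughly three million clean requests before a clean run would be less than 5% likely. The small Java sketch below computes this. The class and method names are just for illustration.

```java
public class RunsNeeded {
    // Probability that n independent requests all succeed when each fails with probability p.
    // log1p keeps the calculation accurate for tiny p.
    static double probNoFailure(double p, long n) {
        return Math.exp(n * Math.log1p(-p));
    }

    // Smallest n for which a clean run of n requests has probability <= alpha,
    // i.e. how many clean requests before "no failures" starts to mean something.
    static long requestsNeeded(double p, double alpha) {
        return (long) Math.ceil(Math.log(alpha) / Math.log1p(-p));
    }

    public static void main(String[] args) {
        double p = 1e-6; // about one failure per million requests
        for (long n : new long[] {1_000_000L, 3_000_000L, 10_000_000L}) {
            System.out.printf("n = %,d  P(no failure) = %.4f%n", n, probNoFailure(p, n));
        }
        System.out.printf("clean requests needed for 95%% confidence: %,d%n",
                requestsNeeded(p, 0.05));
    }
}
```

This is the same reasoning as the statistical "rule of three": with zero failures in n trials, the 95% upper bound on the failure rate is about 3/n.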

So I still haven't found the problem, but I might be getting closer.
