gSuneido has had a long-standing assertion failure that happened maybe once a month, which works out to about once per million user hours. We tried multiple times but were never able to recreate it.
I’ll try to explain the scenario. Indexes are stored in the database file as btrees. While the database is running, index updates are buffered. An Overlay consists of the base btree plus layers of updates. When a transaction commits, it adds an ixbuf layer to the Overlays it updated. Background threads merge the layers and update the btree. OverIter handles iterating over an Overlay by merging iterators for each layer. This is fundamentally straightforward, but as always, the devil is in the details. The ixbuf layers from the transactions include updates and deletes as well as additions, so the merging has to combine these, for example, an add followed by several updates. The final piece of the puzzle is that concurrent modifications can occur during iteration.
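To make the combining concrete, here's a minimal sketch in Go (gSuneido's language) of how layered operations on the same key might merge. The types and names are illustrative, not the actual gSuneido code, which is considerably more involved:

```go
package main

import "fmt"

// Hypothetical simplification: each layer records one operation per key.
type opType int

const (
	opAdd opType = iota
	opUpdate
	opDelete
)

type op struct {
	typ opType
	val string
}

// combine merges an older operation with a newer one on the same key.
// For example, add+update collapses to a single add carrying the new
// value, and add+delete cancels out entirely.
func combine(older, newer op) (op, bool) {
	switch {
	case older.typ == opAdd && newer.typ == opUpdate:
		return op{opAdd, newer.val}, true
	case older.typ == opAdd && newer.typ == opDelete:
		return op{}, false // add then delete: nothing to apply
	case older.typ == opUpdate && newer.typ == opUpdate:
		return op{opUpdate, newer.val}, true
	default:
		return newer, true // otherwise the newer operation wins
	}
}

func main() {
	merged, ok := combine(op{opAdd, "v1"}, op{opUpdate, "v2"})
	fmt.Println(merged, ok) // {0 v2} true: an add carrying the latest value
}
```

The subtlety is that the result depends on both the kinds of operations and their order, which is exactly where the details get tricky.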
"OverIter Cur deleted" means that the current value of the iterator is marked as deleted. This should never happen. The iterator is designed to skip deleted records.
The error occurred again recently and I decided to see what Claude (Sonnet 4) could do with it. It kept saying "now I see the problem", but when I'd tell it to write a test to make it happen, it couldn't. It became obvious it wasn't going to spot the bug just by looking at the code, so I got it writing tests. It wrote a lot of them, and they all passed. That was actually kind of nice since it meant the code was fairly robust. I wouldn't have been surprised if other bugs had shown up with the intense testing.
Finally, it wrote a random stress test that caused the assertion failure. I was cautiously optimistic. Sadly, it turned out the test was generating invalid data, and that was what was triggering the assertion failure. Once I corrected the data generation, the test passed. It was possible that the bug was actually producing bad data, which in turn triggered the assertion failure, but that seemed unlikely since bad data would cause other problems.
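The key to valid data generation is tracking which keys currently exist, so the test never updates or deletes a missing key or re-adds a present one. Here's a toy sketch of that shape, using a plain map as the reference model rather than the actual index code:

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	rng := rand.New(rand.NewSource(1)) // fixed seed so failures reproduce
	model := map[string]string{}       // reference model of live keys
	for i := 0; i < 10000; i++ {
		key := fmt.Sprintf("k%d", rng.Intn(100))
		_, exists := model[key]
		switch {
		case !exists:
			model[key] = "v" // only add keys that aren't present
		case rng.Intn(2) == 0:
			model[key] = fmt.Sprintf("v%d", i) // update an existing key
		default:
			delete(model, key) // delete an existing key
		}
		// In the real test, the same operation would also be applied to
		// the index structure and its iterator checked against the model.
	}
	fmt.Println("final live keys:", len(model))
}
```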
Back to the drawing board. I continued extending the test to cover more scenarios. Eventually I managed to recreate the error with legitimate data and actions. After that, it was a matter of extracting one failing case from the random test. Claude added print statements, looked at the output, and wrote a test for that specific sequence.
Once I had a relatively simple failing test, Claude claimed to have found the bug right away. I was skeptical since it had made that claim many times already. I worked on extending the test to cover more of the scenarios. Sure enough, the test started failing again. Claude came up with another fix, but the test kept failing. The proposed changes would fix certain cases but break others. No combination of the “fixes” solved all the problems.
Eventually Claude proposed rewriting one of the key functions. I was even more skeptical, but the code looked reasonable, and it was simpler and clearer than the old code. It wasn't very efficient, but it wasn't hard to tell Claude how to optimize it. Even so, it still didn't fix all the problems. I dug into the other “fix” and realized it was on the right track but wasn't quite complete. A little back and forth produced a solution here as well. And finally I had a version of the code that passed all the tests.
I am cautiously optimistic. By this point I think the tests are fairly comprehensive. And I understand the fixes; they make sense.
As I've come to expect, my results with Claude were mixed. It definitely did some of the grunt work of writing tests. And I have to give it credit for giving up on some of my old code and writing a simpler, more correct version. But it also came up with several incorrect, or at least incomplete, fixes.
I started this thinking I'd spend a few hours playing with AI. It ended up being my main project for a week. Even though the bug was rare enough that it wasn't really a problem, I'm glad I finally fixed it. (I hope!)