Sunday, September 10, 2023

A gSuneido Database Optimization

While working on the recent gSuneido database issue I had an idea for optimizing the transaction conflict checking code.

The existing code kept a list of outstanding transactions and, for each one, a list of tables with their reads and writes. Checking a new read or write meant a linear search through all the transactions for other activity on that table. Organizing the data by table rather than by transaction would eliminate that linear search. An action on a "rare" table wouldn't have to look at nearly as much data. Some tables would be common to many transactions, but normally not all of them. And even in the worst case, where every outstanding transaction had touched a particular table, it wouldn't be any slower than before.
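A minimal Go sketch of the two organizations. The type and field names here are hypothetical, not gSuneido's actual code; the real checker tracks read ranges and write sets rather than simple keys.

    package sketch

    // Hypothetical placeholder types.
    type Key string
    type KeyRange struct{ From, To Key }

    // What one transaction did to one table.
    type TableAccess struct {
        tran   *Tran
        reads  []KeyRange
        writes []Key
    }

    type Tran struct {
        tables map[string]*TableAccess // this transaction's activity, by table
    }

    // Before: organized by transaction. A conflict check scans every
    // outstanding transaction, even ones that never touched the table.
    type byTransaction struct {
        trans []*Tran
    }

    func (bt *byTransaction) conflicts(t *Tran, table string, key Key) bool {
        for _, other := range bt.trans {
            if other == t {
                continue
            }
            if ta, ok := other.tables[table]; ok && ta.wrote(key) {
                return true
            }
        }
        return false
    }

    // After: the same information indexed by table. A check only looks at
    // the transactions that actually accessed that table.
    type byTable struct {
        tables map[string][]*TableAccess
    }

    func (bt *byTable) conflicts(t *Tran, table string, key Key) bool {
        for _, ta := range bt.tables[table] {
            if ta.tran != t && ta.wrote(key) {
                return true
            }
        }
        return false
    }

    func (ta *TableAccess) wrote(key Key) bool {
        for _, k := range ta.writes {
            if k == key {
                return true
            }
        }
        return false
    }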

At a high level it was a small change, but I wasn't sure how it would fit into the existing code. I knew I'd still need to keep a list of transactions and the tables they had accessed. It turned out to be relatively straightforward to modify the code. (A few hours' work.) One of the questions was where to embed data and where to use pointers to separately allocated data. Embedding tends to be faster, but embedding in maps or slices can use more memory because empty slots are larger.
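For example, the per-table information could be stored in the map either as embedded values or behind pointers. A rough illustration of the trade-off, again with hypothetical types:

    package sketch

    type key string
    type keyRange struct{ from, to key }

    type tableInfo struct {
        reads  []keyRange
        writes []key
    }

    // Values embedded in the map: no separate allocation and better locality,
    // but every map slot (including empty ones) is the full struct size, and
    // a value can't be modified in place; it has to be read, changed, and
    // stored back.
    var embedded map[string]tableInfo

    // Pointers in the map: slots are pointer-sized and the struct can be
    // updated through the pointer, at the cost of a separate allocation and
    // an extra indirection on every lookup.
    var indirect map[string]*tableInfo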

Before starting the changes I wrote a Go benchmark so I could see what effect they had on performance. As usual, it was hard to know what a "typical" pattern of activity was. For this benchmark the new version was over twice as fast. That was a bigger difference than I'd expected. Sadly, but not surprisingly, the change made no difference to the speed of our application test suite. It doesn't do many concurrent transactions, so it wouldn't benefit. However, the stress test I'd used to investigate the "too many outstanding transactions" issue did show big improvements. When I collected the same measurements as before and graphed them, it looked even better. Up to three threads it didn't make much difference, but after that it was substantially better. And performance didn't drop off under load like it had before. (Or at least, not until much higher load.)
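The benchmark itself isn't shown in the post, but a Go benchmark along these lines is easy to set up with the standard testing package. Everything below is a hypothetical stand-in for the real checker and workload:

    package sketch

    import (
        "strconv"
        "testing"
    )

    // Hypothetical stand-in: table name -> ids of the outstanding
    // transactions that touched it. The real checker compares read
    // and write sets rather than just transaction ids.
    type checker struct {
        tables map[string][]int
    }

    func (ck *checker) checkRead(tran int, table string) {
        // Only the transactions that used this table are examined.
        for _, other := range ck.tables[table] {
            if other != tran {
                _ = other // real code would check for a conflict here
            }
        }
    }

    func BenchmarkCheckRead(b *testing.B) {
        ck := &checker{tables: map[string][]int{}}
        // 100 outstanding transactions, each touching a few of 20 tables.
        for tran := 0; tran < 100; tran++ {
            for j := 0; j < 5; j++ {
                table := "table" + strconv.Itoa((tran+j)%20)
                ck.tables[table] = append(ck.tables[table], tran)
            }
        }
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            ck.checkRead(i%100, "table"+strconv.Itoa(i%20))
        }
    }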

The conventional wisdom for optimization would be to profile the code and find the hot spots. That wouldn't have helped in this case. The impact of the data structure was spread throughout the code, so there was no single hot spot to find. Nor would profiling have given any insight into a better data structure. As I commonly find, what's required is good mental models of the problem, the code, the programming language, and the hardware, and then an approach that fits all of them well. Of course, the hardware performance model is a moving target. These days it's mostly about cache misses and branch prediction.

I'm always happy to improve performance, especially by significant amounts. But in the back of my mind I can't help thinking I should have written it better the first time around. Of course, that doesn't make sense because you can't write the "perfect" version. There's always going to be room for improvement.
