Wednesday, February 01, 2023

Fuzz Testing Database Queries

The gSuneido database was ready for production, as far as I could tell. All our tests passed.

But there was a steady trickle of bugs showing up. Sure, they were obscure cases, but they were showing up in actual application code, so they weren't that obscure.

Every time I'd think it was ready to deploy further, another bug would show up.

I finally decided I had to do something to flush out these obscure bugs. Waiting for them to show up in production was not the way to go.

I had thought about trying to fuzz test database queries before, but it always seemed too hard. Even if I could figure out a way to generate random but legal queries, how would I check the results? Then I realized I could compare jSuneido and gSuneido. It wasn't a perfect test since they were roughly the same design and therefore could have matching bugs. But I had made enough changes and improvements to gSuneido that they were now significantly different. And even if it wasn't a perfect test, it was a heck of a lot better than nothing.

I puzzled over how to generate valid random queries. And what kind of data to use. I started with a simple set of four tables with obvious joins between them. The joins seemed to be the most constrained element, so I started with simply picking randomly from a fixed list of joins.

e.g. cus join ivc

I represented the query as a tree of nested objects and wrote a function to randomly pick a sub-expression branch.

unions are added by randomly picking a branch and replacing it with: branch union branch

e.g. (cus join ivc) union (cus join ivc)

leftjoins are added by randomly replacing some of the join's.

Then where, rename, extend, and remove (project) are added at random spots.

(((cus where ck = 12) join ivc) union ((cus leftjoin ivc) extend x1)) remove c1

Finally it optionally adds a sort.

The resulting queries aren't necessarily realistic, but they seemed to cover most of the variations.

It's a little more ad-hoc than I originally hoped. You could generate random queries from the grammar, and they would be valid syntax, but queries also have to have valid semantics, and that isn't represented in the grammar.

First I just parsed and optimized the queries, no execution. This soon triggered some assertion failures which uncovered a few bugs.

The next step was to compare the actual results between gSuneido and jSuneido. I decided the simplest approach was to calculate a checksum of the result and output queries and their result checksums to a text file. Then I could re-run those queries on jSuneido and compare the checksums.

In the end I found about a dozen bugs. Several of them were in jSuneido, which is a little surprising since it has been in production with thousands of users for about 10 years.

The problem with finding a bug through fuzzing, is that the inputs it finds are often messy. Some fuzzing systems will try to simplify the inputs for you. I just did it manually, starting the debugging of each failure by simplifying the query as much as possible while still retaining the failure. Fuzzing can make finding bugs easier, but it doesn't really help with fixing them. And the more you fuzz, the more obscure the bugs get. On the other hand, by the end I got pretty good at tracking them down, inserting prints or assertions or using the debugger.

I added all the failed queries to a "corpus" that gets run every time, along with new random queries as a defense against regressions. In theory they would get generated randomly again, but that could potentially take a long time.

I called it success when I ran about 500,000 queries (roughly two hours of processing) with no differences. The plan is to add this to our continuous test system and fuzz for an hour or so per night when we have spare compute power. That should prevent regressions and possibly find even more obscure bugs.

I'm pretty happy with how this turned out. It only took me a few days work to write the fuzzing system (not counting the time to actually fix the bugs), and it flushed out a bunch of bugs that I'm sure would have haunted me for a long time otherwise.

Of course, in hindsight I should have done this 20 years ago, or at least 10 years ago when I wrote a second implementation (jSuneido). Sometimes I'm a little slow! But better now than never.

Maybe the counterpart to Eric Raymond's "given enough eyeballs, all bugs are shallow" is "given enough fuzzing, all bugs are visible". (Of course, it's never "all", but you get the idea.)

No comments: