Tuesday, November 18, 2025

Roaring Bitmaps

I like useful algorithms I can implement (at least the basics) in a few hours.

Bitmaps are a useful tool for a variety of applications. But they can be large. I was thinking of using them to intersect sets of database records. Even if I only used one bit per 16 bytes, a 32 gb database would be 256 mb (log2: 35 bits - 7 bits = 28 bits) - too big.

People have tried different approaches to shrink bitmaps. Run length encoding works fairly well but is slower to access.

Roaring bitmaps take advantage of several different approaches. The space is divided into 64 K bit blocks (8 K bytes). At the top level there is a sorted expanding list of blocks (usually relatively small) which can be binary searched. The values within a given block are 16 bit. Each block is stored in one of several different representations. If there are less than 4 K entries (a sparse block) they are stored as a sorted expanding list that can be binary searched. If there are more than 4 K entries (a dense block) then they are stored as a bitmap. An improved version adds a third run length encoded block type. With my usage I don't expect contiguous values so I didn't implement this block type.

Lookups are logarithmic, a binary search to find the block and then either another binary search or a bitmap probe within the block.

There are open source implementations available, but I only needed a subset of features so it was just as easy to write my own. (only 150 lines of code) To reduce allocation, I used a Go sync.Pool to recycle blocks.

The original papers are Better bitmap performance with Roaring bitmaps and Consistently faster and smaller compressed bitmaps with Roaring