Thursday, September 13, 2018

A Better Hash Table

Reading "A new fast hash table in response to Google's new fast hash table" got me thinking about hash tables again. I wrote an implementation of his previous hash table design back in January. It worked well, but memory usage was higher than I would have liked, and this new design promised to improve on that.

However, this time he didn't explain the design, just shared the code, so it took me a while to figure out exactly how it works. The basics are straightforward, but there are a few tricky parts.

The data layout is the same as in my last version - a flat array of blocks, each holding separate small arrays (e.g. 8 entries) for keys, data, and control information. The advantage of this layout is that it avoids wasting space on padding while still maintaining locality. It can also be allocated with a single memory request.
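
As a rough sketch of what a block might look like in Go (the block size of 8, the field names, and the key/value types here are just placeholders for illustration, not the actual code):

    package main

    const blockSize = 8

    // Each block keeps its control bytes, keys, and values in separate small
    // arrays, so there is no per-entry padding, but the entries in a block
    // still sit close together in memory.
    type block struct {
        control [blockSize]byte   // per-slot metadata: empty / direct / chain, plus a jump index
        keys    [blockSize]string // keys stored contiguously
        vals    [blockSize]int    // values stored contiguously
    }

    // The table itself is one flat slice of blocks, allocated in a single
    // request. Slot i lives at blocks[i/blockSize], position i%blockSize.
    type hashMap struct {
        blocks []block
        size   int
    }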

This hash table is chained (i.e. collisions are handled with a linked list), but the chains use empty slots in the table rather than separate allocations. And the chains are not linked with a pointer or a slot index, but with an index into a fixed set of jump sizes. So we can't link to an arbitrary slot, only to a particular set of slots relative to the origin. In practice this seems to be sufficient.
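
Something like the following, where the particular jump distances are made-up values just to show the idea (the real table is larger and tuned differently):

    package main

    // A chain entry doesn't store a pointer or a slot index to the next entry,
    // just a small index into a fixed table of jump distances. Index 0 is
    // reserved here to mean "end of chain".
    var jumps = [...]int{0, 1, 2, 3, 4, 5, 7, 9, 12, 16, 21, 28, 37, 49, 65, 86}

    // nextSlot follows one link of a chain.
    func nextSlot(slot, jumpIndex, capacity int) int {
        return (slot + jumps[jumpIndex]) % capacity
    }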

Simple open addressing (e.g. linear probing) ends up with "chains" (probe sequences) containing different hash indexes, which leads to longer searches. We could do something similar, but it's better if we ensure that the chain starting at a given index only contains entries with that same hash index. That means if you want to insert and the slot is part of another chain, you have to move that other chain from that point on. (If the chains were actual linked lists, you could just move the one element and then link to the rest of the chain. But since we only support certain specific jumps, we can't do that in the general case, so we have to move the entire rest of the chain.)

I also used Fibonacci hashing, which adds an extra multiplicative hashing step to spread out values more than taking the hash mod the table size would.
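
Roughly like this (the multiplier is 2^64 divided by the golden ratio, and the shift depends on the table size; this shows the general technique, not the actual code):

    package main

    import "fmt"

    // Fibonacci hashing: multiply by 2^64 / golden ratio and keep the top bits.
    // shift is 64 minus log2 of the table size.
    func fibIndex(hash uint64, shift uint) uint64 {
        return (hash * 11400714819323198485) >> shift
    }

    func main() {
        // For a table of 256 slots, shift = 64 - 8 = 56.
        for _, h := range []uint64{10, 11, 12, 20, 30} {
            fmt.Println(h, "->", fibIndex(h, 56))
        }
    }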

Here's an illustration of how it works. For simplicity I'll use numeric keys and a hash index of the first (tens) digit (without the additional Fibonacci hashing).

Inserting non-colliding elements is simple.


If we insert a colliding element, we "chain" it.


The chaining might have to skip occupied slots. If we insert 11, 12, and 31 we'd get:


Note: Chaining starts out looking like linear probing, but as the chain gets longer the jumps get larger.

Note: 10 and 30 are "direct hits", whereas 11, 12, and 31 are "chain" slots. We use one bit of the metadata to record this.
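
One plausible way to encode that control byte (this is my guess at a layout for illustration, not necessarily what the actual code does):

    package main

    // One byte of metadata per slot. A possible encoding: a reserved value for
    // "empty", one bit to mark a direct hit (the start of a chain), and the
    // remaining bits as an index into the jump table (0 meaning end of chain).
    const (
        slotEmpty byte = 0xFF
        directBit byte = 0x80
        jumpMask  byte = 0x7F
    )

    func isDirect(meta byte) bool { return meta != slotEmpty && meta&directBit != 0 }
    func jumpIndex(meta byte) int { return int(meta & jumpMask) }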

When we insert 20 we find its slot is being used by a chain, so we move that chain from that point on (11, 12) and insert 20 where it "belongs".


This allows lookups to take advantage of the direct/chain distinction. If we look for 60, we find a chain slot (holding 11), so we know immediately that it's not in the table. Whereas if we look for 15 (same hash as 10, 11, and 12), we find a direct slot, so we have to follow the chain to determine whether the value is present.
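
A simplified lookup along those lines might look like this; it ignores the block grouping and the Fibonacci hashing step, and uses a toy tens-digit hash to match the illustration (a sketch of the idea, not the real code):

    package main

    // meta, keys, and vals are parallel slices, using the same kind of toy
    // control-byte encoding and jump table as in the earlier sketches.
    const (
        slotEmpty byte = 0xFF
        directBit byte = 0x80
        jumpMask  byte = 0x7F
    )

    var jumps = [...]int{0, 1, 2, 3, 4, 5, 7, 9, 12, 16, 21, 28, 37, 49, 65, 86}

    func lookup(meta []byte, keys, vals []int, key int) (int, bool) {
        slot := (key / 10) % len(meta) // toy hash: the tens digit
        m := meta[slot]
        // If the home slot is empty, or holds a chain entry belonging to some
        // other chain, the key cannot be in the table.
        if m == slotEmpty || m&directBit == 0 {
            return 0, false
        }
        // Otherwise it's a direct hit, so follow the chain.
        for {
            if keys[slot] == key {
                return vals[slot], true
            }
            j := int(meta[slot] & jumpMask)
            if j == 0 {
                return 0, false // end of chain
            }
            slot = (slot + jumps[j]) % len(meta)
        }
    }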

When we delete an item from anywhere but the end of its chain, we move the last element of the chain into the deleted item's slot, shortening the chain by one without moving all the elements (there's a rough code sketch of this below). So if we delete 10 we get:


Note: Deleting 20 would not undo the chain move that was done when we inserted 20. 
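
Here's a rough sketch of that deletion step, over the same kind of toy flat layout as the lookup sketch (again, just to show the idea, not the real code):

    package main

    const (
        slotEmpty byte = 0xFF
        directBit byte = 0x80
        jumpMask  byte = 0x7F
    )

    var jumps = [...]int{0, 1, 2, 3, 4, 5, 7, 9, 12, 16, 21, 28, 37, 49, 65, 86}

    // remove deletes key by moving the last entry of its chain into the slot
    // that held key, so the chain shrinks by one without shifting everything.
    func remove(meta []byte, keys, vals []int, key int) bool {
        n := len(meta)
        slot := (key / 10) % n // toy hash: the tens digit
        if meta[slot] == slotEmpty || meta[slot]&directBit == 0 {
            return false // no chain starts here, so the key isn't present
        }
        // Walk the chain to find the target slot, the tail, and the tail's
        // predecessor.
        target := -1
        prev, cur := -1, slot
        for {
            if keys[cur] == key {
                target = cur
            }
            j := int(meta[cur] & jumpMask)
            if j == 0 {
                break // cur is now the tail of the chain
            }
            prev, cur = cur, (cur+jumps[j])%n
        }
        if target == -1 {
            return false
        }
        if target != cur {
            // Move the tail entry into the deleted slot. The deleted slot keeps
            // its own metadata, so the rest of the chain still links through it.
            keys[target], vals[target] = keys[cur], vals[cur]
        }
        // The tail slot becomes free and its predecessor becomes the new tail.
        meta[cur] = slotEmpty
        if prev != -1 {
            meta[prev] &^= jumpMask // clear the jump index: end of chain
        }
        return true
    }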

I started with a Go implementation, although I'll probably port it to C++ as well. The initial version of the code is on GitHub.

After some tweaking, the performance is about 10% faster than the previous version. That's not a huge improvement, but considering the previous version was already quite fast, it's not bad. The best part is that it uses about half as much memory - exactly the benefit that was promised.