For some reason I started thinking about hash tables again. I realize only a geek would find this interesting, but I think they're pretty cool. How can you not like a data structure whose lookup time is O(1), i.e. independent of size? The other fascinating part is that there are a huge number of different hash table designs, each with its own set of tradeoffs.
To beat the critics to the punch ... Did I have a pressing problem that I needed to solve? Nope. Did I benchmark the old code or the new code? Nope. Would it make more sense to just use std::unordered_map and quit reinventing the wheel? Probably. Was it a fun challenge that would give incremental improvements? I think so. I could say it's because I'm semi-retired and can afford to mess around, but I've been doing it all along. What's Suneido but the biggest example of reinventing the wheel? Right or wrong, it's what I enjoy and the way I roll, and I think it's worked out pretty well. I try not to take it too far. Originally I wrote my own garbage collectors for cSuneido. I'm not talking about yet-another-reference-counting-class; this was a full-blown conservative bit-mapped garbage collector. A fun challenge and probably some of my best work. But when the obviously better Boehm collector came along I switched over and threw out my own version.
cSuneido uses a pretty standard traditional hash table design - an array of pointers to nodes with separate chaining for collisions. It has worked fine for 20 years. But it has some drawbacks - every insert requires an allocation, there's considerable space overhead (pointers, separately allocated blocks, padding), and all the indirection means it's not cache friendly.
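Roughly, the old layout looks like this (a sketch of the general chained approach, not the actual cSuneido code; the names are illustrative):

template <class Key, class Val>
struct Node {
    Key   key;
    Val   val;
    Node* next;   // collision chain
};

// the table itself is an array of Node<Key,Val>* chain heads;
// every insert heap-allocates a Node, and every lookup chases pointers,
// which is where the space overhead and the cache misses come from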
The design of the Go hash table is quite interesting. It uses an array of buckets, each holding e.g. 8 entries. Within a bucket there are separate arrays for keys and values to eliminate padding. Although the initial buckets are allocated in one large array, overflow buckets are chained. It also caches the high byte of hashes to speed up searches.
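In C++ terms, a bucket looks something like this (a sketch of the idea, not Go's actual runtime code; the field names and the bucket size of 8 are just for illustration):

#include <cstdint>

template <class Key, class Val>
struct Bucket {
    uint8_t tophash[8];  // cached high byte of each entry's hash,
                         // checked before comparing the full key
    Key     keys[8];     // keys stored together and values stored together,
    Val     vals[8];     // so there's no per-entry padding
    Bucket* overflow;    // chained overflow bucket when this one fills up
};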
I reimplemented the Go hash table when I was working on gSuneido. (I couldn't just use the built-in one because it doesn't allow arbitrary types for keys the way Suneido needs.) At that point the Go one was written in C; nowadays it's written in Go (albeit using unsafe).
So I thought I'd rewrite cSuneido's hash table using the Go design. It took me a day or two to code and debug and it worked well.
But I remembered an article I'd read with the provocative title of "I Wrote the Fastest Hashtable". It uses open addressing, linear probing, Robin Hood hashing, prime table sizes, and a probe limit. One drawback is that it uses a traditional array-of-structs layout, which means wasted space for padding, especially since it stores a one byte value per slot which is pretty much guaranteed to get padded. But that issue is easy to overcome with an approach similar to the Go design: store the table in "blocks", and within each block use separate arrays for the keys, the values, and the one byte "distance". However, unlike the Go design, the table is still treated as a single large flat array. The tradeoff is that indexing into it is slightly slower.
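Something like this, in outline (my reading of the design, not the article's code; BLOCK and the names are illustrative):

#include <cstdint>

const int BLOCK = 8;   // illustrative block size

template <class Key, class Val>
struct Block {
    uint8_t dist[BLOCK];   // one byte per slot: distance from the home slot
    Key     keys[BLOCK];   // keys, values, and distances in separate arrays
    Val     vals[BLOCK];   // so the one byte distance doesn't get padded
};

// The table is still addressed as one flat array of slots; a flat index i
// just gets split into a block and an offset, which is the slight extra
// cost over indexing a plain contiguous array:
//     key(i)  is  blocks[i / BLOCK].keys[i % BLOCK]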
So I took another day or two and implemented this design. I love the simplicity of the lookup code:
int i = index_for_key(k);                    // home slot for this key
for (int d = 0; d <= distance(i); ++d, ++i)  // stop once we pass a slot whose
    if (HE::keyeq(k, key(i)))                //   entry is closer to its home
        return i;                            //   slot than we are
return -1;                                   // not found
You can't get much simpler than that. (Another nice feature of this design is that bounds checks are eliminated by adding extra slots on the end, which is feasible because of the probe limit.)
This design trades off slightly more expensive inserts and deletes for faster lookups. For the usual case of more lookups than updates, that's a good tradeoff. One negative aspect is that resizing the hash table will be slightly more expensive. But that's probably not a large factor because when you're growing a hash table, the new table is sparse and therefore collisions are less frequent.
One feature that makes me a little nervous is that the table grows if an insert exceeds the probe limit, which is set to log2(n). In most hash tables, if all the keys hash to the same value you end up with a linear search. In this design the table would grow until you ran out of memory or hit a size limit (a hash table denial-of-service vulnerability). But in practice, the prime table sizes make it almost impossible for this to happen.
Despite the simple lookup code, it took me a bit longer to debug than I expected. Insert and delete are a little trickier with Robin Hood hashing because elements get moved around; a rough sketch of the insert step is below. C++ templates also gave me a bit of grief, as usual.
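Here's a minimal toy version of the insert step (not the cSuneido code: ints for keys and values, a plain vector instead of blocked storage, and a caller-supplied probe limit, just to show the swapping and the grow-on-probe-limit behaviour):

#include <vector>
#include <utility>

struct Slot { int key; int val; int dist; };   // dist == -1 means empty

struct Toy {
    std::vector<Slot> slots;
    int limit;   // probe limit, roughly log2(size)

    Toy(int cap, int lim) : slots(cap, Slot{0, 0, -1}), limit(lim) {}

    int home(int key) const { return (key & 0x7fffffff) % int(slots.size()); }

    // returns false when the probe limit is hit; a real table would
    // grow to the next prime size and re-insert everything
    bool insert(int key, int val) {
        int i = home(key), d = 0;
        for (;; ++i, ++d) {
            if (d > limit)
                return false;                  // caller grows and retries
            Slot& s = slots[i % slots.size()];
            if (s.dist < 0) {                  // empty slot: claim it
                s = {key, val, d};
                return true;
            }
            if (s.key == key) {                // already present: update
                s.val = val;
                return true;
            }
            if (d > s.dist) {                  // resident is closer to its home
                std::swap(key, s.key);         //   than we are: take its slot and
                std::swap(val, s.val);         //   carry on inserting the evicted
                std::swap(d, s.dist);          //   entry from the next slot
            }
        }
    }
};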
Testing with (pseudo) random values I got the expected good results:
- average probes for lookup was 1.3
- average load at resize was .75
- average swaps per insert was 1.3 (although max was 55!)
An average load at resize of .75 means the overall average load is about .56: the table roughly doubles when it grows, so the load drops to about .375 right after a resize and climbs back to .75 before the next one, averaging a bit over half full. That's fairly typical for hash tables; it's the cost you pay for the performance. On the positive side, this design has only 1 byte of overhead per slot, with no pointers, padding, or separate allocations, so it's better than many hash tables.
Here's a snapshot of the code. It'll end up in the cSuneido repo at some point. Note - it's not a full STL container since that requires a lot of extra code that I don't need. It also assumes that keys and values are simple objects (like integers or pointers) that are fast to assign and copy. See the article linked above for a more general purpose implementation.