Daniel Lemire's recent blog post, "Recognizing string prefixes with SIMD instructions", and another he linked to, "Modern perfect hashing for strings", prompted me to revisit how I was recognizing keywords in gSuneido. Although I find it interesting how you can use advanced processor instructions, I didn't really want to get into that. But like most things, even without looking at my code, I figured it could probably be improved.
I found that I was recognizing the 28 language keywords with a linear search, sorted by frequency of occurrence. For query keywords I was using a Go map. Both were quite fast, certainly not a bottleneck.
The reason I hadn't used a Go map for language keywords was that I wanted to "intern" the strings. And there's no way to retrieve the key of a Go map (unless you loop through all of them). Normally that doesn't matter, since you need the key to do the lookup in the first place. But the key I'm looking up is a slice of the source code. If I held onto that key, it would keep the entire source code in memory, because the reference prevents garbage collection of the backing array.
One of the first things I tried ("Compile-time switches" from the Modern perfect hashing article) was one of the fastest, 4x faster than the old linear search: switch on the length, and then switch on the i'th character (chosen for each length). For example:
```go
switch len(s) {
case 2:
	switch s[1] {
	case 'f':
		if s == "if" {
			return tok.If, "if"
		}
	case 's':
		if s == "is" {
			return tok.Is, "is"
		}
```
I find it a little surprising that this turned out to be the fastest since it has quite a few branches - two switches plus a string comparison. Using Compiler Explorer to look at the assembler, I found the code was quite good. I was expecting a call to a string comparison function, but it is unrolled inline. It may actually be beneficial that the switches are explicit so each branch can get its own prediction.
Based on the assembler, I simplified the code. This was just as fast, smaller source code, and smaller machine code.
```go
switch len(s) {
case 2:
	if s == "if" {
		return tok.If, "if"
	}
	if s == "is" {
		return tok.Is, "is"
	}
```
Because it was comparing constant strings, and it knew the lengths matched, the compiler compared them as multi-byte integers instead of byte by byte. Lengths of 2, 4, and 8 bytes map to int16, int32, and int64 comparisons. For lengths of 3, 5, 6, and 7 it used a combination, e.g. 6 characters was an int32 plus an int16.
I did try various other approaches.
I tried finding "perfect" hash functions and table sizes, which turned out to be trickier than I expected. It was almost as fast as the switch version.
I tried another hash table implementation, but it was slower than the Go map.
I tried switching on the length and then doing a linear search.
In the end I went with the switch version since it was fastest. It was a little more verbose, but Tabnine handled most of the code generation.
Did this micro-optimization make any difference to the overall speed? No, not that I saw from a quick test. But it was an interesting investigation. And yes, premature optimization can be a mistake. But ignoring speed entirely is also a mistake.