Thursday, July 20, 2023

Keyword Recognition

Daniel Lemire's recent blog post, Recognizing string prefixes with SIMD instructions, and one he linked to, Modern perfect hashing for strings, prompted me to revisit how I was recognizing keywords in gSuneido. Although I find it interesting how you can use advanced processor instructions, I didn't really want to get into that. But like most things, I figured that, even without looking at my code, it could probably be improved.

I found that I was recognizing the 28 language keywords with a linear search through a list sorted by frequency of occurrence. For query keywords I was using a Go map. Both were quite fast, and certainly not a bottleneck.
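For reference, the old approach has roughly this shape (a sketch with invented keyword names and ordering, not the actual gSuneido list): scan a slice ordered by how often each keyword occurs, so common keywords are matched in the first few comparisons.

```go
package main

// kwList is ordered by (assumed) frequency of occurrence, so the
// linear search below usually terminates after a few comparisons.
var kwList = []string{"if", "return", "is", "and", "function", "else",
	"false", "true", "not", "or"}

// lookupKeyword returns the keyword's index in kwList,
// or -1 if s is not a keyword.
func lookupKeyword(s string) int {
	for i, k := range kwList {
		if k == s {
			return i
		}
	}
	return -1
}
```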

The reason I hadn't used a Go map for language keywords was that I wanted to "intern" the strings. And there's no way to retrieve the stored key of a Go map (short of looping through all of them). Normally that doesn't matter, since you need the key to do the lookup in the first place. But the key I'm looking up is a slice of the source code. If I held onto that key, it would keep the entire source code in memory, because the reference prevents it from being garbage collected.
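One possible workaround (a sketch of the idea, not necessarily what gSuneido does) is to store the canonical keyword string as the map value as well as the key. A lookup keyed by a slice of the source then returns the interned constant, so holding onto the result doesn't pin the source code in memory.

```go
package main

// kwIntern maps each keyword to its canonical (interned) copy.
// The values are compile-time constants, so returning one of them
// does not retain a reference to the source code being scanned.
var kwIntern = map[string]string{
	"if": "if",
	"is": "is",
	// ... one entry per keyword
}

// intern returns the canonical copy of s if it is a keyword.
func intern(s string) (string, bool) {
	k, ok := kwIntern[s]
	return k, ok
}
```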

One of the first things I tried (compile-time switches, from the Modern perfect hashing article) turned out to be one of the fastest, about 4x faster than the old linear search: switch on the length, then switch on the i'th character (with i chosen per length). For example:

switch len(s) {
case 2:
    switch s[1] {
    case 'f':
        if s == "if" {
            return tok.If, "if"
        }
    case 's':
        if s == "is" {
            return tok.Is, "is"
        }
    }
// ... and so on for the other lengths
}
I find it a little surprising that this turned out to be the fastest, since it has quite a few branches: two switches plus a string comparison. Using Compiler Explorer to look at the assembler, I found the code was quite good. I was expecting a call to a string comparison function, but it is unrolled inline. It may actually be beneficial that the switches are explicit, so each branch can get its own prediction.

Based on the assembler, I simplified the code. This was just as fast, smaller source code, and smaller machine code.

switch len(s) {
case 2:
    if s == "if" {
        return tok.If, "if"
    }
    if s == "is" {
        return tok.Is, "is"
    }
// ... and so on for the other lengths
}

Because it was comparing constant strings and already knew the lengths matched, the compiler compared them as multi-byte integers instead of byte by byte: lengths of 2, 4, and 8 became int16, int32, and int64 comparisons. For lengths of 3, 5, 6, and 7 it used a combination, e.g. 6 characters compared as an int32 plus an int16.
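To see what that trick amounts to, here is a hand-written equivalent for the length-2 case (a sketch of what the compiler effectively emits, not code from gSuneido): once the length is known to be 2, s == "if" collapses to a single 16-bit integer comparison.

```go
package main

// isIf mimics the compiler's optimization of s == "if" when len(s)
// is already known to be 2: load both bytes as one 16-bit integer
// and compare it against the constant built from 'i' and 'f'.
func isIf(s string) bool {
	return uint16(s[0])|uint16(s[1])<<8 == uint16('i')|uint16('f')<<8
}
```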

I did try various other approaches.

I tried finding "perfect" hash functions and table sizes, which turned out to be trickier than I expected. It was almost as fast as the switch version.
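A toy illustration of the search involved (the keyword list, hash shape, and table size here are invented for the example, not gSuneido's): brute-force hunt for a multiplier that maps every keyword to a distinct slot in a fixed-size table.

```go
package main

// kws is a made-up keyword set for the example.
var kws = []string{"if", "is", "and", "or", "not", "in",
	"while", "for", "return", "function", "true", "false"}

// hash is a simple multiplicative hash with a trial multiplier.
func hash(s string, mult uint32) uint32 {
	h := uint32(0)
	for i := 0; i < len(s); i++ {
		h = h*mult + uint32(s[i])
	}
	return h
}

// findPerfect searches for a multiplier that gives every keyword a
// distinct slot in a table of the given size, i.e. a perfect hash
// for this particular keyword set.
func findPerfect(size uint32) (uint32, bool) {
	for mult := uint32(1); mult < 100000; mult++ {
		seen := make(map[uint32]bool, len(kws))
		ok := true
		for _, k := range kws {
			slot := hash(k, mult) % size
			if seen[slot] {
				ok = false
				break
			}
			seen[slot] = true
		}
		if ok {
			return mult, true
		}
	}
	return 0, false
}
```

Even for a dozen keywords, most multipliers collide; and a hash that is perfect for one keyword set breaks as soon as a keyword is added, which is part of what makes this approach fiddly.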

I tried another hash table implementation, but it was slower than the Go map.

I tried switching on the length and then doing a linear search.

In the end I went with the switch version since it was fastest. It was a little more verbose, but Tabnine handled most of the code generation.

Did this micro-optimization make any difference to the overall speed? No, not that I saw from a quick test. But it was an interesting investigation. And yes, premature optimization can be a mistake. But ignoring speed is also a mistake.