Tuesday, April 16, 2019

Regular Expression Implementation

This detour started with an out-of-memory error. I tracked it down to a runaway regular expression, which in turn was caused by old C++ code expecting a nul-terminated string even though it took a pointer and a length.

It was easy enough to fix, but it got me looking at my regular expression implementation. Suneido has its own regular expression implementation, both for historical reasons and for portability. When I wrote jSuneido I tried to use Java regular expressions, but it proved too hard to make them compatible with cSuneido. And by now we have a lot of application code that depends on the particular flavor of regular expressions I originally implemented.

Although it's a fairly clean object-oriented design, there was one part that switched on types rather than calling a virtual method, because it needed to update some state. No big deal: I'd just put the state in an object and pass it around.
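A minimal sketch of that refactoring, using made-up names (State, Inst, and Char are not the actual Suneido types): the mutable state moves into a struct that is passed to a virtual (interface) method, instead of being updated from inside a type switch.

```go
package main

import "fmt"

// State is a hypothetical container for the matcher state that was
// previously updated inside the type switch.
type State struct {
	pos int // current position in the input
}

// Inst is the instruction interface; Match advances the state on success.
type Inst interface {
	Match(s *State, str string) bool
}

// Char matches a single literal byte.
type Char struct {
	c byte
}

func (ch Char) Match(s *State, str string) bool {
	if s.pos < len(str) && str[s.pos] == ch.c {
		s.pos++
		return true
	}
	return false
}

func main() {
	var in Inst = Char{'a'}
	s := &State{}
	fmt.Println(in.Match(s, "abc"), s.pos) // true 1
}
```

Each instruction type implements the same interface, so the dispatch is a normal interface call rather than a switch on concrete types.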

I decided to work on this in the Go version, and Go made it very easy to write a quick benchmark to make sure my changes didn't slow things down too much.
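As a rough illustration of how little ceremony a Go benchmark needs, here is a minimal sketch using the standard testing package; the match function is a trivial stand-in, not Suneido's matcher. (testing.Benchmark lets you run one outside of "go test".)

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// match is a trivial stand-in for the regex matcher being benchmarked.
func match(pat, s string) bool {
	return strings.Contains(s, pat)
}

func main() {
	// The benchmark framework picks b.N automatically, growing it until
	// the timing is stable.
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			match("hello", "say hello world")
		}
	})
	fmt.Println(r.N > 0) // true: the loop ran at least once
}
```

In a real project the same loop would live in a BenchmarkXxx function in a _test.go file and be run with "go test -bench".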

What I found was that it was already quite slow, and even slower with the changes. Suneido applications don’t tend to use a lot of regular expressions, so the speed wasn’t a huge issue, but it bothered me nonetheless.

The implementation compiles regular expressions to an intermediate form which is then interpreted. The speed of an interpreter depends primarily on how “large” the instructions are. In this case the instructions are very small, e.g. matching a single character. And my fancy object-oriented design added even more overhead in terms of indirection and virtual (interface in Go) calls.

What if I took a different approach and used the classic big-switch-in-a-loop style of interpreter with all the code inlined? 
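A minimal sketch of the idea, with made-up opcodes rather than Suneido's actual instruction set: the compiled pattern is a flat byte slice, and matching is one loop around one switch, with no interface calls.

```go
package main

import "fmt"

// Hypothetical opcodes for a tiny compiled-pattern interpreter.
const (
	opChar byte = iota // match the literal byte that follows
	opAny              // match any single character
	opEnd              // pattern complete
)

// match runs the compiled program against s from position 0, with all
// the instruction handling inlined in a single switch-in-a-loop.
func match(prog []byte, s string) bool {
	pc, si := 0, 0 // program counter, string index
	for {
		switch prog[pc] {
		case opChar:
			if si >= len(s) || s[si] != prog[pc+1] {
				return false
			}
			pc += 2
			si++
		case opAny:
			if si >= len(s) {
				return false
			}
			pc++
			si++
		case opEnd:
			return true
		}
	}
}

func main() {
	// hand-compiled form of the pattern "a.c"
	prog := []byte{opChar, 'a', opAny, opChar, 'c', opEnd}
	fmt.Println(match(prog, "abc"), match(prog, "ax")) // true false
}
```

Because the dispatch is a switch on a byte, the per-instruction overhead is a jump rather than an indirect interface call, which matters when the instructions themselves do so little work.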

I made a quick prototype and benchmarked it at 300 times faster. I was skeptical of that result, but it seemed worth investigating. As the code grew from a trivial prototype, the results came down to earth, ending up about 3 or 4 times faster. That was more believable, and still a good improvement. I suspect my first prototype was so simple that the compiler's optimizer eliminated most of it. That’s a common problem with micro-benchmarks.

Without all the object-oriented wrapping overhead, the inline interpreter wasn’t even that big, around a hundred lines of code.

Of course, while I was in there I couldn’t resist a few other optimizations. Suneido’s regular expression implementation uses fairly naive backtracking, which has some ugly worst cases. It’s a bit ad hoc, but I added specific handling for some of those bad cases. That’s where I saw up to 1000 times speed improvement, although only for those specific cases.
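To see the kind of worst case a naive backtracker hits, here is a deliberately simple backtracking matcher (not Suneido's), supporting only literal bytes and postfix '*', that counts its recursive calls. A pattern like a*a*a*a*a*b run against a string of a's with no b forces it to try every way of splitting the a's among the stars before failing.

```go
package main

import "fmt"

var steps int // counts matcher invocations, to expose the blow-up

// match is a naive backtracking matcher: a byte matches itself, and
// '*' means zero or more of the preceding byte.
func match(pat, s string) bool {
	steps++
	if pat == "" {
		return s == ""
	}
	if len(pat) >= 2 && pat[1] == '*' {
		// Try matching the star against nothing first, then backtrack
		// by consuming one byte at a time.
		if match(pat[2:], s) {
			return true
		}
		if s != "" && s[0] == pat[0] {
			return match(pat, s[1:])
		}
		return false
	}
	if s != "" && s[0] == pat[0] {
		return match(pat[1:], s[1:])
	}
	return false
}

func main() {
	steps = 0
	ok := match("a*a*a*a*a*b", "aaaaaaaaaaaa") // 12 a's, no b
	fmt.Println(ok, steps > 1000)              // false true
}
```

Adding more a's to the input makes the step count grow explosively, which is why special-casing these patterns pays off so dramatically.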

So I was happy with the speed improvement from this rewrite, but was the code harder to follow? I’m not sure. The interpreter is more monolithic, but it’s still reasonably small. And a whole bunch of interconnected small objects is not always easy to follow either. For me, it was a reasonable trade-off.

Code is on GitHub, as usual.