Wednesday, July 16, 2025

Partnering with AI

I'm not a leading-edge AI user. I've used Tabnine for the last few years, but mostly as a smarter auto-complete. Handy and time-saving, but nothing earth-shaking. Mostly I was a skeptic. But using Cline with Claude Sonnet 4 lately has definitely changed my attitude.

One of the first things I noticed was the "rubber ducky" effect. Just the effort of explaining a problem to the AI often helps me think of a solution. Explaining a problem to another person (or an AI) is even better, because they ask questions and offer solutions. Claude often says "I see the problem" when it's totally off track. That's a little annoying, but it's no worse than what a person would do. And often those wacky ideas can spark new ideas or avenues to explore.

A more surprising aspect is that it feels like I'm collaborating. I've been solo programming for most of my (long) career, and the last five years of working remotely have exaggerated that. No one reads the majority of my code or comments on it. It's not that I'm socializing with the AI, but suddenly I have someone I can "talk" to about the code. What does this do? Do we need that? What if we did xyz? Isn't that dangerous? It might not "know" the code like a true collaborator, but it's a very close approximation. And it never hurts to be exposed to other ways of doing things.

I don't vibe code with AI. I want the end code to be as good as I can make it, and with current AI that means keeping a very close watch on what it's doing. Like a beginning programmer, it seldom gets a good solution on the first try. That means reviewing changes closely and not being afraid to reject them. Often you have to steer it toward an elegant solution. I'm working on heavily used production code, not a throwaway toy project.

The agent approach seems to be a fundamental improvement. It's impressive to watch Claude add debugging statements, run the code, "look" at the output, and repeat until it tracks down the issue. At the same time, it can also make quite blatant mistakes so you need to be watching closely.

Honestly, I miss writing code when I'm working with AI. It's like pair programming with someone who never gives you the keyboard. Of course, there's nothing stopping me from writing some of the code myself, and I do. But when Claude can spit it out faster than I can type, it seems pointless not to let it.

One of Claude's quirks is its positive tone. It's always saying "good idea" or "you're absolutely right". I found that a little goofy at first, but now that I'm used to it, I find I like it. Who doesn't like positive feedback? I know it's meaningless, but when I use less positive models, I miss it.

Monday, July 07, 2025

Super Instructions

The overhead of an interpreter depends on how much work each instruction does compared to the cost of dispatching it. So one of the ways to reduce the overhead is to make each instruction do more work. One way to do that is “super instructions” - combining several instructions into one. This caught my interest and I decided to see whether it would help with Suneido. The Suneido byte code instructions themselves are mostly simple, which means higher overhead, but most of the time is spent in built-in functions like database queries, which means lower overhead.
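
To make the idea concrete, here's a minimal sketch of a fused instruction in a switch-based interpreter. The opcode names, operand encoding, and value stack are made up for illustration - this is not gSuneido's actual implementation.

package main

import "fmt"

// A minimal sketch of a fused "super instruction" in a switch-based
// interpreter. Opcode names and encoding are hypothetical.

type opcode byte

const (
	opLoad      opcode = iota // push a local variable
	opValue                   // push a constant from the constant pool
	opLoadValue               // fused: Load followed by Value, one dispatch
)

type frame struct {
	code   []byte // opcode, operand, opcode, operand, ...
	consts []int
	locals []int
	stack  []int
}

func (f *frame) run() {
	for ip := 0; ip < len(f.code); {
		switch opcode(f.code[ip]) {
		case opLoad:
			f.stack = append(f.stack, f.locals[f.code[ip+1]])
			ip += 2
		case opValue:
			f.stack = append(f.stack, f.consts[f.code[ip+1]])
			ip += 2
		case opLoadValue:
			// the combined instruction does both pushes with one dispatch
			f.stack = append(f.stack,
				f.locals[f.code[ip+1]], f.consts[f.code[ip+2]])
			ip += 3
		}
	}
}

func main() {
	f := &frame{
		code:   []byte{byte(opLoadValue), 0, 0},
		consts: []int{42},
		locals: []int{7},
	}
	f.run()
	fmt.Println(f.stack) // [7 42]
}

The point is that the fused case does the work of two instructions but pays for only one trip through the dispatch switch.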

The first step was to determine whether there were combinations of instructions that were common enough to justify optimizing them, so I added some quick temporary instrumentation to the interpreter. First I looked at sequences of two instructions. The most common accounted for roughly 7% of the instructions, and the top 10 made up 40%. That was actually higher than I expected. I also looked at sequences of three and four instructions, but the most common of those was only about 3%, so I decided to keep it simple and stick to two-instruction sequences.

   %   op1      op2
 7.0   Load     Value
 6.7   Value    CallMethNoNil
 5.2   Value    Get
 4.8   Load     Load
 4.5   This     Value
 2.7   Store    Pop
 2.5   This     Load
 2.3   Get      Value
 2.2   Pop      Load
 2.2   Global   CallFuncNoNil
40.1   Total
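
The instrumentation behind numbers like these can be very simple: remember the previous opcode at each dispatch and count adjacent pairs. A rough sketch of the idea (the countOp hook and the string opcode names are made up for illustration; this is not the actual gSuneido instrumentation):

package main

import (
	"fmt"
	"sort"
)

type pair struct{ op1, op2 string }

var (
	pairCounts = map[pair]int{}
	prevOp     string
	total      int
)

// countOp is called once per dispatched instruction and counts the
// (previous, current) opcode pair
func countOp(op string) {
	if prevOp != "" {
		pairCounts[pair{prevOp, op}]++
		total++
	}
	prevOp = op
}

// report prints the most frequent pairs as a percentage of all pairs seen
func report(top int) {
	type entry struct {
		p pair
		n int
	}
	entries := make([]entry, 0, len(pairCounts))
	for p, n := range pairCounts {
		entries = append(entries, entry{p, n})
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].n > entries[j].n })
	if top > len(entries) {
		top = len(entries)
	}
	for _, e := range entries[:top] {
		fmt.Printf("%4.1f %s %s\n", 100*float64(e.n)/float64(total), e.p.op1, e.p.op2)
	}
}

func main() {
	// a simulated trace standing in for real interpreter dispatch
	for _, op := range []string{"Load", "Value", "Load", "Value", "Store", "Pop"} {
		countOp(op)
	}
	report(3)
}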

The compiler code generator funnels through a single emit function, so it was easy to modify that to detect the sequences and replace them with combined instructions (roughly like the sketch below). And it was easy to add the new instructions to the interpreter itself. If I’d stopped there it would have been a quick and easy one-day project. But, of course, I had to benchmark it. At first that went well. Micro-benchmarks showed clear improvements, as I expected. But running our application test suite showed a slight slowdown of 1 or 2%. That puzzled me.
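
The fusion itself is just a peephole check in emit: if the previous opcode plus the new one form a known pair, overwrite the previous one with the combined instruction. A sketch of that idea, with made-up opcode names and ignoring details like operands and jump targets (you can't fuse across a branch target) - not the actual gSuneido code:

package main

import "fmt"

type opcode byte

const (
	opLoad opcode = iota
	opValue
	opThis
	opLoadValue // fused Load + Value
	opThisLoad  // fused This + Load
)

// fused maps a pair of opcodes to its combined instruction
var fused = map[[2]opcode]opcode{
	{opLoad, opValue}: opLoadValue,
	{opThis, opLoad}:  opThisLoad,
}

type generator struct {
	code    []opcode
	lastOp  opcode
	canFuse bool // whether lastOp is eligible to fuse with the next op
}

// emit appends an instruction, replacing the previous one with a
// combined instruction when the pair is in the fused table
func (g *generator) emit(op opcode) {
	if g.canFuse {
		if combined, ok := fused[[2]opcode{g.lastOp, op}]; ok {
			g.code[len(g.code)-1] = combined
			g.canFuse = false // don't chain fusions across the combined op
			return
		}
	}
	g.code = append(g.code, op)
	g.lastOp, g.canFuse = op, true
}

func main() {
	g := &generator{}
	for _, op := range []opcode{opThis, opLoad, opValue} {
		g.emit(op)
	}
	fmt.Println(g.code) // [4 1] i.e. ThisLoad, Value
}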

One possibility was that the changes slowed down the compiler. Running the test suite requires compiling all the code. The changes did slow down compiling slightly, but compiling is fast enough that it doesn’t even show up in profiling. I optimized the new code a little anyway, but that wasn’t the problem.

That left the interpreter. gSuneido uses a giant switch to dispatch instructions. Go doesn’t allow some of the other dispatching methods you could use in e.g. C(++). I had added 10 cases to the switch. Did that slow it down? How did Go implement switches? Searches didn’t turn up a lot, but it sounded like it used a binary search. Running the code in the debugger confirmed this. Maybe adding more cases had added another level to the binary search? I tried rewriting the switch as a hand-written binary search using if-else, optimized for the most frequent instructions. That quickly turned into an ugly mess. More searching found Go issue 5496 to implement switches with jump tables. The issue appeared to be completed. I used Compiler Explorer to test, and sure enough, Go compiled dense integer switches as jump tables. But not my interpreter switch??? I tried simplifying the switch, ordering the cases, removing fallthrough, etc. But the debugger still showed a binary search. I found the implementation in the Go compiler source code. It seemed straightforward; there were no special requirements. I tried one of my simple examples from Compiler Explorer inside my gSuneido project. That didn't show as a jump table in the debugger either. What the heck!
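
The Compiler Explorer experiment is easy to reproduce. A dense integer switch like the toy one below is the kind the Go compiler can turn into a jump table; this is purely an illustration, not the actual interpreter switch:

package main

import "fmt"

// A dense integer switch with consecutive cases like this is the kind
// the Go compiler can compile as a jump table rather than a series of
// comparisons.
func dispatch(op int) string {
	switch op {
	case 0:
		return "Load"
	case 1:
		return "Store"
	case 2:
		return "Value"
	case 3:
		return "Get"
	case 4:
		return "Put"
	case 5:
		return "This"
	case 6:
		return "Pop"
	case 7:
		return "Global"
	default:
		return "unknown"
	}
}

func main() {
	for op := 0; op < 9; op++ {
		fmt.Println(op, dispatch(op))
	}
}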

Back to the Go compiler source code. There was a condition on base.Flag.N. What was that? Oh crap. That's the -N command line option that disables optimization, which is what the debugger uses. So whenever I was using the debugger to look at the assembly language code, I was seeing the unoptimized version. A little digging and I figured out how to use gdb to disassemble the production code, and found it had been using a jump table all along. Argh!

Back to the original question - why was it slower? Maybe the extra code in the interpreter was affecting inlining? I looked at the compiler's inlining logging, but everything still seemed to be getting inlined correctly.

Looking at the CPU profile, the interpreter time decreased as I'd expected. The only noticeable increase was in garbage collector time. But the changes don't do any allocation; if anything, they slightly decrease the size of the compiled code. Total allocation from the test suite was slightly higher, but the memory profile didn't show anything.

In the end, I gave up. I can't figure out why the overall test suite seems slightly slower. There doesn't seem to be any evidence that it's a result of the changes. Everything else shows an improvement, so I'm going to keep it. I can only guess that the slowdown comes from perturbing the code, affecting layout, caching, branch prediction, or timing. Modern CPUs are so complex that they are somewhat non-deterministic.

Code is on GitHub as usual.