Monday, March 24, 2025

Copy on Write

Copy on write is an interesting technique with a wide variety of applications. It's somewhat related to persistent immutable data structures, which are really "partial copy on write". Basically it's just lazy or deferred copying, with the addition of reference counting.

It started when I happened to be looking at our memoize code. (That's the correct spelling; it's different from "memorize".) When it returns a mutable object, it makes a defensive copy. Otherwise, if the caller modified the object, it would modify the cached value.

Defensive copies are a standard technique, but they're often inefficient because if the caller doesn't modify the object then the copy was unnecessary.
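
Roughly, in Go (a sketch with invented names and a slice-valued cache, not the actual Suneido code):

package main

import "fmt"

var cache = map[string][]string{}

// memoized returns the cached result for key, computing it on the
// first call. It returns a defensive copy so callers can't mutate
// the cached value.
func memoized(key string, compute func(string) []string) []string {
    v, ok := cache[key]
    if !ok {
        v = compute(key)
        cache[key] = v
    }
    return append([]string(nil), v...) // defensive copy
}

func main() {
    words := memoized("k", func(string) []string { return []string{"a", "b"} })
    words[0] = "mutated"       // only affects the caller's copy
    fmt.Println(cache["k"][0]) // still "a" - the cache is intact
}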

One solution is to make the cached values read-only. Then they can't be modified and you don't need a defensive copy. But this has two problems. One is that people forget to make the values read-only, since everything works fine without it. The other is that often you do need to modify the result, and then all the callers have to copy it.

My first thought was to add an explicit CopyOnWrite method. But most people wouldn't understand the difference or remember to use it. We could have used it in Memoize, but that would have been quite limited.

Then I realized that it probably made sense to just make the existing Copy method always be copy-on-write, i.e. deferred or lazy copying. That was assuming I could implement copy-on-write with low enough overhead that the benefit would outweigh the cost.

The simplest naive approach is to mark both the original and the copy as copy-on-write. But then if you later modified them both, you'd end up making two copies, whereas with normal copying you'd only have made one copy. The solution is to keep a shared "copy count", similar to a reference count for memory management. If the copy count is zero, then you can just modify the object without copying it, since you know you won't affect any other "copies".

When you make a lazy copy, you increment the copy count. When you do an actual copy to allow modification, you decrement the copy count. Ideally you'd also decrement the copy count when an object was garbage collected (perhaps with the new runtime.AddCleanup in Go 1.24).

One catch is that the copy count must be shared. At first I thought that meant I had to put the data and the copy count in a separate object, with an extra level of indirection for all references to the data. Then I realized it was only the copy count that had to be shared, so I just allocated it separately. That meant I could access it with atomic operations, which have low overhead.
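
Concretely, the pieces so far might look like this in Go (a sketch with invented names and a simple slice payload, not the actual implementation):

import "sync/atomic"

// object is a hypothetical copy-on-write container. The copy count is
// allocated separately so that an original and its lazy copies can
// share it, with no extra indirection for the data itself.
type object struct {
    data   []int         // the actual contents
    copies *atomic.Int64 // shared count of outstanding lazy copies
}

func newObject(data []int) *object {
    return &object{data: data, copies: new(atomic.Int64)}
}

// Copy is the lazy copy: share the data and bump the shared count.
func (ob *object) Copy() *object {
    ob.copies.Add(1)
    return &object{data: ob.data, copies: ob.copies}
}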

Luckily I had an existing test for concurrent access to objects. This failed with my changes. The race detector also found problems. Objects are locked while reading or writing, but with copy-on-write there are multiple objects referencing the same data, so locking an object isn't sufficient to protect the data. One solution would be what I previously considered - keeping the data and the copy count together in a separate object, along with a lock. But then we're back to too much overhead.

I found the problem was that I was decrementing the copy count before doing the actual copy. But as soon as the copy count went to zero, another thread could think it was ok to modify the shared data. I had to decrement the copy count after the actual copy. But that meant checking whether the copy count was zero separately from the decrement, which meant there was potential for two threads to check the copy count, both find it was 1, and both copy the object. I decided this would happen very rarely, and the only cost was an extra copy.
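
Continuing the sketch, the corrected ordering would be something like this (one assumption here: after the real copy, the object takes a fresh count, since its new data isn't shared with anything):

// mustBeMutable gives the object sole ownership of its data before it
// is modified. Note the order: do the actual copy first, decrement
// after. Decrementing first opens a window where another thread sees
// a zero count and modifies the still-shared data mid-copy.
func (ob *object) mustBeMutable() {
    if ob.copies.Load() == 0 {
        return // nothing shares this data; modify in place
    }
    // Two threads can both get here, both seeing a count of 1, and
    // both copy. That's rare, and the only cost is an extra copy.
    ob.data = append([]int(nil), ob.data...) // the actual copy
    ob.copies.Add(-1)             // release our claim on the old data
    ob.copies = new(atomic.Int64) // fresh count: the new data is unshared
}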

For once my code was structured so it was quite easy to implement this. Copying was done in a single place, and the update methods all called a mustBeMutable method. It only took about 40 lines of code.
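
A typical update method in the sketch then just looks like:

// Put is a representative update method: make the object safely
// mutable first, then modify in place. (Per-object locking, which the
// real code also does, is omitted here.)
func (ob *object) Put(i, val int) {
    ob.mustBeMutable()
    ob.data[i] = val
}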

And, pleasantly surprising, this abstraction wasn't leaky: it didn't break or affect any of our application code. Running our application tests, there were roughly 500,000 deferred copies and 250,000 eventual actual copies. So it saved half of the copying - nice!

Saturday, March 15, 2025

Twenty Years

I happened to look at the list on the side of this blog and realized it's been twenty years since I wrote:

I'd better try out this blogging thing.

I've posted 710 times, not a huge number for that length of time, but more than a few. On average, that's about 35 a year, or 3 per month. My most prolific year was 2009, when I averaged a post every 3 days.

Almost all of it is now irrelevant. Sometimes I think I should post more often; other times I think it's pointless and I shouldn't bother at all. But I don't regret the time I spent on it. If nothing else, it often forced me to organize my thoughts and experiences and try to communicate them clearly. You could always imagine that someone would be helped by a post they found through a search engine. Of course, search engines are going downhill and increasingly people rely on AI summaries, which makes blogs mere fodder for AI.

This blog has been on Google Blogger the whole 20 years. As much as I don't care for Google these days, back then they were the rising star. I can't think of another blogging service that has been around since then. It's amazing they haven't discontinued Blogger like they did Reader. One of these days I should move to a different platform.

Twenty years seems like a long time until I remember I've been programming for 50 years. As much as technology has changed, programming is really not that much different than it was 50 years ago. It's been a fascinating career. If AI replaces programmers, our generation of programmers might be the last/only to enjoy it.

Monday, March 03, 2025

return throw

In a programming language with exceptions (like Suneido), one of the API design questions is when to return an error, and when to throw it.

It's helpful to distinguish "queries" and "commands" (command-query separation). A query returns a result. A command performs an action; it may or may not have a return value.

In a dynamically typed language like Suneido, query functions can easily return an error value like false or a string. If you forget to check for the error, it will usually cause a runtime error. But if you forget to check for errors from a command function, they'll be lost. That might cause a problem later, which can be hard to debug.

From the perspective of using the API, both approaches have pros and cons.

  • checking a return value is generally faster and less code than try-catch
  • it's too easy to forget to check a command return value, especially if failure is rare
  • you can't forget to check an exception (unless you have a try-catch somewhere that deliberately ignores exceptions)
  • return values have to be checked on every call, whereas exceptions can be handled for larger scopes

C and C++ have [[nodiscard]], and Java has the JSR-305 @CheckReturnValue annotation, but these are static compile-time checks, not a good fit for a dynamic language like Suneido.

I came up with the idea of "return throw" (avoiding the need for a new keyword). It returns a value like a regular return. But if the return value is not used (discarded) then it throws an exception.

As I started to use this, I realized it could be more dynamic. A successful return could be just a regular (ignorable) return, whereas error returns could be return throw.

if error
    return throw false
return true

That got a little repetitive, so I changed return throw to automatically treat true and "" as success, i.e. a return throw of true or "" would be treated as a normal return and the result could be ignored. But a return throw of any other value, e.g. false or "invalid foo", would throw if the result was ignored.
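
As a toy model of these semantics in Go (purely illustrative - the names and mechanism here are invented, not gSuneido's actual interpreter code):

import "fmt"

// retThrow wraps a value produced by "return throw".
type retThrow struct {
    val any
}

// discardResult: what the interpreter could do when it knows a call's
// result is being discarded. true and "" count as success and may be
// ignored; any other wrapped value throws.
func discardResult(result any) {
    if rt, ok := result.(retThrow); ok && rt.val != true && rt.val != "" {
        panic(fmt.Sprintf("return value not checked: %v", rt.val))
    }
}

// useResult: what the interpreter could do when the result is used -
// unwrap it so the caller sees an ordinary value.
func useResult(result any) any {
    if rt, ok := result.(retThrow); ok {
        return rt.val
    }
    return result
}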

Another issue was that if F() did a return throw, and G() did return F(), that shouldn't be treated as "using" the result. Luckily that turned out to be relatively easy to handle.
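
In the toy model above, that means a third way of consuming a result, neither unwrapping nor checking:

// how a call site consumes a result
type consumption int

const (
    used      consumption = iota // e.g. x = F()
    discarded                    // e.g. F() alone as a statement
    returned                     // e.g. return F()
)

// consume: the key point for "return F()" is the third case - the
// wrapper passes through unchanged, so any discard check happens in
// G's caller rather than in G.
func consume(result any, how consumption) any {
    switch how {
    case used:
        return useResult(result)
    case discarded:
        discardResult(result)
        return nil
    default: // returned
        return result
    }
}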

I made a bunch of the built-in functions return throw, and that has helped track down a few issues. Otherwise it isn't widely used yet. Hopefully it will prove worthwhile in the long run.