Thursday, October 01, 2009

Language Connoisseur

It took me a few tries to come up with the right word. Addict? Fanatic? Polyglot? Aficionado?

I think "connoisseur" fits best - "A person of informed and discriminating taste e.g. a connoisseur of fine wines."

I love to read about different programming languages, but I wouldn't say I actually "learn" them. I've only written significant amounts of code in three languages, C, C++, and now Java. It takes years for me to feel like I'm getting "good" at a language.

But I can "taste" other languages, and like a wine connoisseur, critique them (at least in my own mind). Ah yes, made from JVM grapes, definite functional nose, aged in strong typing, goes well with concurrent meals. (Sorry, I'm stretching, but you get the idea.)

It's interesting to note that 4 of the 6 languages that I've read about recently are JVM based. That's partly because I'm interested in JVM based languages because of jSuneido. But I think it also indicates a trend. There are languages based on .Net as well but for some reason they don't seem to be as popular, or at least have as many books.

One of my big interests these days is concurrency, again partly because of having to face it with jSuneido. But it's also something we all have to face as cpu's no longer get faster, and instead we get multiple cores.

It's becoming pretty obvious that threads, mutable data structures, and locking are not the answer. They work, and with a lot of effort it's possible to write programs using them, but it's hard and error prone. I think they will become the "machine language" of concurrency. Present under the covers but not dealt with directly by most people.

So the alternatives in other languages are pretty interesting. Erlang and Scala have "actors". Clojure has software transactional memory.

Pure functions and immutable objects seem to be part of the answer. Transactional memory is promising. All three of these go together.

When I first started reading about Clojure I was confused by the references to "persistent data structures". I was thinking of persistence as saving things in a database. But that's not what they're talking about. Persistent data structures  in this sense are data structures that are optimized for making modified immutable "copies" that share representation. The classic and simplest example is a single linked list (like in Lisp). For example, when you add a new item to the front of a list the new list "shares" the old list.

Suneido strings are immutable and use persistent data structures. Unlike most languages, when you concatenate onto a string in Suneido it doesn't create a whole new string. Instead, behind the scenes, it creates a list of two strings. This is transparent - to the programmer it's just as if it created a new string.

This answered a question that had been lurking in the back of my mind. Isn't it really slow to use large immutable objects in functional languages? Now I can see how it can be done efficiently using some clever algorithms like Clojure's persistent hash map.

I'd been trying to figure out how to implement an efficient concurrent version of the schema cache in jSuneido. Suneido stores the database schema in tables in the database. But for speed it caches it in memory. The schema doesn't tend to change very often so it is well suited to making it immutable (and therefore not requiring locking). But it does change occasionally, and when it does, transactions have to see the appropriate version. (Suneido uses multi-version database transactions.) A persistent data structure is perfect for this because each transaction can have it's own immutable "copy" without physically having to copy the schema (which can be quite large).

However, there's a catch. cSuneido loads the schema lazily (on demand). This adds a twist - the schema cache is logically immutable, but physically mutable. That would require locking, which I'm trying to avoid.

Then I realized that lazy loading was to speed up start up. But jSuneido is primarily (at least initially) aimed at the server, where start up speed is not as important. So I can just load the whole schema at start up, eliminating the problem. Nice.

There is another catch. Java doesn't come with any persistent data structures :-(  and I don't particularly want to write my own.  Java doesn't even have a single linked list.

I did a quick search but I didn't find any third party libraries either.

One possibility is to use the Clojure data structures. (they're written in Java) I took a quick look at the source code and it seems like it might be feasible but I'm not sure how dependent they are on other parts of Clojure. If I'm lucky they'll be usable on their own.

Of course, adopting other code always has it's price. If there are bugs I won't know how to fix them. If Clojure fixes, updates, or improves the code I'll have to decide whether to switch to the new version.

Fun stuff! :-)

1 comment:

sanotto said...

To follow the suneido motto, you should write the data structures yourself :-)
Just kidding ...