Thursday, October 08, 2009

A Java Immutable Persistent Map

I know this is going to seem like I'm reinventing the wheel again, but I honestly tried to find existing Java code for a persistent map. I also took a stab at extracting something usable from Clojure but it was not easy to untangle from the rest of the Clojure code. I does seem surprising that there isn't anything out there (at least easily findable). I guess Java programmers don't write functional code.

It took me most of a day and about 300 lines of code to implement, using the Ideal Hash Trees paper and the occasional glance at the Clojure code to see how it did things. I took a similar approach as Clojure but simplified somewhat. I also made mine compatible with the Java collections classes. And mine doesn't depend on anything other than the standard Java libraries, so it should be a useful starting point for anyone else.

I started implementing iteration but gave up for lack of time. (And YAGNI - I may not need it.) Iterating through trees is always a pain.

The code is not as clean as I'd like, e.g. the methods could be smaller, but I did write tests with pretty much 100% coverage. I should probably do some random stress testing to exercise it more. And there are some magic numbers in there like 0x1f and 5 and 32.

I didn't find it right away so I'll point out that if you're looking for CTPOP (count population) in Java it's Integer.bitCount

I tried to restrain myself from premature optimization and took the simplest approach I could. I'm sure it could be made to use less memory and run faster. But the algorithm is good, so speed should be reasonable. And it will certainly use less memory than naive copy on write.

It was actually a nice break from slogging away getting our accounting application tests to run on jSuneido.

PS. I realize that Git uses a persistent data structure, which is why it is so "cheap" to make new versions. I started implementing a Git-like system in Suneido a while ago, but at that point I hadn't run into persistent data structures. But tree data structures are not strangers anyway due to Suneido's btree database indexes.

5 comments:

Seun Osewa said...

Hi. This is interesting, but you forgot to link to the code!

Andrew McKinlay said...

Sorry about that. The latest version is at: http://suneido.svn.sourceforge.net/viewvc/suneido/jsuneido/src/suneido/util/PersistentMap.java?view=markup

Unknown said...

Great, very concise and to the point. Thanks for sharing!

BTW there is a Java library of persistent data structures: pcollections.
http://code.google.com/p/pcollections

Andrew McKinlay said...

Clojure is splitting out their data structures:

http://blog.higher-order.net/2010/06/11/clj-ds-clojures-persistent-data-structures-for-java/

Andrew McKinlay said...

I haven't tried it, but the Functional Java project appears to have some immutable persistent data structures. I found it via the new Functional Programming for Java Programmers book from O'Reilly.