Tuesday, April 26, 2011

Semi-Immutable Data Structures

In the new append-only database storage engine for jSuneido I've ended up with several data structures that are "semi-immutable" i.e. they are immutable some of the time and mutable some of the time.

(These are in-memory data structures, the database file itself is only ever appended to, existing data is always immutable i.e. never updated.)

I've flip flopped back and forth a few times from completely immutable to partially immutable, but I seemed to have settled on partially immutable.

The overall "state" of the database is kept immutably. When a database transaction starts, it "copies" this state. (The "copy" consisting of just a few pointers to persistent immutable data structures.)

"Inside" a transaction, the data structures can become mutable. The first update a transaction does will path copy nodes up to the root. This "disconnects" the data structure from the overall databases state. Nodes within the data structure can now be either shared and immutable or not shared and mutable.

(Allowing mutability within transactions is "safe" because transactions are thread-contained i.e. only one thread at a time works on a transaction.)

The main advantage of this approach is performance since nodes can be updated with less copying.

At the end of a transaction commit, the new state is stored (persisted) in the database file. This converts all the nodes back to immutable.

This semi-immutable approach is used both for the table index btrees and the hash tries used to store redirects and database info (e.g. btree roots).

No comments: