Friday, April 15, 2011

Fractal Trees

Another interesting data structure is the Fractal Tree used by Tokutek's TokuDB (another MySQL variant). I think this is related to the stratified data structure (at least some of the description is similar).

B-trees are at one point in the range of insert speed versus lookup speed. Fractal trees are much faster at inserts, but slower at queries.

It's easy to see how inserts work, and that most of them would be very fast. But part of the reason it's fast is that the cost is amortized - there are occasional big operations (a little like having to resize a hash table). It sounds like they're doing these in a background thread to avoid delays, but that has to make querying tricky.

And querying seems tricky regardless. A naive binary search in each level is going to be slow (and potentially many disk accesses or at least cache misses). To speed this up they add pointers from each value to it's position in the next level. They don't go into detail, but I have to think maintaining these pointers is going to make inserts more complex.

I'm not sure how this approach would work for an append-only index. Again, the average insert wouldn't write much, but the occasional insert that caused a lot of merging would write a lot. Maybe the amortized cost would still be reasonable.

It seems like it would make an interesting sort algorithm, but I have no idea how the performance would compare. It's a little like a merge sort, which has good performance.

Doesn't look like this is something I can use directly, but it's always good to learn new approaches.

No comments: