Friday, May 17, 2024

Attaching Extra Data to Trees

This is nothing original, just something I've been working with and thought I'd document.

Say you have a tree made up of nodes in a tree. In my case I'm working with abstract syntax trees. And in some case you need to add extra information to nodes. For example, gSuneido uses the same AST for several purposes - to compile the code, to do static checks, to do syntax based searches in the IDE, and to do source code formatting. Those things all require slightly different information.

Here's an example node struct.

The simplest approach is just to add the extra information to the node struct:

The advantage of this approach is that it's simple and has low overhead. The disadvantage is that you always have that extra space whether you need it or not. I'm using this approach for basic source position information that is needed for compiler error messages and debug information.

A variation of this approach would be to make "extra" a pointer to a separate struct. If the data was large and you only needed it on some nodes, this could save space.

Another approach is to insert additional "wrapper" nodes.

The advantages of this approach is that it can be applied anywhere and it can be applied selectively. The disadvantages are that code that processes the tree has to deal with these extra nodes. I'm using this approach for some additional source position information that is needed for syntax based searches in the IDE.

Another approach is to store the information "off to the side".

This also has the advantage of being able to apply it to any node selectively and traversing the tree doesn't have to worry about the extra information. The disadvantage is that you have another data structure to deal with. I'm using this approach to store comment and newline position information for source code formatting.

No comments: