TreeList: programming trees without pointers
Intrigued by the example set by Andrew Kelley in his Programming without pointers presentation, I recently set out to reading a bit of the source code of Groovebasin, which implemented the ideas he presented.
As it turned out, his approach matched the needs of one of my projects pretty well, down to the serialization approach. I decided to explore it further by adapting it.
In this series of blog posts, I’ll be dev-logging through the implementation of a TreeList: an in-memory database of heterogeneous trees, which relies on a pool of arrays to hold the nodes of the trees.
The actual implementation can be found here. Fair warning: this is just a proof of concept - don’t let me catch you trying to rely on this code for your production needs.
Tree List Part 1: initial design
Premise
For the sake of example, let’s frame the need for such a data structure:
I need to describe some pre-made audio FX layouts. Each FX which has a pre-made layout can contain tabs, buttons, text labels, images, knobs, sliders, switches, etc. - any kind of interactive inputs or decorative elements which need to be drawn in a GUI stack. The components can be combined and nested in any way needed to represent the final layout. This makes layouts ideal candidates to be represented as trees.
The goal is to have thousands of layout trees in memory and readily available for representation in a user-interface.
The data structure
Locations and heterogeneous nodes
Backing a heterogeneous memory layout is going to require storing each node type into an ArrayList. So, we know that the initial call to the API of the data structure is going to be something akin to:
const ComponentDb = TreeList(NodeType1, NodeType2, NodeType3);
where each NodeType
is going to be assigned an underlying ArrayList. Each ArrayList is going to be identified using an enum (variantNodeType1
, variantNodeType2
, etc.).
// birds-eye view of the lists:
ComponentDb = std.ArrayListUnmanaged{
nodeType1: std.ArrayListUnmanaged(NodeType1),
nodeType2: std.ArrayListUnmanaged(NodeType2),
nodeType3: std.ArrayListUnmanaged(NodeType3),
// etc.
};
From there, we can assume that the Location
type is going to be an aggregate of the table enum, and the index into the table:
const Location(TableEnum: type) type{
// I assume this struct to be packed
return packed struct {
.table: TableEnum,
.idx: u32,
}
}
Since we can assume the TableEnum to be a u32
as well, we can optionally represent a location as a u64
for portability, and cast it as a Location(TableEnum)
whenever we need to run a lookup.
How to retrieve an FX layout?
We want to be able to retrieve the root of a tree using hash table lookups:
const fx_root_node = componentDb.get(fx_name);
and then proceed to traversing:
for (fx_list)|fx_name|{
const fx_root = componentDb.get(fx_name);
gui.traverse(fx_root);
}
Since I potentially will have thousands of FX layouts, we want to minimize string repetitions, and could use an interning strategy. More on this later.
Traversing using labeled switches on the GUI
The goal is to pass the tree data structure into the GUI loop (which runs using ImGui), where the GUI dispatches which draw functions need to be called depending on the node type:
switch(@typeOf(node)){
.type1 => drawType1(node),
.type2 => drawType2(node),
}
or something similar…
N-ary trees
So, at the root of each fx layout, there is a list of nodes, and each of those nodes might contain a list of children nodes, each of which might also contain children node, etc.
Mentally, this maps easily to an N-ary tree,
// n-ary tree,
// each node has a list of children
A
/ | \
b c d
/|\ | /|
e f g h i j
with the main disadvantage of an N-ary; which is that each node would potentially have to store a list of its children - or pointers/locations of its children.
No matter what the memory-backing strategy is going to be, I want the tree to be intrusive: pointer/location metadata must be stored inside the nodes themselves. Also, the tree is going to be heterogeneous: different types of child nodes are allowed, depending on the GUI widgets that they represent.
Binary trees instead
However, there’s ways to simplify the design. Instead of using an N-ary tree, we can use a binary tree: for each node, we keep a link to the first child (left) and a link to the next sibling node (right). The links are each represented as a Location
.
const Node = struct{
value: u32,
child: Location(TableEnum),
sibling: Location(TableEnum),
};
This dramatic simplification turns the sequence of sibling nodes into a linked list. This implies that adding/removing into the list is relatively easy: for a removal, get the node to remove and its parent, and update the next sibling location of the parent to point to the node’s next sibling.
I’ll talk more about the trees in a follow-up post.
The enum identifying tables:
Here’s an example piece from Groovebasin’s implementation. It uses string interning in the string_bytes, and it uses the Index enum internally for each of the ArrayHashMaps in its database.
files: std.ArrayHashMapUnmanaged(File, void, File.Hasher, false),
// other lists after this
pub const File = extern struct {
directory: Path.Index,
basename: String,
pub const Index = enum(u32) {
_, // non-exhaustive enum
};
};
That’s a nice Zig idiom: you use a non-exhaustive enum for type-safe indexing.
Now we know that the initial call to the TreeList is going to be done by passing a list of types to the function. This can either be done with a slice, or a struct of types:
const DbType = TreeList(.{ NodeType1, NodeType2, NodeType3 });
const impl: DbType = .empty;
We know that TreeList()
must return a container type that carries all the details we need:
- a backing enum for Locations
- one arrayList per provided type
- a decl literal or some way of instantiating, with all the backing lists.
For now: root nodes’ locations need to be retrieved based on some kind of string hash map. I’m going to implement string interning for this, following Andrew Kelley’s example in programming without pointers
.
That’s kind of a double whammie: I’m using the enum-indexing idiom for the TreeList, and for the string interning.
Next up, I’ll talk about how the backing arrays of the DB are laid out.