Here’s a project in which I’m implementing an in-memory database of GUI layout trees.

In the previous post, we looked at the general design of the DB.

In this entry, let’s talk about how to provide a list of arrays to back the database with memory.

TreeList Part 2: Memory

POC: ArrayofArrayLists

Since my trees are scattered across heterogeneous types, I want to store those types somewhere - ideally, in some arrays.

Where to put those arrays?

Into a list of them - an array of arrays:

const node_types = struct{
  type_1 = type,
  type_2 = type,
  // etc.
};
const Storage = ArrayOfArrays(node_types);
// under the hood:
const Storage = [_]Arrays{
  array_type_1 = [_]type_1,
  arary_type_2 = [_]type_2,
}

But how can I make a list of those arrays when they’re heterogeneous? Can I use Zig’s Tuple type?

Tuples are not really an option: you need to know which entries you’re accessing at compile time, which defeats the purpose of this library.

Well, then let’s erase the types:

const ArrayData = struct {
    data: [*]u8 = undefined,
    len: usize = 0,
    capacity: usize = 0,
    elem_size: usize,
}

But now that the types are erased, how can I cast values back to their true types?

By mapping the identifier of the table to the underlying type that was erased.

I have a list of tables:

ComponentDb = [_]Erased{
    nodeType1: std.ArrayListUnmanaged(NodeType1),
    nodeType2: std.ArrayListUnmanaged(NodeType2),
    nodeType3: std.ArrayListUnmanaged(NodeType3),
    // etc.
};

They’re indexed using an enum:

const TypeEnum = EnumFromTypes(node_types);

and I can convert each variant into a type:

fn convert(id: TypeEnum){
  const MyType = typeFromId(id);
}

So at compile time, I need to create such a mapping.

Type Registry

In the standard lib’s enums module, this is done with the EnumIndexer: you pass it an enum, and it associates array indexes to each enum variant. For the needs of the TreeList, I built a TypeRegistry, following similar principles:

pub fn TypeRegistry(comptime TypeStruct: type) type {
  
  // an enum based on the TypeStruct fields
  pub const TypeId = EnumFromTypes(TypeStruct); 

  /// converts a type to an enum variant
  pub fn idFromType(comptime T: type) TypeId {}

  // a list of the types from the TypeStruct fields
  pub const Types = getTypesSlice(TypeStruct);

  // retrieves a backing type based on an enum variant
  pub fn typeFromId(id: TypeId) type {}
}

Usage becomes similar to the multi-array list’s API - you can pass enum literals into typeFromId():

const TestTypes = struct {
    field1: type = u8,
    field2: type = u16,
};

const Registry = TypeRegistry(TestTypes);

Registry.typeFromId(.field1);
Registry.typeFromId(.field2);

Basic Storage

The goal is to use the TypeRegistry as indexer into the array of arrays:

pub fn DirectStorage(comptime Types: type) type {
  return struct {
    const Self = @This();
    const Registry = TypeRegistry(Types);
    pub const TypeId = Registry.TypeId;

    // Storage arrays - directly accessible by TypeId
    arrays: [Registry.Types.len]ArrayData = undefined,

    const ArrayData = struct {
      data: [*]u8 = undefined,
      len: usize = 0,
      capacity: usize = 0,
      elem_size: usize,
      // bunch of methods here…
    }
  // bunch of storage methods here
  }
}

Most of the methods I have omitted here are similar to those found in an ArrayList: append, ensureCapacity, etc. This is one of these pieces of code where the underlying ideas are simple, but the implementation is finicky.

Returning pointers?

There’s a limitation here: if I want to return pointers to entries, I’ll have to return a pointer to a union of the backing types. Otherwise, the compiler will cringe.

But to return a pointer to a tagged union, you must actually have a union stored in memory. This would require that my data be stored as unions in a single array that contains all my types. That’s not my case: I have one backing array per type.

So we’ll have to apply some cunning here: the union will be at the pointer level, instead of at the data level. I will have a tagged union, which’s variants each describe a pointer to one of the underlying types.

Schematically:

const TestTypes = struct{
  field1: type = u8,
  field2: type = u16
};

const TestUnion = union(enum){
  field1: *TestTypes.field1,
  field2: *TestTypes.field2
}

The beauty of this, is that the variants all occupy the same space in memory, since they’re all pointers. Obviously, that union of pointers has to be created at compile time, and can use the TypeRegistry’s backing enum as tag:

const NodePtrUnion = blk: {
    var union_fields: [Registry.Types.len]std.builtin.Type.UnionField = undefined;

    inline for (@typeInfo(TypeEnum).@"enum".fields, 0..) |field, i| {
        const T = Registry.typeFromId(@as(TypeEnum, @enumFromInt(i)));
        union_fields[i] = .{
            .name = field.name,
            .type = *T,
            .alignment = @alignOf(*T),
        };
    }

    break :blk @Type(.{
        .@"union" = .{
            .layout = .auto,
            .tag_type = TypeEnum,
            .fields = &union_fields,
            .decls = &[_]std.builtin.Type.Declaration{},
        },
    });
};

So, when I have to call getNodePtr() to retrieve a node from a tree, it gets straight-forward:

pub fn TreeList(comptime Types: type) type {
    const Registry = TypeRegistry(Types);
    const TypeEnum = Registry.TypeId;
    return struct {
        const Self = @This();
        pub const Loc = Location(TypeEnum);
        pub const TableEnum = TypeEnum;
        pub const NodeUnion = TypeUnion;
        pub const PtrUnion = NodePtrUnion;
        pub fn getNodePtr(self: *Self, loc: Loc) ?PtrUnion {}

Remember that the Location is a generic packed struct that holds the table identifier, and the array index of the value:

pub fn Location(comptime TableEnum: type) type {
    return packed struct {
        table: TableEnum,
        idx: u32,
    };
}

The good news is that the implementation of the getters can use the backing TypeEnum to directly index into the list of tables:

pub fn getNodePtr(self: *Self, loc: Loc) PtrUnion {
    const array = &self.arrays[@intFromEnum(loc.table)];
    return switch (loc.table) {
        inline else => |tag| blk: {
            const T = Registry.typeFromId(tag);
            const ptr = array.getPtr(T, loc.idx).?;
            break :blk @unionInit(PtrUnion, @tagName(tag), ptr);
        },
    };
}

What about alignment?

There’s one last small perversion in this process: you need a common alignment for all the pointers.

The std lib’s multi-array list has the same detail:

pub fn MultiArrayList(comptime T: type) type {
  return struct {
    bytes: [*]align(@alignOf(T)) u8 = undefined,
    // […]
  }
}

I (accidentally, I swear!) tried to do without worrying about the alignment. Thankfully, Zig runs checks against that, and will give you a panic when trying to access a pointer that’s mis-aligned.

When trying to align multiple types, there’s no way around it - the common alignment has to be largest of all the types’:

pub fn TreeList(comptime Types: type) type {
  const Registry = TypeRegistry(Types);
  return struct {
    const Self = @This();
    // Calculate the maximum alignment needed for any type
    const max_alignment = calculateMaxAlignment(Registry);
    // Storage arrays with maximum alignment for all types
    arrays: [Registry.Types.len]ErasedArrayData(max_alignment) = undefined,
  }
}

In the next entry, we’ll talk about the tree design and how to implement traversals in idiomatic Zig.