Tagged pointers: a fresh look at runtime dispatch in C
July 20th, 2010
I stumbled upon a technique that I think is quite good. It requires no casts, allows untagged structures and primitive types to be polymorphic variants, does not depend on structural inheritance, can statically overload a function name to operate on multiple types, can inline if the type is known at compile time, and will warn during compilation if a case is missing for a particular type.
Here's an example of a graph with different node types. I left out the obvious details for clarity.
/* The type tags */
typedef enum {
NODE_GEOMETRY,
NODE_TRANSFORM,
NODE_GROUP
} node_type;
typedef struct node_geometry node_geometry;
typedef struct node_transform node_transform;
typedef struct node_group node_group;
/* This is the tagged pointer representation */
typedef struct {
node_type type;
union {
node_geometry *geometry;
node_transform *transform;
node_group *group;
} raw_pointer;
} node_pointer;
/*
These structs are disjoint plain old data.
No structural inheritance. Yay.
*/
typedef struct {
mat4 matrix;
} node_transform;
typedef struct {
geometry_model *model;
} node_geometry;
typedef struct {
int member_count;
node_pointer members[NODE_GROUP_SIZE];
} node_group;
/* Regular monomorphic functions accepting a plain old pointer */
static void node_geometry_draw(node_geometry *);
static void node_transform_draw(node_transform *);
static void node_group_draw(node_group *);
/* Dispatching on the tagged pointer */
void node_draw(node_ptr node)
{
switch (node.type) {
case NODE_GEOMETRY:
node_geometry_draw(node.raw_pointer.geometry);
break;
case NODE_TRANSFORM:
node_transform_draw(node.raw_pointer.transform);
break;
case NODE_GROUP:
/* Notice the error: the type signature of node_group_draw() will catch it. */
node_group_draw(node.raw_pointer.transform);
break;
}
};
We typically tag an object for runtime polymorphism. C++ adds a vtable. GObject copies C++. Numerous tutorials on the web store type information with the object or struct.
But type information isn't always needed in a struct because the struct's complete type (including size and member offsets) is known by the compiler. Type information is only needed when an unknown structure is accessed at runtime through a pointer. From this perspective it makes at least equal sense to store type information with pointers and leave structures uncluttered.
This technique uses a tagged union to represent a pointer to plain old data. Functions are dispatched on the type contained in the pointer representation. Dispatching with a switch statement gives us the benefit of some compiler optimizations and heuristics. If the pointer's type tag is proved to be constant the switch can be completely elided and the case statement inlined. If there are a large number of type tags or they are numerically all over the place, the compiler can emit the best code for that situation.
This is a multimethod approach to polymorphism. It can result in a more fragile make tree, where changes to a single structure will cause more recompilation. That is also a result of code organization so I might be a tad hasty with that criticism. On the upside, dispatching on multiple types is obvious and I get warnings when there are missing cases.
Because runtime type is stored externally in pointers, data objects can take any form in memory. This is advantages if they are members in other structures since the containing structure does not need the type tag. It also allows for representations defined in other libraries or having constraints like vertex and connectivity data for OpenGL.
Lastly, structural inheritance is overrated. I like different structural types that can share an interface. This technique does that efficiently while using the language instead of abusing it.
Your Response