I commonly struggle with a “proliferation of types”. Different use cases, especially different clients, call for different views of the data. This leads to half a dozen data types that represent roughly the same information. I always assumed this was unavoidable if I wanted safety and clarity. However, Rich Hickey proposes a new take on the problem.
This post is a response to Maybe Not by Rich Hickey.
First, a few clarifying examples.
Suppose we have a todo list application. It is not unusual to want to fetch the todo list details along with the individual tasks.
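Such a contract might look like the sketch below. It is written in Python purely for illustration, and the type and field names (`TodoList`, `Task`, `owner_id`) are my own assumptions rather than anything from a real system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    id: int
    description: str
    done: bool = False


@dataclass
class TodoList:
    id: int
    title: str
    owner_id: int
    # The child collection is part of the contract, so every consumer
    # of TodoList sees it whether or not it was actually loaded.
    tasks: List[Task] = field(default_factory=list)


groceries = TodoList(
    id=1,
    title="Groceries",
    owner_id=42,
    tasks=[Task(id=10, description="Buy milk")],
)
```

Note that an empty `tasks` list is ambiguous here: it could mean "this list has no tasks" or "nobody loaded them".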
The problem is that this contract always implies the existence of child tasks. However, that isn’t always desirable:
- If we want to save, are the tasks saved in the same call as the TodoList? If not, they shouldn’t be there.
- What about showing all TodoLists for a user? We don’t want to require the whole hierarchy when we only need a fraction of the info.
We shouldn’t leave it to the programmer to implicitly know when that data should be present.
The same can be said for the non-complex properties of a type. We probably only want the title and ID if we’re suggesting todo lists from a search field. Everything else is extraneous.
Sadly, all roads seem to lead to extra types. Common solutions are in the vein of
Types can multiply quickly and mapping is both tedious and error-prone. It is easy for each model to slightly vary the parameter names too, which hurts understandability.
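To make the multiplication concrete, here is a hypothetical sketch in Python. All of the type names are invented for illustration, and the `name` versus `title` mismatch is deliberate to show how naming drifts between models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TodoListSummary:  # for search suggestions
    id: int
    title: str


@dataclass
class TodoListHeader:  # for saving without tasks
    id: int
    name: str  # drifted: "name" here, "title" everywhere else
    owner_id: int


@dataclass
class TodoListDetails:  # for the full view
    id: int
    title: str
    owner_id: int
    task_descriptions: List[str]


def to_summary(details: TodoListDetails) -> TodoListSummary:
    # Hand-written mapping: one of these per pair of types.
    return TodoListSummary(id=details.id, title=details.title)


def to_header(details: TodoListDetails) -> TodoListHeader:
    # The title/name mismatch has to be remembered at every mapping site.
    return TodoListHeader(
        id=details.id, name=details.title, owner_id=details.owner_id
    )
```

Three views of one concept already need two mapping functions, and each new scenario adds more.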
I’ve limited mapping in my system by delineating service purposes, but it still feels bad every time I need to map a contract for a new scenario, though much less bad than letting the system get entangled by unnecessary data connections.
Rich Hickey continues to impress me with the way he thinks deeply about the fundamental concepts of, and communication through, programming languages.
The concept he proposes is the separation of schema and selection.
First, this requires decoupling our data from named or positional data containers. That means no properties like in records and classes, and no positional reliance like tuples.
This can be solved with type-based aggregates. Clojure accomplishes it via
Data is accessed and aggregated by type name.
This is great because accessing data requires only the bare conceptual minimum: an idea of what guarantees the data meets and whether the data exists. Clojure can even generate property-based tests since the guarantees are communicated directly in code.
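As a rough analogy only, and not how Clojure itself does it, the idea can be sketched in Python with plain dictionaries. The `REGISTRY` table, the `conforms` helper, and the key names are all my own assumptions for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: each qualified name maps to the guarantee
# (a predicate) that any value stored under that name must satisfy.
REGISTRY: Dict[str, Callable[[Any], bool]] = {
    "todo/id": lambda v: isinstance(v, int) and v > 0,
    "todo/title": lambda v: isinstance(v, str) and len(v) > 0,
    "user/id": lambda v: isinstance(v, int) and v > 0,
}


def conforms(data: Dict[str, Any]) -> bool:
    """True if every registered key that is present meets its guarantee."""
    return all(REGISTRY[k](v) for k, v in data.items() if k in REGISTRY)


# An aggregate is just a map from qualified names to values.
todo_list = {"todo/id": 1, "todo/title": "Groceries", "user/id": 42}

# Access is by name; no record, class, or tuple position is involved.
title = todo_list["todo/title"]
```

The same `"user/id"` guarantee applies wherever that key shows up, so nothing has to be redeclared per container.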
Who owns optionality
Optionality is represented by presence or absence of a key. If a key is required and doesn’t exist, the compiler or runtime can throw an error. If optional, the consumer checks for the presence of the key.
This strikes at an important question: who owns optionality?
Consider Rich’s example. If a Car type has a make and model, when are those properties optional? We don’t know. It depends on the context. Some cases will need that info and others won’t. Thus, optionality is context-dependent and cannot be part of a generally usable type scheme.
In short, it is always up to the consuming context to declare what it does and doesn’t need, and to decide whether the data meets those criteria. As when parsing a JSON file, the programmer has to decide how to interpret a missing property. Further, data sources don’t need to know about optional properties if they don’t use them.
This clicks with the Haskell concept of “Parse, don’t validate” from this delightful article.
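A minimal Python sketch of consumer-owned optionality, using the Car example. The `require` helper, the `MissingKeys` exception, and the key names are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Iterable


class MissingKeys(KeyError):
    """Raised when a context's required keys are absent from the data."""


def require(data: Dict[str, Any], keys: Iterable[str]) -> Dict[str, Any]:
    """Return only the required keys, or fail if any of them is absent."""
    keys = list(keys)
    missing = [k for k in keys if k not in data]
    if missing:
        raise MissingKeys(missing)
    return {k: data[k] for k in keys}


car = {"car/make": "Volvo", "car/year": 2019}  # no "car/model" here

# A listing context needs only the make, so this data is acceptable.
listing = require(car, ["car/make"])

# A registration context needs make and model, so the same data is rejected.
try:
    require(car, ["car/make", "car/model"])
    accepted = True
except MissingKeys:
    accepted = False

# An optional key is simply checked for by the consumer.
year = car.get("car/year")
```

The data itself never says whether the model is optional. Each context decides that for itself.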
This all pulls together into a prototype language syntax for separating schema generally and selection per-context.
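The sketch below imitates that split in Python. It is not the proposed Clojure syntax; `SCHEMA`, `problems`, and the two selections are assumptions of mine, and every selected key is assumed to appear in the schema.

```python
from typing import Any, Callable, Dict, List, Union

# A selection lists required keys; a dict entry descends into a child
# aggregate and states what is required inside it.
Selection = List[Union[str, Dict[str, Any]]]

# General schema: what each key means, with no statement about optionality.
SCHEMA: Dict[str, Callable[[Any], bool]] = {
    "todo/id": lambda v: isinstance(v, int),
    "todo/title": lambda v: isinstance(v, str),
    "todo/owner": lambda v: isinstance(v, dict),
    "user/id": lambda v: isinstance(v, int),
    "user/name": lambda v: isinstance(v, str),
}


def problems(data: Dict[str, Any], selection: Selection) -> List[str]:
    """Return the keys that are required but absent, or present but invalid."""
    found: List[str] = []
    for item in selection:
        # Normalize a bare key to the nested form with no inner requirements.
        nested = item if isinstance(item, dict) else {item: []}
        for key, inner in nested.items():
            if key not in data:
                found.append(f"missing {key}")
            elif not SCHEMA[key](data[key]):
                found.append(f"invalid {key}")
            elif inner:
                found.extend(problems(data[key], inner))
    return found


# Per-context selections over the same schema.
SEARCH_SUGGESTION: Selection = ["todo/id", "todo/title"]
OWNER_VIEW: Selection = ["todo/title", {"todo/owner": ["user/name"]}]

data = {"todo/id": 1, "todo/title": "Groceries", "todo/owner": {"user/id": 42}}
```

With this data, `problems(data, SEARCH_SUGGESTION)` is empty, while `problems(data, OWNER_VIEW)` reports the absent `user/name`. The schema never changed; only the context's requirements did.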
This allows maximal reuse of the same data types without compromising on the clarity of what information each use case requires.
At the time of the presentation, this was just a proposal. A brief look at the documentation suggests it is still unfinished as of this writing. The idea, however, is powerful and I hope to see it mature.
While I believe this idea is powerful, I still have questions and reservations.
What about the case where an aggregate uses two of the same type, but with different semantic intent? For example, a
- Author (a user)
- Recipient (a user)
One could make sub-types for these cases, but that feels a bit weird.
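One way the sub-type approach might look, sketched in Python: two role-specific names that share a single guarantee. The `message/...` keys and the `is_user` check are hypothetical examples of mine.

```python
from typing import Any, Callable, Dict


def is_user(value: Any) -> bool:
    """Shared guarantee for anything that is 'a user'."""
    return isinstance(value, dict) and isinstance(value.get("user/id"), int)


# Two role-specific names reuse one guarantee, so a single aggregate
# can hold two users without the keys colliding.
SCHEMA: Dict[str, Callable[[Any], bool]] = {
    "message/author": is_user,
    "message/recipient": is_user,
}

message = {
    "message/author": {"user/id": 1},
    "message/recipient": {"user/id": 2},
}

valid = all(SCHEMA[key](value) for key, value in message.items())
```

The semantic intent ends up in the key name, which is close to what a property name would have carried anyway.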
I also believe that names are an important communication of intent. I suppose the argument here is that you express that intent in the type name, which can be reused, rather than in a property name that is fixed to one context. I feel like I’d need to use it to get a good sense of how I actually feel about the shift.
The general schema also feels like a strong temptation to break service boundaries and tie services together through it. I suppose this is already a temptation, and it will always be up to the designers to decide where to draw hard boundaries.
Articulating that optionality belongs to the consuming context is a powerful conceptual advance. I think Rich’s proposal for separating general schema and required member selection per-context is promising for more expressive and less redundant type systems!