Schema and Selection

I commonly struggle with the “proliferation of types”. Different use cases, especially different clients, call for different views of the same data. This leads to half a dozen data types that represent roughly the same information. I always assumed this was unavoidable if I wanted safety and clarity. However, Rich Hickey proposes a new take on the problem.

This post is a response to Maybe Not by Rich Hickey.

The Problem

First, a few clarifying examples.

Suppose we have a todo list application. It’s not so unusual that we might want to fetch the todo list details along with the individual tasks.

class TodoList{
    string Title;
    DateTime CreatedDate;
    // other data...

    Task[] Tasks;
}

The problem is that this contract always implies the existence of child tasks. However, that isn’t always desirable:

If we want to save, are the tasks saved in the same call as the TodoList? If not, they shouldn’t be there.
What about showing all TodoLists for a user? We don’t want to require the whole hierarchy when we only need a fraction of the info.

We shouldn’t leave it to the programmer to implicitly know when that data should be present.

The same can be said for the non-complex properties of a type. We probably only want the title and ID if we’re suggesting todo lists from a search field. Everything else is extraneous.

Sadly, all roads seem to lead to extra types. Common solutions are in the vein of

class TodoListSaveContract{
    string Title;
    DateTime CreatedDate;
    // all direct properties, no tasks
}
// Alternatively, keep tasks off TodoList and compose them in a view model
class TodoListDetailViewModel{
    TodoList ListDetails;
    Task[] Tasks;
}

class TodoListSuggestionModel{
    Id Id;
    string Title;  
}

Types can multiply quickly and mapping is both tedious and error-prone. It is easy for each model to slightly vary the parameter names too, which hurts understandability.
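To make the mapping cost concrete, here is a rough sketch in Python (chosen only for brevity; the examples above are C#-style). All class and function names are hypothetical, and the `id` field on `TodoList` is my own addition for illustration. Each extra model needs its own hand-written mapper, and every new field on `TodoList` has to be threaded through each of them.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Task:
    title: str
    is_done: bool = False


@dataclass
class TodoList:
    id: int
    title: str
    created_date: datetime
    tasks: list[Task] = field(default_factory=list)


@dataclass
class TodoListSaveContract:
    id: int
    title: str
    created_date: datetime


@dataclass
class TodoListSuggestionModel:
    id: int
    title: str


def to_save_contract(todo: TodoList) -> TodoListSaveContract:
    # Hand-copied fields: a new field on TodoList must be added here too.
    return TodoListSaveContract(
        id=todo.id, title=todo.title, created_date=todo.created_date
    )


def to_suggestion(todo: TodoList) -> TodoListSuggestionModel:
    # A second mapper for a second view of the same data.
    return TodoListSuggestionModel(id=todo.id, title=todo.title)
```

Nothing here is hard, but three types and two mappers already exist for what is conceptually one thing.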

I’ve limited mapping in my system by delineating service purposes, but it still feels bad every time I need to map a contract for a new scenario. Still, it feels much less bad than letting the system get entangled by unnecessary data connections.

The Alternative

Rich Hickey continues to impress me with how deeply he thinks about the fundamental concepts of programming languages and about communication through them.

The concept he proposes is the separation of schema and selection.

Type-only membership

First, this requires decoupling our data from named or positional data containers. That means no properties like in records and classes, and no positional reliance like tuples.

This can be solved with type-based aggregates. Clojure accomplishes it via spec/def and spec/keys.

(require '[clojure.spec.alpha :as s])

;; lat is a float between -90 and 90
(s/def ::lat (s/and float? #(<= -90 %) #(<= % 90)))
;; lon is a float between -180 and 180
(s/def ::lon (s/and float? #(<= -180 %) #(<= % 180)))
;; a coordinate should have a lat and a lon
(s/def ::coordinate (s/keys :req [::lat ::lon]))

;; a plain map whose keys are the spec names
(def yosemite-coords {::lat 37.748837 ::lon -119.58723})

;; print the latitude
(print (::lat yosemite-coords))

Data is accessed and aggregated by type name.

This is great because accessing data requires only the bare conceptual minimum: knowing what guarantees the data meets and that the data exists. Because those guarantees are stated directly in code, Clojure can even generate test data from them for property-based tests.
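For readers who don’t know Clojure, here is a loose Python analogy of the same idea. It is only a sketch: the `SPECS` registry, the `valid_keys` helper, and the `geo/...` key names are all made up for illustration and are not how clojure.spec is implemented. Each key name carries its own rule, and an aggregate is just a map from those key names to values.

```python
# Hypothetical registry: each namespaced key maps to a predicate,
# loosely mirroring what s/def registers.
SPECS = {
    "geo/lat": lambda v: isinstance(v, float) and -90 <= v <= 90,
    "geo/lon": lambda v: isinstance(v, float) and -180 <= v <= 180,
}


def valid_keys(data, required):
    """True when every required key is present and satisfies its own spec."""
    return all(key in data and SPECS[key](data[key]) for key in required)


# The aggregate is a plain dict; values are looked up by key name.
yosemite = {"geo/lat": 37.748837, "geo/lon": -119.58723}
```

A call such as `valid_keys(yosemite, ["geo/lat", "geo/lon"])` plays roughly the role of checking a value against `::coordinate`.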

Who owns optionality

Optionality is represented by presence or absence of a key. If a key is required and doesn’t exist, the compiler or runtime can throw an error. If optional, the consumer checks for the presence of the key.

This strikes at an important question: who owns optionality?

Consider Rich’s example. If a Car type has a make and model, when are those properties optional? We don’t know. It depends on the context. Some cases will need that info and others won’t. Thus, optionality is context-dependent and cannot be part of a generally usable type scheme.

In short, it is always up to the consuming context to declare what it does and doesn’t need, and to decide whether the data meets those criteria. It is like parsing a JSON file: the programmer has to decide how to interpret a missing property. Further, data sources don’t need to know about optional properties if they don’t use them.
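A small Python sketch of consumer-owned optionality, under my own assumptions: the `car/...` keys, the two consuming contexts, and the `missing_keys` helper are hypothetical and only loosely based on the Car example. The producer never marks anything as optional; each consumer states what it requires and checks for it.

```python
def missing_keys(data, required):
    """Return the required keys that are absent from data, in order."""
    return [key for key in required if key not in data]


# The producer emits what it knows and declares nothing optional.
car = {"car/id": 42, "car/make": "Toyota"}

# Each consuming context states its own requirements (hypothetical contexts).
REGISTRATION_NEEDS = ["car/id", "car/make", "car/model"]
PARKING_NEEDS = ["car/id"]
```

The same `car` value is complete for the parking context and incomplete for registration, which is the sense in which optionality depends on who is asking.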

This clicks with the Haskell concept of “parse, don’t validate” from this delightful article.

Partial Contracts

This all pulls together into a prototype language syntax for separating schema generally and selection per-context.

;; build up a task schema
(s/def ::title string?)
(s/def ::is-done boolean?)
(s/def ::task (s/schema [[::title ::is-done]]))

;; build up a todo-list schema
(s/def ::task-list (s/* ::task))
(s/def ::date inst?)
(s/def ::list-id int?)
(s/def ::todo-list (s/schema [[::list-id ::date ::task-list ::title]]))

;; require only properties needed for saving in the method contract
(save-todo-list todo-list => (s/select ::todo-list [::list-id ::date ::title]))

This allows maximal reuse of the same data types without compromising on the clarity of what information is required in each use case.
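Since the Clojure syntax above is a prototype, here is a minimal Python sketch of the same split, assuming a made-up `select` helper and plain string keys. It is an analogy, not the proposed spec API: the schema says what each attribute looks like, and each context separately says which attributes it requires.

```python
from datetime import datetime

# General schema: the shape of each attribute, with no statement
# about which attributes are required.
TODO_LIST_SCHEMA = {
    "list-id": lambda v: isinstance(v, int),
    "date": lambda v: isinstance(v, datetime),
    "title": lambda v: isinstance(v, str),
    "task-list": lambda v: isinstance(v, list),
}


def select(schema, required):
    """Build a checker for one context.

    Required keys must be present; any present key that the schema
    knows about must satisfy its predicate.
    """
    def check(data):
        if any(key not in data for key in required):
            return False
        return all(schema[k](v) for k, v in data.items() if k in schema)

    return check


# Two contexts, one schema, no extra types.
can_save = select(TODO_LIST_SCHEMA, ["list-id", "date", "title"])
can_suggest = select(TODO_LIST_SCHEMA, ["list-id", "title"])
```

Adding a new context means writing one more selection, not one more type plus a mapper.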

At the time of the presentation, this was just a proposal. A brief look at the documentation suggests that it is still not finished as of this writing. The idea, however, is powerful and I hope to see it mature.

Reservations

While I believe this idea is powerful, I still have questions and reservations.

What about the case where an aggregate uses two values of the same type, but with different semantic intent? For example, a ToAddress and a FromAddress, or an Author and a Recipient (both users). One could make sub-types for these cases, but that feels a bit weird.
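One possible reading of the “sub-types” option, sketched in Python with hypothetical names (`is_address`, the `shipment/...` keys, and `conforms` are mine): two distinct keys carry the semantic role while sharing a single underlying rule. I am not claiming this is the intended answer, only that it is what I picture.

```python
def is_address(value):
    """Hypothetical shared rule for anything that counts as an address."""
    return isinstance(value, str) and value.strip() != ""


# Two distinct keys express the role; both reuse the same rule.
SPECS = {
    "shipment/to-address": is_address,
    "shipment/from-address": is_address,
}

shipment = {
    "shipment/to-address": "1 Main St",
    "shipment/from-address": "9 Elm Ave",
}


def conforms(data):
    """True when every key known to SPECS satisfies its rule."""
    return all(SPECS[key](value) for key, value in data.items() if key in SPECS)
```

The intent ends up in the key name rather than in a property name, which is the trade I’m still unsure about.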

I also believe that names are an important communication of intent. I suppose the argument here is that you express that intent in the type name and can reuse it instead of a fixed-context name like with property names. I feel like I’d need to use it to get a good sense for how I actually feel about the shift.

The general schema also feels like a strong temptation to break service boundaries and couple services through it. I suppose this is already a temptation, and it will always be up to the designers to decide where to draw hard boundaries.

Conclusion

Articulating that optionality belongs to the consuming context is a powerful conceptual advance. I think Rich’s proposal for separating general schema and required member selection per-context is promising for more expressive and less redundant type systems!
