Static typing isn’t free. Where do you think the C++ angry mob comes from?

TL&DR Type inference works like logic deduction so any program in a statically typed language is two programs. The first one is the thing you sell, and the second – is a model that undergoes some sort of verification every time you run a compiler. This second program, although, often being written unknowingly, is not a free bonus, it’s something you have to pay for in several ways.

What does the real model checking look like?

There are dedicated tools for formal verification model checking languages and environments. One of these tools is Alloy – an environment and a language you can model your data structures, services, or processes in, and the Alloy then will verify that your model has no contradictions.

The language of Alloy is based on relational logic so why don’t we model something about relationships too? Let’s do an “Alice likes Bob” exercise.

But what makes people like other people? Well, the personality traits of course! So let’s define some traits.

Here `sig` is the short for `signature` and that’s how we define things in Alloy. In our case, we first define the concept of Traits, and not specific traits so we define them as `abstract`. Alloy speaks in sets, so `Traits` is the set that can include other sets but, being abstract, can’t participate in relationships by itself. The ones that can are declared below: `Kind` and `Intelligent` both pronounced subsets of `Traits` by `extends`. They are also declared as `one` meaning that they are both sets of one and only one element each.

These `abstract` or `one` thingies are not necessarily important for our model, but they help us form it closer to the way we describe people in our language. We don’t say “Bob is traits”, we say “Bob is kind”, that’s why `Traits` is made abstract. We don’t say “Bob is kind, intelligent, and kind again”, that’s why `Kind` is made a set of one and only one element.

Next, since we’re programmers and we consider programming languages as part of our identity, let’s presume that what we write in affects the way other people see us (in reality, it doesn’t).

So we need to declare some languages. We can borrow the same structure from the `Traits`.

Once again, `Languages` is an abstract concept, and `Asm`, `Cpp`, and `Alloy` exist one and only one each. That’s where the model diverges from reality. There are 6 versions of Alloy and this moment, 5 major standards of C++, and countless versions of the assembly. But a model doesn’t have to be exact to be pragmatic, so we’ll let this slide.

Once again, `People` is an abstract concept but now it has something new. It has named relations. The `character` relation connects a person with a set of personality traits, and the `writes` relation connects the very same person to the set of languages.

`Alice`, `Bob`, `George`, and `Steve` are specific, not abstract people, and each exists in “one and only one” exemplar.

Now let’s program office rumors. In Alloy, since it works in relational logic, and not relational gossip, rumors are called `fact`s.

So what do the rumors tell us? Bob is kind and intelligent. Bob writes in C++ and assembly. George is kind and writes in assembly and alloy. And Steve is intelligent and writes in C++. So far so good.

Now let’s say Alice likes people who are kind, intelligent, and write in C++. In Alloy, we write these things as boolean functions or predicates using the `pred` word.

Here `likes` is the name of the predicate. `a` and `b` are variables, each belonging to a set of People. `a = Alice` is the first condition. `Kind + Intelligent in b.*character` is the second, and `Cpp in b.*writes` is the third. When we write conditions each in its own line, we don’t even add the `and` word, we presume that all the conditions meet at the same time.

The third condition looks like the fact we declared before, but the second has something new, it has a `+` sign that combines two sets into one. So Alice likes people who are both kind and intelligent. We could have written this as two separate expectations, but why bother if we have the set algebra tools onboard?

Anyway, now that we have our model, we can add assertions and see if they hold. Does Alice like Bob?

Yes, she does. Bob fits the likeability definition with a slight overreach. He writes assembly along with C++ but it doesn’t make him unlikeable according to the `likes` predicate.

Does Alice like Steve or George?

assert {not likes [Alice, George]}
assert {not likes [Alice, Steve]}

No, she doesn’t. Steve isn’t kind, and George doesn’t even write C++.

So in Alloy, we can write down our model, and impose our assertions over it. Alloy will make sure that the model is sound. Now let’s see what elements of model checking C++ has to offer.

Model checking in C++

Technically, there are no sets in C++, but there are types. And types can be subtypes of each other so this is close enough to sets. E.g. we can define people like this.

Note that abstract class in C++ is not the same as abstract set in Alloy. In C++ abstractness concerns instantiation, and we’re not even planning to instantiate anything. All we need is to program relations between types that will be automatically checked during compilation.

In fact, we don’t even have to create the parent classes at all. C++ templates deduce from any declared type there is. So let’s declare languages without `Languages`.

Now we need to program our “facts”. A fact is something that sets a named relation between sets, and you know what sets a relation between types in C++? Basically everything. One thing would be a function declaration. So when we want to state that Bob writes in C++, we can do that by declaring a function `writes`, and giving it a specific overload for specific types.

But we can also take a shortcut and make a function into a fact itself. We don’t have to bind types (and we represent sets with types remember?) with functions, we can turn an overloaded function into a set itself. Like this.

As for the predicates, or conditional functions, a template function will do. Template functions accept parametric types and deduce them to concrete ones so... close enough.

The downside is that the type deduction only occurs on instantiation, so if we live it all like this, we’ll never run the check, and we’ll never know if Alice likes Bob. So let’s make an assertion, let’s instantiate the `likes` predicate.

So now what?

Some say that writing one program and having written two in the end is great! Write one, get one free! As someone who maintains other’s people code for a living, I disagree. Imagine that you have to maintain a C++ piece of code that has been written as its types relations have been. With no debugger, no coherent error messages, no profiler, and by someone who wasn’t even aware that he writes the program to begin with. That’s how 99% of all type-systems/relational models in the world have been written like. And it’s hell.

No, the second program doesn’t come for free. It doesn’t automatically model what you want, to make it so, you have to put conscious effort into composing it properly. You have to be aware that it exists, that it requires attention, and it requires understanding. You have to realize that with static typing, you are writing both in a run-time imperative language, and in a compile-time declarative one, and you have to pay with your effort to develop both programs.

But that’s not it. Speaking of “free”, some people think that everything that happens in compile time is “free” because it doesn’t have to happen in run-time then. Well, you can’t cheat reality, time is time, you have to make the computation somewhere. With static typing, you are effectively running a model checker every time you compile your program whether you want it or not, and it puts a toll on your build times.

I have been writing about languages for a while now, and I couldn’t help noticing, that the posts on C++ drag the most attention. C++ is not the most popular language, it’s not the most hyped, it’s not the most interesting. So how come, every time I write something about C++, there are thousands of C++ programmers on standby waiting to start arguing with each other?

Rust has type traits. You have to declare everything you write in terms of traits and you have to make sure that the types you expose fit the declared traits or otherwise the program wouldn’t even compile. This localizes type deduction to a single module. If a module compiles, it is automatically stackable with other modules.

Haskell has typeclasses that work in a similar manner. If a type belongs to a typeclass, it exposes all the methods the typeclass requires or it doesn’t compile. That also narrows down type deduction.

C++ has both type traits and typeclasses (well, concepts anyway). It also has unscoped templates though. This means that any parametric code can be potentially compiled with any type there is. And... the compiler has to check if it can on any instantiation.

This means that if you include a `vector` in several translation units, the compiler has to build the contents of `vector` with each translation unit and it types separately. This leads to enormous build times. I once measured the compile time for a moderately large, moderately templated SDK, and 45% of all the time went to template type deduction. Well, that doesn’t seem too bad, does it? For comparison, 50% of the time went to parsing. But it was mostly parsing the headers with those templates. Because a header includes several headers, and each includes several more, and each translation unit parses the whole tree anew trying to build it with its local types, and sometimes, not all the time, but nobody knows when there is a teeny-tiny inconsistency in the model these types are meant to represent, and the build fails after half an hour of struggling.

And that’s where all the angry C++ programmers on the internet come from. They are waiting nervously for their builds to finish.

Static typing isn’t free. Where do you think the C++ angry mob comes from?

What does the real model checking look like?

Model checking in C++

So now what?

Links