library(typesys)

The typesys package formalizes R's type system into a simple language. The building blocks of the type language are constants and type variables.

Constant types represent concrete types. For instance, the constant type RInteger represents the type of a scalar integer in R. Unlike R, scalars and vectors have distinct (but related) types. Representatives for R types all have names that begin with "R", and generally follow the names returned by typeof(). There are two exceptions:

  1. An R "double" corresponds to the RNumeric type.
  2. An R "character" corresponds to the RString type.

The names RNumeric and RString were chosen because they give clearer descriptions of the values that are actually in their respective types.

Type variables are named placeholders for unknown types. The constructor for type variables requires a name to be specififed. The code to create a type variable called a is:

Variable("a")

Type variables serve two purposes. First, some functions have parameters where more than one type is allowed. For instance, the identity function

function(x) x

accepts an argument of any type. These functions are called polymorphic functions. Type variables are necessary to express the type of a polymorphic function. This is discussed in more detail below. Second, type variables are useful in any application where types may not be unknown. Type inference is an example application.

The Formula-based Type Language

Constructing expressions in the type language using the type constructor functions is more verbose than is really necessary. typesys provides a way to write type expressions concisely using R's formula notation. The rules are:

Thus we can express the type $(a, b) \to RInteger$ in formula notation as

exp = c(a, b) ~ RInteger
formula_to_type(exp)

Here the function formula_to_type() converts the formula to a type.

Unification & Substitution

typesys also includes an implementation Robinson's unification algorithm in the unify() function. Given two algebraic expressions, unification attempts to find a substitution of variables that makes the two expressions equal.

For example, suppose we have types $a \to RInteger$ and $RNumeric \to RInteger$ and expect that they are equal. Then we can unify these types to see that they are equal under the substitution $a \mapsto RNumeric$. We would write this as

x = formula_to_type(a ~ RInteger)
y = formula_to_type(RNumeric ~ RInteger)
unify(x, y)

The function emits an error when two types cannot be unified because they are incompatible.

x = formula_to_type(a ~ RInteger)
y = formula_to_type(~ RNumeric)
unify(x, y)

Successful unification returns a substitution of variables. Substitutions can be applied to existing terms by calling them on the term. For instance, by unifying the type variable a and an integer gives the substitution $a \mapsto RInteger$. We can then apply the substitution to any expression that contains a to replace a with RInteger. The code for this is:

x = Variable("a")
y = RInteger
sub = unify(x, y)
sub(x)

Finally, note that the unify() function is an inefficient prototype. In the worst case, it requires exponential time in the number of subexpressions being unified. Better algorithms for unification are known, with the best running in linear time. In the future, the unify() function may be changed to use a better algorithm.



nick-ulle/typesys documentation built on Jan. 21, 2020, 5:13 p.m.