    knitr::opts_chunk$set(error=TRUE, comment=NA)
    suppressWarnings({
      library(vetr)
      library(assertive)
      library(assertthat)
      library(checkmate)
    })

Systematically vetting function parameters is as tedious as it is important. Fortunately many functions and packages exist to assist with the process.

We compare several packages to `stopifnot` for the task of checking function parameters. We ignore all other features. The following table summarizes the strengths of each package:

| | assertthat | assertive | checkmate | vetr |
|---|---|---|---|---|
| Simple | ✓ | | | |
| Concise | | | ✓ | ✓ |
| Informative | ✓ | ✓ | ✓ | ✓ |
| Fast | | | ✓ | ✓ |

And in our opinion, what each package does best **in the context of function parameter checking**:

- `assertthat`: no learning curve, better error messages.
- `assertive`: excellent error messages.
- `checkmate`: powerful semantics for checking vectors, fast.
- `vetr`: powerful semantics for checking complex object structure, fast.

More details on the categories in the summary table:

Ideally packages should have a minimal learning curve.

- `assertthat` hews closely to `stopifnot` semantics so it is trivial to learn.
- `assertive` and `checkmate` require familiarity with their many Predicate Functions.
- `vetr` requires understanding how to build the templates used in structure checks.

The benefit of increased complexity is the ability to express complex requirements succinctly:

- `checkmate` provides highly configurable functions, and a domain specific language for vector checks.
- `vetr` infers all required structural checks from a template object.
- `assertive` is focused on single parameter Predicate Functions, which leads to more verbose checks.
- `assertthat` relies primarily^{1} on standard R expressions so is equivalent to `stopifnot` in this respect.

One drawback of `stopifnot` is that the error messages it produces are sometimes cryptic. All the reviewed packages seek to improve on this:

- `assertthat` has clearer error messages for the base `is.*`, `any`, `all`, and the handful of bundled Predicate Functions, although no additional information is provided except for `any` and `all` (position of failure).
- `assertive` and `checkmate` provide clearer error messages with additional information for the included Predicate Functions, particularly for `assertive`.
- `vetr` provides clearer error messages with additional information for object structure (type, length, attributes), and for values with `all_bw`.

All packages allow users to attach custom error messages to Check Expressions or Predicate Functions, but here we focus on those available "out of the box".

Parameter checks should add minimal overhead.

- `stopifnot` is as fast as the R expressions you use for it.
- `assertthat` is comparable but adds some overhead.
- `assertive` is substantially slower.
- `checkmate` is fastest for simple checks, and faster for vectors.
- `vetr` is fastest for long vectors and complex objects.

We base our conclusions on the tests we ran. It is entirely possible different tests could lead to different results. See benchmarks.

We want to write functions that accept a length two numeric vector with no missing values. To illustrate we use functions that enforce that requirement and do nothing else, starting with `stopifnot`:

    simple_stopifnot <- function(x)
      stopifnot(is.numeric(x), length(x) == 2, !is.na(x))

`vetr` looks similar on the surface:

    simple_vetr_a <- function(x)
      vetr(is.numeric(.) && length(.) == 2 && !is.na(.))

The `vetr` arguments are matched to those of the enclosing function. As a result we reference `x` with `.`, and we must use `&&` instead of `,` to delimit our checks.

Additionally, `vetr` introduces templates, so we can rewrite `simple_vetr_a` as:

    simple_vetr <- function(x) vetr(numeric(2L) && !is.na(.))

`numeric(2L) && !is.na(.)` is a Vetting Expression that contains the Template Token `numeric(2L)`, and the Standard Token `!is.na(.)`. Template Tokens require that parameters match their structure (i.e. length, type, and attributes). Standard Tokens, marked by the presence of the `.` symbol, are evaluated as they would be by `stopifnot`.

Templates should be familiar to `vapply` users, but there are some wrinkles. For example zero-length templates like `numeric()` match any length objects. See `?vetr::alike` and `vignette('alike', 'vetr')` for details.
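For readers who have not used `vapply`, the analogy can be sketched in base R. The snippet below only illustrates the template idea through `vapply`'s `FUN.VALUE` argument. It does not call `vetr`, and the variable names are made up for the example.

```r
# Each call to `range` must return something shaped like the template
# `numeric(2L)`: a double vector of length two.
inputs <- list(c(1.5, 3), c(2, 8))
ranges <- vapply(inputs, range, FUN.VALUE = numeric(2L))

# A result that does not fit the template is an error rather than a
# silent coercion. `range` returns two values, the template allows one.
mismatch <- tryCatch(
  vapply(inputs, range, FUN.VALUE = numeric(1L)),
  error = function(e) "template mismatch"
)
```

The wrinkle mentioned above is where the analogy stops: in `vapply` a zero-length `FUN.VALUE` means results must have length zero, whereas a zero-length `vetr` template matches any length.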

`assertthat` is like `stopifnot`:

    simple_assertthat <- function(x)
      assert_that(is.numeric(x), length(x) == 2, !anyNA(x))

`assertive` and `checkmate` rely on the Predicate Functions and the accompanying assertions they implement:

    # assertive: 200+ simple Predicate Functions
    simple_assertive <- function(x) {
      assert_is_numeric(x)
      assert_is_of_length(x, 2)
      assert_all_are_not_na(x)
    }
    # checkmate: 40+ flexible/complex Predicate Functions
    simple_checkmate <- function(x)
      assertNumeric(x, any.missing=FALSE, len=2)

For this type of check the improvements from the third party packages seem marginal until we look at the result with an illegal parameter:

    simple_stopifnot(pi)
    simple_vetr(pi)

In addition to what our parameter is not, `vetr` tells you what it is, gives you the original call of the function, and gives you the input as it appears in the calling frame (i.e. `length(pi)` instead of `length(x)`).
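To make the "input as it appears in the calling frame" idea concrete, here is a minimal base R sketch of how a checker can report the caller's expression instead of the formal parameter name. `check_length_two` is a hypothetical helper written for this illustration. It is not how `vetr` is implemented and its message format is invented.

```r
# Hypothetical checker, not part of any package discussed here.
check_length_two <- function(x) {
  # substitute() recovers the expression the caller typed for `x`
  arg_txt <- paste(deparse(substitute(x)), collapse = "")
  if (length(x) != 2L) {
    stop(
      sprintf("`length(%s)` should be 2 (is %d)", arg_txt, length(x)),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# The message mentions `pi`, the caller's expression, rather than `x`.
msg <- tryCatch(check_length_two(pi), error = conditionMessage)
```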

`assertthat` makes the error message friendlier, but does not add information:

```
simple_assertthat(pi)
```

The other packages improve on the error message, in particular by telling you what the object is in addition to what it is not:

    simple_assertive(pi)
    simple_checkmate(pi)

Here we wish to verify that an input conforms to the structure of the `iris` built-in data set.

    # make a bad version of iris
    iris.fake <- iris
    levels(iris.fake$Species)[3] <- "sibirica"   # tweak levels

Then, with `stopifnot`:

    iris.col.classes <- lapply(iris, class)
    complex_stopifnot <- function(x) {
      stopifnot(
        is.data.frame(x),   # this only checks class
        is.list(x),
        length(x) == length(iris),
        identical(lapply(x, class), iris.col.classes),
        is.integer(attr(x, 'row.names')),
        identical(names(x), names(iris)),
        identical(typeof(x$Species), "integer"),
        identical(levels(x$Species), levels(iris$Species))
      )
    }
    complex_stopifnot(iris.fake)

While some of these checks may seem over-the-top, R's informality with respect to S3 classes makes them necessary. For example, nothing guarantees that an object with class "data.frame" has type "list" as it should.
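The point about S3 informality is easy to demonstrate in base R. The object below is deliberately malformed and the name `impostor` is ours. We avoid printing it because print methods assume a well-formed data frame.

```r
# Attach the "data.frame" class to an integer vector. R does not object.
impostor <- structure(1:3, class = "data.frame")

class_says_df <- is.data.frame(impostor)  # TRUE: only the class is inspected
really_a_list <- is.list(impostor)        # FALSE: the underlying type is integer
underlying    <- typeof(impostor)         # "integer"
```

This is why the `stopifnot` version checks `is.list(x)` in addition to `is.data.frame(x)`.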

`vetr` carries out all those checks and more by inferring them from a template:

    # zero row DF contains structure info only, and matches any number of rows
    iris.template <- iris[0,]
    complex_vetr <- function(x) vetr(iris.template)
    complex_vetr(iris.fake)

`vetr` recursively traverses the template and the function parameter in parallel and checks each sub element of the latter against the former. The error messages are also better. Notice how you can copy all of or part of `levels(iris.fake$Species)[3]` from the message into the R prompt for further examination.
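As a quick sanity check on why a zero-row data frame works as a template, the base R snippet below confirms that `iris[0, ]` drops the data but keeps the structural information that the explicit `stopifnot` checks look at. The variable names are ours, and nothing here calls `vetr`.

```r
tpl <- iris[0, ]

n_rows       <- nrow(tpl)                                    # no data left
same_names   <- identical(names(tpl), names(iris))           # column names kept
same_classes <- identical(lapply(tpl, class), lapply(iris, class))
same_levels  <- identical(levels(tpl$Species), levels(iris$Species))
```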

`checkmate` is reasonably succinct:

    complex_checkmate <- function(x) {
      assertDataFrame(x, types=unlist(iris.col.classes), ncols=5)
      assertTRUE(is.list(x))
      assertInteger(attr(x, 'row.names'))
      assertNames(names(x), identical.to=names(iris))
      assertFactor(x$Species, levels=levels(iris$Species))
    }
    complex_checkmate(iris.fake)

`assertive` and `assertthat` end up with the same number of explicit checks as `stopifnot`, and with similar error messages, so we omit their output here. Their implementations are listed for reference; see the code appendix for the corresponding calls.

    complex_assertive <- function(x) {
      assert_is_list(x)
      assert_all_are_equal_to(length(x), 5)
      assert_is_integer(attr(x, 'row.names'))
      assert_is_data.frame(x)
      assert_is_identical_to_true(identical(iris.col.classes, lapply(x, class)))
      assert_is_identical_to_true(identical(names(x), names(iris)))
      assert_is_identical_to_true(identical(typeof(x$Species), "integer"))
      assert_is_factor(x$Species)
      assert_is_identical_to_true(
        identical(levels(x$Species), levels(iris$Species))
      )
    }
    complex_assertthat <- function(x) {
      assert_that(
        is.data.frame(x),   # this only checks class
        is.list(x),
        length(x) == length(iris),
        identical(lapply(x, class), iris.col.classes),
        is.integer(attr(x, 'row.names')),
        identical(names(x), names(iris)),
        identical(typeof(x$Species), "integer"),
        identical(levels(x$Species), levels(iris$Species))
      )
    }

Suppose we wish to ensure our input is a strictly positive numeric vector with no missing values. With `vec <- -1:1` and `stopifnot` we would use:

    vec <- -1:1

    vector_stopifnot <- function(x)
      stopifnot(is.numeric(x), !anyNA(x), all(x > 0))
    vector_stopifnot(vec)

`vetr` implements the `all_bw` function primarily for speed, but it also generates more useful error messages:

    vector_vetr <- function(x)
      vetr(numeric() && all_bw(., lo=0, bounds="(]"))
    vector_vetr(vec)

`assertthat` is like `stopifnot`, with a better error message:

    vector_assertthat <- function(x)
      assert_that(is.numeric(x), !anyNA(x), all(x > 0))
    vector_assertthat(vec)

`checkmate` implements a powerful notation for checking vectors:

    vector_checkmate <- function(x) qassert(x, "N*(0,]")
    vector_checkmate(vec)

`assertive` has a custom function for the job, with a particularly helpful error message:

    vector_assertive <- function(x) assert_all_are_positive(x)
    vector_assertive(vec)

If you wish to build re-usable complex checks with `stopifnot`, `assertthat`, `assertive`, and `checkmate` you do so by writing new functions. `vetr` implements a special type of programmable Non Standard Evaluation. Here we write a Vetting Expression that accepts either a square numeric matrix, or a scalar numeric:

    sqr.mx <- quote(ncol(.) == nrow(.))
    num.mx <- matrix(numeric(), 0, 0)  # 0 x 0 matrix, matches any matrix
    sqr.num.mx <- quote(sqr.mx && num.mx)
    sqr.num.mx.or.sclr.num <- quote(sqr.num.mx || numeric(1L))

    compound_vetr <- function(x) vetr(sqr.num.mx.or.sclr.num)
    rect.mx <- matrix(1:12, 3)
    compound_vetr(rect.mx)

`vetr` recursively substitutes symbols in the Vetting Expression which makes it very easy to assemble complex expressions from simple ones by using `quote`. Note that standalone templates like `matrix(numeric(), 0, 0)` need not be quoted.
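To give a feel for what recursive substitution means, here is a simplified base R sketch. `expand_symbols` is a hypothetical helper written for this illustration and is not `vetr`'s implementation. It assumes the building blocks live in a single environment, and it would recurse forever on a symbol that resolves to itself.

```r
# Replace symbols that resolve to quoted language with that language,
# repeating until only symbols bound to ordinary objects (or unbound
# symbols such as `.`) remain. Function names (position 1 in a call)
# are left alone.
expand_symbols <- function(expr, env) {
  if (is.symbol(expr)) {
    nm <- as.character(expr)
    if (exists(nm, envir = env, inherits = FALSE)) {
      val <- get(nm, envir = env, inherits = FALSE)
      if (is.language(val)) return(expand_symbols(val, env))
    }
    return(expr)
  }
  if (is.call(expr)) {
    for (i in seq_along(expr)[-1L]) {
      expr[[i]] <- expand_symbols(expr[[i]], env)
    }
  }
  expr
}

# Same building blocks as in the example above, kept in their own environment
blocks <- new.env()
blocks$sqr.mx     <- quote(ncol(.) == nrow(.))
blocks$num.mx     <- matrix(numeric(), 0, 0)
blocks$sqr.num.mx <- quote(sqr.mx && num.mx)

expanded <- expand_symbols(quote(sqr.num.mx || numeric(1L)), blocks)
```

In this sketch `sqr.num.mx` and `sqr.mx` are expanded because they hold quoted calls, while `num.mx` stays a symbol because it points at an actual matrix, which is the template.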

We benchmark the functions with `mb`, a thin wrapper around `microbenchmark`. We focus on timings for checks that succeed.

    mb <- function(...) {
      mb.call <- match.call()
      mb.call[[1]] <- quote(microbenchmark::microbenchmark)
      gc()
      mb.dat <- eval(mb.call, envir=parent.frame())
      mb.res <- summary(mb.dat)
      mb.res <- mb.res[order(mb.res$median), ]
      mb.res[, -1] <- lapply(
        mb.res[, -1],
        function(x) sprintf(" %s", format(round(x, 1), big.mark=","))
      )
      cat(
        sprintf("Unit: %s, neval: %s\n\n", attr(mb.res, 'unit'), mb.res$neval[1])
      )
      print(mb.res[, c('expr', 'lq', 'median', 'uq', 'mean')], quote=FALSE)
    }

Starting with the simple checks on `nums <- runif(2)`:

    nums <- runif(2)
    mb(
      simple_stopifnot(nums),
      simple_vetr(nums),
      simple_assertive(nums),
      simple_assertthat(nums),
      simple_checkmate(nums)
    )

`stopifnot` and `checkmate` lead the way, with `vetr` not too far behind.

For complex objects `vetr` takes the lead:

    mb(
      complex_assertive(iris),
      complex_assertthat(iris),
      complex_checkmate(iris),
      complex_stopifnot(iris),
      complex_vetr(iris)
    )

`vetr` is the fastest option for checking that values in a long vector are in range:

    str.pos.vec <- runif(5e5) + 1   # test with a 500K long vector

    mb(
      vector_assertthat(str.pos.vec),
      vector_checkmate(str.pos.vec),
      vector_stopifnot(str.pos.vec),
      vector_vetr(str.pos.vec)
    )

This is primarily because we use `vetr::all_bw` instead of the semantically similar `isTRUE(all(. > 0))` expression. `all_bw` is implemented in C and avoids the intermediate vectors required to evaluate the standard R version. `checkmate` does the same with `qassert`.
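The "intermediate vectors" are easy to see in base R. The sketch below splits `all(x > 0)` into its two steps so the temporary is visible. The variable names are ours and the snippet does not measure `all_bw` itself.

```r
x <- runif(5e5) + 1

# `x > 0` is fully evaluated first, producing a logical vector as long as `x`
is_pos <- x > 0
# and only then is it reduced to a single TRUE/FALSE
all_pos <- all(is_pos)

# Memory taken by the temporary (logical vectors use 4 bytes per element)
tmp_bytes <- as.numeric(object.size(is_pos))
```

A single-pass check written in compiled code can test each element as it goes without allocating `is_pos`, which is the saving described above.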

`assertive` is substantially slower so we benchmark it separately:

```
mb(times=5,
vector_assertive(str.pos.vec)
)
```

If your functions will never be run thousands of times then you probably do not need to worry about the differences shown here. However, general purpose parameter check functions should be compatible with functions that are, and in those cases microseconds matter.

We made a design choice with `vetr` that the overhead associated with running two `match.call` calls was worth the features it allowed us to implement. In some cases even those ~10 microseconds are too much. For those you can use `vet`, which is a general purpose object checker that uses Vetting Expressions just like `vetr`, or go even further and call `vetr::alike` directly to do template comparisons:

    simple_vet <- function(x) vet(numeric(2L) && !anyNA(.), x, stop=TRUE)
    simple_vet(pi)

    simple_alike <- function(x) {
      if(!isTRUE(msg <- alike(numeric(2L), x)))
        stop("Argument `x` invalid: ", msg)
      if(anyNA(x)) stop("Argument `x` contains NAs")
    }
    simple_alike(pi)

The error messages / ease of use degrade, but we do improve our timings (again, with `nums <- runif(2)`):

```
mb(times=10000,
simple_vet(nums),
simple_vetr(nums),
simple_checkmate(nums),
simple_alike(nums)
)
```

Take these with a grain of salt, as they are written by the `vetr` author:

In favor of `vetr`:

- Best package for checking complex S3 structure.
- Fastest package for non-trivial checks.

Against `vetr`:

- New package (beta testers welcome!).
- Template concept is designed to be intuitive, but inevitably in some corner cases it will require deeper understanding of the underlying rules.

If you are tired of dealing with checks for non-trivial S3 objects and are willing to try out a young package, `vetr` is for you. If not, we would recommend^{2} `checkmate` as it is fast, well established, and more expressive than the other options.

Definitions of terms as we use them in this document. They may have different definitions elsewhere.

- Check Function
- Similar to a [Predicate Function](#predicate-function), may return either TRUE or a vector of only TRUE values on success, and something else on failure. `all.equal(x, y)` is a common example. Check Functions should be used within `isTRUE(all(check_fun(...)))` to establish success or failure, unless they are used within `stopifnot` or `vetr` where such a check is implicit.
- Check Expression
- A collection of calls to [Check](#check-function) and/or [Predicate Functions](#predicate-function) combined with logical operators. For example `!anyNA(x) && x > 0`.
- Non Standard Evaluation
- Refers to the evaluation of an R expression different than would normally occur, because the expression is modified prior to evaluation, it is evaluated in a different environment than it would normally be, or both. A classic example is `subset(x, subset)` where the `subset` argument is evaluated within `x` instead of in the parent frame. `vetr` implements a special type of Non Standard Evaluation that recursively substitutes all non-function symbols (i.e. symbols not at position 1 in a call) that resolve to symbols until the resulting expression only contains symbols that point to non-symbol R objects. It also substitutes `.` with the corresponding function parameter. Finally, it tokenizes the expression by breaking it apart at `&&` and `||`, and evaluates each token separately.
- Predicate Function
- Function that typically accepts one non-optional argument and returns TRUE or FALSE according to whether that argument conforms to a requirement. `isTRUE(x)` and `is.numeric(x)` are single argument examples, and `identical(x, y)` is a two argument example.
- Standard Token
- Part of a [Vetting Expression](#vetting-expression) that should be treated as a standard R expression. If `isTRUE(all(evaluated_standard_token))` then the token will be considered to pass. [Standard Tokens](#standard-token) are identified by the presence of a `.` symbol within a token. You can also wrap expressions in `.()` to mark them as standard tokens. If you need to use the `.` symbol for other purposes escape it with another `.` (i.e. to use a literal `.`, use `..`). See `vignette('vetr', 'vetr')` for more details.
- Template Token
- Part of a [Vetting Expression](#vetting-expression) that should be treated as a template by `vetr`. It should resolve to an R object when it is evaluated in the calling frame of the enclosing function to the `vetr` call. See `vignette('vetr', 'vetr')` for more details.
- Vetting Expression
- Like a [Check Expression](#check-expression), but augmented for use in `vetr`. In addition to Check and [Predicate Function](#predicate-function) calls, it can include [Template Tokens](#templated-token). References to the object being checked should be made with the `.` symbol. For example `numeric(1L) && !is.na(.) && . > 0`, is made up of three tokens, where `numeric(1L)` is a [Template Token](#templated-token), and `!is.na(.)` and `. > 0` are [Standard Tokens](#standard-token). Beware of accidentally referencing a variable in a token intended to be a [Standard Token](#standard-token). If we had used `x > 0` instead of `. > 0`, `vetr` would evaluate `x > 0` and use the result as a template since `x > 0` does not contain the `.` symbol to mark it as a [Standard Token](#standard-token). See `vignette('vetr', 'vetr')` for more details.
- Vetting Token
- Component piece of [Vetting Expressions](#vetting-expression). May be either [Standard Tokens](#standard-token) or [Template Tokens](#templated-token). [Vetting Tokens](#vetting-token) are the arguments to "top level" `&&`, `||`, and `(` calls. For example, in the [Vetting Expression](#vetting-expression) `!anyNA(.) && I(. > 0 && interactive())` there are two [Vetting Tokens](#vetting-token): `!anyNA(.)` and `I(. > 0 && interactive())`. The `&&` inside `I(...)` is not considered to be at the "top level" because of the intervening `I` call.
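The tokenization rules in the last few definitions can be sketched in a few lines of base R. `split_tokens` and `is_standard_token` are hypothetical helpers written for this illustration, not `vetr` functions, and they ignore the `.()` wrapper and the `..` escape described under Standard Token.

```r
# Split an expression at "top level" `&&`, `||`, and `(` calls. Anything
# nested inside another call (e.g. `I(...)`) is left intact.
split_tokens <- function(expr) {
  top_level <- c("&&", "||", "(")
  if (is.call(expr) && is.symbol(expr[[1L]]) &&
      as.character(expr[[1L]]) %in% top_level) {
    unlist(lapply(as.list(expr)[-1L], split_tokens), recursive = FALSE)
  } else {
    list(expr)
  }
}

# A token is treated as a Standard Token if it mentions the `.` symbol
is_standard_token <- function(token) "." %in% all.names(token)

tokens   <- split_tokens(quote(numeric(1L) && !is.na(.) && . > 0))
standard <- vapply(tokens, is_standard_token, logical(1L))

# The `&&` inside `I(...)` is not at the top level, so this yields two tokens
nested <- split_tokens(quote(!anyNA(.) && I(. > 0 && interactive())))
```

Under this sketch `numeric(1L)` is classified as a template and the other two tokens as standard, matching the Vetting Expression example above.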

- Help, vignettes, and READMEs for `assertthat`, `assertive`, `checkmate`, `ensurer`, and `valaddin`.
- checkmate: Fast Argument Checks for Defensive Programming in R, Michel Lang, The R Journal (2017) 9:1, pages 437-445.
- The State of Assertions in R, Richie Cotton, July 2015
- Testing R Code, Richie Cotton, January 2017

We use `mb`, a thin wrapper around `microbenchmark`; its definition appears with the benchmarks.

These are the `assertive` and `assertthat` versions of the iris checks that we omitted from the details section:

    complex_assertive(iris.fake)
    complex_assertthat(iris.fake)

sessionInfo()

^{1} `assertthat` requires Check Expressions to evaluate to TRUE or FALSE, whereas `stopifnot` accepts anything, with all TRUE vectors considered a success and all else failure. Additionally, `assertthat` implements a handful of useful Predicate Functions for common checks (e.g. scalars, etc.).
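The `stopifnot` half of this footnote can be checked in base R. The snippet below is a small illustration with made-up variable names. It also shows why `all.equal` is a Check Function rather than a Predicate Function in the terminology of the glossary.

```r
# stopifnot() accepts a vector as long as every element is TRUE
vector_ok <- tryCatch(
  { stopifnot(c(TRUE, TRUE)); TRUE },
  error = function(e) FALSE
)

# all.equal() returns TRUE on success, but a character description of the
# difference on failure rather than FALSE
cmp <- all.equal(1, 1.1)

# stopifnot() treats that non-TRUE value as a failure
string_ok <- tryCatch(
  { stopifnot(all.equal(1, 1.1)); TRUE },
  error = function(e) FALSE
)

# Outside of stopifnot(), wrap the call to get a strict TRUE/FALSE
strict <- isTRUE(all.equal(1, 1.1))
```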

^{2} Our recommendation for `checkmate` is made solely on the basis of the tests described in this document. We have not used it in any of our packages.

brodieG/vetr documentation built on Aug. 19, 2018, 12:35 a.m.
