knitr::opts_chunk$set(error=TRUE, comment=NA)
suppressWarnings({
  library(vetr)
  library(assertive)
  library(assertthat)
  library(checkmate)
})
Systematically vetting function parameters is as tedious as it is important. Fortunately many functions and packages exist to assist with the process.

We compare several packages to `stopifnot` for the task of checking function parameters, and ignore all their other features. The following table summarizes the strengths of each package:
| | assertthat | assertive | checkmate | vetr |
|---|---|---|---|---|
| Simple | ✓ | | | |
| Concise | | | ✓ | ✓ |
| Informative | ✓ | ✓ | ✓ | ✓ |
| Fast | | | ✓ | ✓ |
In our opinion, this is what each package does best in the context of function parameter checking:

- `assertthat`: no learning curve, better error messages.
- `assertive`: excellent error messages.
- `checkmate`: powerful semantics for checking vectors, fast.
- `vetr`: powerful semantics for checking complex object structure, fast.

More details on the categories in the summary table:
Ideally packages should have a minimal learning curve.

- `assertthat` hews closely to `stopifnot` semantics so it is trivial to learn.
- `assertive` and `checkmate` require familiarity with their many Predicate Functions.
- `vetr` requires understanding how to build the templates used in structure checks.

The benefit of increased complexity is the ability to express complex requirements succinctly:
- `checkmate` provides highly configurable functions, and a domain specific language for vector checks.
- `vetr` infers all required structural checks from a template object.
- `assertive` is focused on single parameter Predicate Functions, which leads to more verbose checks.
- `assertthat` relies primarily [1] on standard R expressions so is equivalent to `stopifnot` in this respect.

One drawback of `stopifnot` is that the error messages it produces are sometimes cryptic. All the reviewed packages seek to improve on this:
- `assertthat` has clearer error messages for the base `is.*`, `any`, `all`, and the handful of bundled Predicate Functions, although no additional information is provided except for `any` and `all` (position of failure).
- `assertive` and `checkmate` provide clearer error messages with additional information for the included Predicate Functions, particularly `assertive`.
- `vetr` provides clearer error messages with additional information for object structure (type, length, attributes), and for values with `all_bw`.

All packages allow users to attach custom error messages to Check Expressions or Predicate Functions, but here we focus on those available "out of the box".
Parameter checks should add minimal overhead.

- `stopifnot` is as fast as the R expressions you use for it.
- `assertthat` is comparable but adds some overhead.
- `assertive` is substantially slower.
- `checkmate` is fastest for simple checks, and faster for vectors.
- `vetr` is fastest for long vectors and complex objects.

We base our conclusions on the tests we ran. It is entirely possible different tests could lead to different results. See benchmarks.
We want to write functions that accept a length-two numeric vector with no missing values. To illustrate we use functions that enforce that requirement and do nothing else, starting with `stopifnot`:
simple_stopifnot <- function(x) stopifnot(is.numeric(x), length(x) == 2, !is.na(x))
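The way `stopifnot` treats each expression can be seen with base R alone. The sketch below redefines `simple_stopifnot` so that it is self-contained, and shows that a logical vector passes only when every element is `TRUE`; the variable names other than `simple_stopifnot` are illustrative.

```r
simple_stopifnot <- function(x) stopifnot(is.numeric(x), length(x) == 2, !is.na(x))

# A valid input passes silently; stopifnot() returns NULL invisibly.
ok <- simple_stopifnot(c(1, 2))

# `!is.na(x)` yields a length-two logical vector here; a single FALSE
# element is enough for stopifnot() to signal an error.
na_result <- tryCatch(
  { simple_stopifnot(c(1, NA)); "passed" },
  error = function(e) "failed"
)

# The message names the failing expression in terms of the parameter `x`.
len_msg <- tryCatch(simple_stopifnot(pi), error = conditionMessage)
len_msg
# [1] "length(x) == 2 is not TRUE"
```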
`vetr` looks similar on the surface:
simple_vetr_a <- function(x) vetr(is.numeric(.) && length(.) == 2 && !is.na(.))
The `vetr` arguments are matched to those of the enclosing function. As a result we reference `x` with `.`, and we must use `&&` instead of `,` to delimit our checks.

Additionally, `vetr` introduces templates, so we can rewrite `simple_vetr_a` as:
simple_vetr <- function(x) vetr(numeric(2L) && !is.na(.))
`numeric(2L) && !is.na(.)` is a Vetting Expression that contains the Template Token `numeric(2L)`, and the Standard Token `!is.na(.)`. Template Tokens require that parameters match their structure (i.e. length, type, and attributes). Standard Tokens, marked by the presence of the `.` symbol, are evaluated as they would be by `stopifnot`.

Templates should be familiar to `vapply` users, but there are some wrinkles. For example zero-length templates like `numeric()` match objects of any length. See `?vetr::alike` and `vignette('alike', 'vetr')` for details.
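For readers who have not used `vapply`, the following base R sketch shows what a template does there: the `FUN.VALUE` argument `numeric(2L)` requires each result to be a double vector of length two. The data and variable names are made up for illustration, and the sketch does not use `vetr`.

```r
pairs <- list(c(1, 2), c(3, 4))

# Each result is a length-two double vector, so it matches the template.
doubled <- vapply(pairs, function(v) v * 2, numeric(2L))

# A result of the wrong length does not match the template.
bad_len <- tryCatch(
  vapply(list(c(1, 2), 3), function(v) v * 2, numeric(2L)),
  error = function(e) "error"
)

# A result of the wrong type does not match either.
bad_type <- tryCatch(
  vapply(pairs, function(v) as.character(v), numeric(2L)),
  error = function(e) "error"
)
```

Unlike the zero-length `vetr` templates described above, `vapply` takes the template length literally.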
`assertthat` is like `stopifnot`:
simple_assertthat <- function(x) assert_that(is.numeric(x), length(x) == 2, !anyNA(x))
`assertive` and `checkmate` rely on Predicate Functions and the accompanying assertions they implement:

# assertive: 200+ simple Predicate Functions
simple_assertive <- function(x) {
  assert_is_numeric(x)
  assert_is_of_length(x, 2)
  assert_all_are_not_na(x)
}
# checkmate: 40+ flexible/complex Predicate Functions
simple_checkmate <- function(x) assertNumeric(x, any.missing=FALSE, len=2)
For this type of check the improvements from the third party packages seem marginal until we look at the result with an illegal parameter:
simple_stopifnot(pi)
simple_vetr(pi)

In addition to what our parameter is not, `vetr` tells you what it is, gives you the original call of the function, and gives you the input as it appears in the calling frame (i.e. `length(pi)` instead of `length(x)`).
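Reporting the caller's expression instead of the parameter name can be sketched in base R with `substitute`. This only illustrates the idea for a single level of function call; it is not how `vetr` is implemented, and `check_len2` is a hypothetical helper.

```r
# Hypothetical helper, not part of any package discussed here.
check_len2 <- function(x) {
  arg <- deparse(substitute(x))  # the expression the caller supplied for `x`
  if (length(x) != 2L) {
    stop(
      sprintf("`length(%s)` should be 2 (is %d)", arg, length(x)),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

msg <- tryCatch(check_len2(pi), error = conditionMessage)
msg
# [1] "`length(pi)` should be 2 (is 1)"
```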
`assertthat` makes the error message friendlier, but does not add information:

simple_assertthat(pi)

The other packages improve on the error message, in particular by telling you what the object is in addition to what it is not:

simple_assertive(pi)
simple_checkmate(pi)
Here we wish to verify that an input conforms to the structure of the `iris` built-in data set.

# make a bad version of iris
iris.fake <- iris
levels(iris.fake$Species)[3] <- "sibirica"   # tweak levels

Then, with `stopifnot`:

iris.col.classes <- lapply(iris, class)
complex_stopifnot <- function(x) {
  stopifnot(
    is.data.frame(x),   # this only checks class
    is.list(x),
    length(x) == length(iris),
    identical(lapply(x, class), iris.col.classes),
    is.integer(attr(x, 'row.names')),
    identical(names(x), names(iris)),
    identical(typeof(x$Species), "integer"),
    identical(levels(x$Species), levels(iris$Species))
  )
}
complex_stopifnot(iris.fake)
While some of these checks may seem over-the-top, R's informality with respect to S3 classes makes them necessary. For example, nothing guarantees that an object with class "data.frame" has type "list" as it should.
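A short base R sketch of that last point: setting the class attribute by hand is enough to satisfy `is.data.frame`, whatever the underlying type. The object `fake_df` is made up for illustration.

```r
# An integer vector that merely claims to be a data frame.
fake_df <- structure(1:3, class = "data.frame")

claims_df <- is.data.frame(fake_df)  # TRUE: only the class attribute is checked
really_list <- is.list(fake_df)      # FALSE: the underlying type is not "list"
actual_type <- typeof(fake_df)       # "integer"
```

This is why `complex_stopifnot` checks both `is.data.frame(x)` and `is.list(x)`.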
`vetr` carries out all those checks and more by inferring them from a template:

# zero row DF contains structure info only, and matches any # of rows
iris.template <- iris[0,]
complex_vetr <- function(x) vetr(iris.template)
complex_vetr(iris.fake)

`vetr` recursively traverses the template and the function parameter in parallel and checks each sub-element of the latter against the former. The error messages are also better. Notice how you can copy all or part of `levels(iris.fake$Species)[3]` from the message into the R prompt for further examination.
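To see why a zero-row data frame works as a template, the base R sketch below confirms that `iris[0, ]` drops the data but keeps the column names, column classes, and factor levels. It does not call `vetr`; the variable names are illustrative.

```r
iris_template <- iris[0, ]

no_rows      <- nrow(iris_template) == 0L
same_names   <- identical(names(iris_template), names(iris))
same_classes <- identical(lapply(iris_template, class), lapply(iris, class))
same_levels  <- identical(levels(iris_template$Species), levels(iris$Species))
```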
`checkmate` is reasonably succinct:

complex_checkmate <- function(x) {
  assertDataFrame(x, types=unlist(iris.col.classes), ncols=5)
  assertTRUE(is.list(x))
  assertInteger(attr(x, 'row.names'))
  assertNames(names(x), identical.to=names(iris))
  assertFactor(x$Species, levels=levels(iris$Species))
}
complex_checkmate(iris.fake)
`assertive` and `assertthat` end up with the same number of explicit checks as `stopifnot`, and with similar error messages, so we omit them here. See the code appendix for those implementations.

complex_assertive <- function(x) {
  assert_is_list(x)
  assert_all_are_equal_to(length(x), 5)
  assert_is_integer(attr(x, 'row.names'))
  assert_is_data.frame(x)
  assert_is_identical_to_true(identical(iris.col.classes, lapply(x, class)))
  assert_is_identical_to_true(identical(names(x), names(iris)))
  assert_is_identical_to_true(identical(typeof(x$Species), "integer"))
  assert_is_factor(x$Species)
  assert_is_identical_to_true(
    identical(levels(x$Species), levels(iris$Species))
  )
}
complex_assertthat <- function(x) {
  assert_that(
    is.data.frame(x),   # this only checks class
    is.list(x),
    length(x) == length(iris),
    identical(lapply(x, class), iris.col.classes),
    is.integer(attr(x, 'row.names')),
    identical(names(x), names(iris)),
    identical(typeof(x$Species), "integer"),
    identical(levels(x$Species), levels(iris$Species))
  )
}
Suppose we wish to ensure our input is a strictly positive numeric vector with no missing values. With `vec <- -1:1` and `stopifnot` we would use:

vec <- -1:1
vector_stopifnot <- function(x) stopifnot(is.numeric(x), !anyNA(x), all(x > 0))
vector_stopifnot(vec)
`vetr` implements the `all_bw` function primarily for speed, but it also generates more useful error messages:

vector_vetr <- function(x) vetr(numeric() && all_bw(., lo=0, bounds="(]"))
vector_vetr(vec)

`assertthat` is like `stopifnot`, with a better error message:

vector_assertthat <- function(x) assert_that(is.numeric(x), !anyNA(x), all(x > 0))
vector_assertthat(vec)

`checkmate` implements a powerful notation for checking vectors:

vector_checkmate <- function(x) qassert(x, "N*(0,]")
vector_checkmate(vec)

`assertive` has a custom function for the job, with a particularly helpful error message:

vector_assertive <- function(x) assert_all_are_positive(x)
vector_assertive(vec)
If you wish to build re-usable complex checks with `stopifnot`, `assertthat`, `assertive`, or `checkmate` you do so by writing new functions. `vetr` implements a special type of programmable Non Standard Evaluation. Here we write a Vetting Expression that accepts either a square numeric matrix, or a scalar numeric:

sqr.mx <- quote(ncol(.) == nrow(.))
num.mx <- matrix(numeric(), 0, 0)  # 0 x 0 matrix, matches any matrix
sqr.num.mx <- quote(sqr.mx && num.mx)
sqr.num.mx.or.sclr.num <- quote(sqr.num.mx || numeric(1L))
compound_vetr <- function(x) vetr(sqr.num.mx.or.sclr.num)
rect.mx <- matrix(1:12, 3)
compound_vetr(rect.mx)

`vetr` recursively substitutes symbols in the Vetting Expression, which makes it very easy to assemble complex expressions from simple ones by using `quote`. Note that standalone templates like `matrix(numeric(), 0, 0)` need not be quoted.
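The general idea of recursive symbol substitution can be sketched in base R. The helper `expand_symbols` below is hypothetical and is not `vetr`'s implementation: it only replaces symbols bound to quoted language objects, uses plain predicates instead of templates, and ignores edge cases such as empty arguments.

```r
# Hypothetical helper: replace any symbol that is bound to a language object
# in `env` with that object, recursing into the replacement and into call
# arguments. The function position of each call is left untouched.
expand_symbols <- function(expr, env = parent.frame()) {
  if (is.symbol(expr)) {
    name <- as.character(expr)
    if (nzchar(name) && exists(name, envir = env, inherits = TRUE)) {
      val <- get(name, envir = env, inherits = TRUE)
      if (is.language(val)) return(expand_symbols(val, env))
    }
    expr
  } else if (is.call(expr)) {
    args <- lapply(as.list(expr)[-1], expand_symbols, env = env)
    as.call(c(list(expr[[1]]), args))
  } else {
    expr
  }
}

sqr.mx  <- quote(ncol(.) == nrow(.))
sqr.num <- quote(sqr.mx && is.numeric(.))
expanded <- expand_symbols(quote(sqr.num || length(.) == 1L))

# Evaluate the expanded expression with `.` bound to the value being checked.
square_ok <- eval(expanded, list(. = matrix(1:4, 2)))
rect_ok   <- eval(expanded, list(. = matrix(1:12, 3)))
```

Here `square_ok` is `TRUE` and `rect_ok` is `FALSE`, since the 3 x 4 matrix is neither square nor of length one.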
We benchmark the functions with `mb`, a thin wrapper around `microbenchmark` (see the appendix for its definition). We focus on timings for checks that succeed.

Starting with the simple checks on `nums <- runif(2)`:

mb <- function(...) {
  mb.call <- match.call()
  mb.call[[1]] <- quote(microbenchmark::microbenchmark)
  gc()
  mb.dat <- eval(mb.call, envir=parent.frame())
  mb.res <- summary(mb.dat)
  mb.res <- mb.res[order(mb.res$median), ]
  mb.res[, -1] <- lapply(
    mb.res[, -1], function(x) sprintf(" %s", format(round(x, 1), big.mark=","))
  )
  cat(
    sprintf("Unit: %s, neval: %s\n\n", attr(mb.res, 'unit'), mb.res$neval[1])
  )
  print(mb.res[, c('expr', 'lq', 'median', 'uq', 'mean')], quote=FALSE)
}

nums <- runif(2)
mb(
  simple_stopifnot(nums), simple_vetr(nums), simple_assertive(nums),
  simple_assertthat(nums), simple_checkmate(nums)
)
`stopifnot` and `checkmate` lead the way, with `vetr` not too far behind.

For complex objects `vetr` takes the lead:

mb(
  complex_assertive(iris), complex_assertthat(iris), complex_checkmate(iris),
  complex_stopifnot(iris), complex_vetr(iris)
)
`vetr` is the fastest option for checking that values in a long vector are in range:

str.pos.vec <- runif(5e5) + 1   # test with a 500K long vector
mb(
  vector_assertthat(str.pos.vec), vector_checkmate(str.pos.vec),
  vector_stopifnot(str.pos.vec), vector_vetr(str.pos.vec)
)
This is primarily because we use `vetr::all_bw` instead of the semantically similar `isTRUE(all(. > 0))` expression. `all_bw` is implemented in C and avoids the intermediate vectors required to evaluate the standard R version. `checkmate` does the same with `qassert`.
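The intermediate vector in question is easy to see in base R. The sketch below makes it explicit and contrasts it with a single-pass scan; `all_gt_zero_scan` is a hypothetical helper that only illustrates the single-pass idea. It does not reproduce the internals of `all_bw` or `qassert`, and an R-level loop like this is not expected to be fast.

```r
x <- runif(5e5) + 1

# Standard R version: `x > 0` first materializes a logical vector as long
# as `x`, and only then does all() reduce it to a single value.
gt_zero <- x > 0
std_result <- isTRUE(all(gt_zero))

# Hypothetical single-pass alternative: inspect each value in turn and stop
# at the first failure, without building a full-length logical vector.
all_gt_zero_scan <- function(v) {
  for (val in v) {
    if (is.na(val) || val <= 0) return(FALSE)
  }
  TRUE
}
scan_result <- all_gt_zero_scan(x)
```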
`assertive` is substantially slower so we benchmark it separately:

mb(times=5, vector_assertive(str.pos.vec))
If your functions will never be run thousands of times then you probably do not need to worry about the differences shown here. However, general purpose parameter check functions should be compatible with functions that are, and in those cases microseconds matter.
`vetr` Faster

We made a design choice with `vetr` that the overhead associated with running two `match.call` calls was worth the features it allowed us to implement. In some cases even those ~10 microseconds are too much. For those you can use `vet`, which is a general purpose object checker that uses Vetting Expressions just like `vetr`, or go even further and call `vetr::alike` directly to do template comparisons:

simple_vet <- function(x) vet(numeric(2L) && !anyNA(.), x, stop=TRUE)
simple_vet(pi)

simple_alike <- function(x) {
  if(!isTRUE(msg <- alike(numeric(2L), x)))
    stop("Argument `x` invalid: ", msg)
  if(anyNA(x)) stop("Argument `x` contains NAs")
}
simple_alike(pi)
The error messages and ease of use degrade, but we do improve our timings (again, with `nums <- runif(2)`):

mb(times=10000,
  simple_vet(nums), simple_vetr(nums), simple_checkmate(nums),
  simple_alike(nums)
)
Take these with a grain of salt, as they are written by the `vetr` author:

In favor of `vetr`:

Against `vetr`:
If you are tired of dealing with checks for non-trivial S3 objects and are willing to try out a young package, `vetr` is for you. If not, we would recommend [2] `checkmate` as it is fast, well established, and more expressive than the other options.
Definitions of terms as we use them in this document. They may have different definitions elsewhere.
`assertthat`, `assertive`, `checkmate`, `ensurer`, and `valaddin`.

We use a thin wrapper around `microbenchmark`:
These are the implementations we omitted from the iris checks in the details section.
complex_assertive(iris.fake)
complex_assertthat(iris.fake)
sessionInfo()
[1] `assertthat` requires Check Expressions to evaluate to TRUE or FALSE, whereas `stopifnot` accepts anything, with all-TRUE vectors considered a success and all else failure. Additionally, `assertthat` implements a handful of useful Predicate Functions for common checks (e.g. scalars, etc.).
[2] Our recommendation for `checkmate` is made solely on the basis of the tests described in this document. We have not used it in any of our packages.