library(valaddin) knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
valaddin is a lightweight R package that enables you to transform an existing
function into a function with input validation checks. It does so without
requiring you to modify the body of the function, in contrast to doing input
validation using stop
or stopifnot
, and is therefore suitable for both
programmatic and interactive use.
This document illustrates the use of valaddin, by example. For usage details,
see the main documentation page, ?firmly
.
The workhorse of valaddin is the function firmly
, which applies input
validation to a function, in situ. It can be used to:
For example, to require that all arguments of the function
f <- function(x, h) (sin(x + h) - sin(x)) / h
are numerical, apply firmly
with the check formula ~is.numeric
[^1]:
ff <- firmly(f, ~is.numeric)
ff
behaves just like f
, but with a constraint on the type of its arguments:
ff(0.0, 0.1) ff("0.0", 0.1)
[^1]: The inspiration to use ~
as a quoting operator came from the vignette
Non-standard
evaluation,
by Hadley Wickham.
For example, use firmly
to put a cap on potentially long-running computations:
fib <- function(n) { if (n <= 1L) return(1L) Recall(n - 1L) + Recall(n - 2L) } capped_fib <- firmly(fib, list("n capped at 30" ~ ceiling(n)) ~ {. <= 30L}) capped_fib(10) capped_fib(50)
The role of each part of the value-constraining formula is evident:
The right-hand side {. <= 30L}
is the constraint itself, expressed as a
condition on .
, a placeholder argument.
The left-hand side list("n capped at 30" ~ ceiling(n))
specifies the
expression for the placeholder, namely ceiling(n)
, along with a message to
produce if the constraint is violated.
If the default behavior of a function is problematic, or unexpected, you can use
firmly
to warn you. Consider the function as.POSIXct
, which creates a
date-time object:
tz_original <- Sys.getenv("TZ", unset = NA)
Sys.setenv(TZ = "CET") (d <- as.POSIXct("2017-01-01 09:30:00"))
The problem is that d
is a potentially ambiguous object (with hidden state),
because it's not assigned a time zone, explicitly. If you compute the local hour
of d
using as.POSIXlt
, you get an answer that interprets d
according to
your current time zone; another user—or you, in another country, in the
future—may get a different result.
If you're in CET time zone:
r
as.POSIXlt(d, tz = "EST")$hour
If you were to change to EST time zone and rerun the code:
r
Sys.setenv(TZ = "EST")
d <- as.POSIXct("2017-01-01 09:30:00")
as.POSIXlt(d, tz = "EST")$hour
if (isTRUE(is.na(tz_original))) { Sys.unsetenv("TZ") } else { Sys.setenv(TZ = tz_original) }
To warn yourself about this pitfall, you can modify as.POSIXct
to complain
when you've forgotten to specify a time zone:
as.POSIXct <- firmly(as.POSIXct, .warn_missing = "tz")
Now when you call as.POSIXct
, you get a cautionary reminder:
as.POSIXct("2017-01-01 09:30:00") as.POSIXct("2017-01-01 09:30:00", tz = "CET")
NB: The missing-argument warning is implemented by wrapping functions. The
underlying function base::as.POSIXct
is called unmodified.
loosely
to access the original functionThough reassigning as.POSIXct
may seem risky, it is not, for the behavior is
unchanged (aside from the extra precaution), and the original as.POSIXct
remains accessible:
base::as.POSIXct
loosely
to strip input validation: loosely(as.POSIXct)
loosely(as.POSIXct)("2017-01-01 09:30:00") identical(loosely(as.POSIXct), base::as.POSIXct)
R tries to help you express your ideas as concisely as possible. Suppose you
want to truncate negative values of a vector w
:
w <- {set.seed(1); rnorm(5)} ifelse(w > 0, w, 0)
ifelse
assumes (correctly) that you intend the 0
to be repeated r length(w)
times, and does that for you, automatically.
Nonetheless, R's good intentions have a darker side:
z <- rep(1, 6) pos <- 1:5 neg <- -6:-1 ifelse(z > 0, pos, neg)
This smells like a coding error. Instead of complaining that pos
is too short,
ifelse
recycles it to line it up with z
. The result is probably not what you
wanted.
In this case, you don't need a helping hand, but rather a firm one:
chk_length_type <- list( "'yes', 'no' differ in length" ~ length(yes) == length(no), "'yes', 'no' differ in type" ~ typeof(yes) == typeof(no) ) ~ isTRUE ifelse_f <- firmly(ifelse, chk_length_type)
ifelse_f
is more pedantic than ifelse
. But it also spares you the
consequences of invalid inputs:
ifelse_f(w > 0, w, 0) ifelse_f(w > 0, w, rep(0, length(w))) ifelse(z > 0, pos, neg) ifelse_f(z > 0, pos, neg) ifelse(z > 0, as.character(pos), neg) ifelse_f(z > 0, as.character(pos), neg)
When R make a function call, say, f(a)
, the value of the argument a
is not
materialized in the body of f
until it is actually needed. Usually, you can
safely ignore this as a technicality of R's evaluation model; but in some
situations, it can be problematic if you're not mindful of it.
Consider a bank that waives fees for students. A function to make deposits might look like this[^2]:
deposit <- function(account, value) { if (is_student(account)) { account$fees <- 0 } account$balance <- account$balance + value account } is_student <- function(account) { if (isTRUE(account$is_student)) TRUE else FALSE }
Suppose Bob is an account holder, currently not in school:
bobs_acct <- list(balance = 10, fees = 3, is_student = FALSE)
If Bob were to deposit an amount to cover an future fee payment, his account balance would be updated to:
deposit(bobs_acct, bobs_acct$fees)$balance
Bob goes back to school and informs the bank, so that his fees will be waived:
bobs_acct$is_student <- TRUE
But now suppose that, somewhere in the bowels of the bank's software, the type of Bob's account object is converted from a list to an environment:
bobs_acct <- list2env(bobs_acct)
If Bob were to deposit an amount to cover an future fee payment, his account balance would now be updated to:
deposit(bobs_acct, bobs_acct$fees)$balance
Becoming a student has cost Bob money. What happened to the amount deposited?
The culprit is lazy evaluation and the modify-in-place semantics of
environments. In the call deposit(account = bobs_acct, value = bobs_acct$fee)
,
the value of the argument value
is only set when it's used, which comes after
the object fee
in the environment bobs_acct
has already been zeroed out.
To minimize such risks, forbid account
from being an environment:
err_msg <- "`acccount` should not be an environment" deposit <- firmly(deposit, list(err_msg ~ account) ~ Negate(is.environment))
This makes Bob a happy customer, and reduces the bank's liability:
bobs_acct <- list2env(list(balance = 10, fees = 3, is_student = TRUE)) deposit(bobs_acct, bobs_acct$fees)$balance deposit(as.list(bobs_acct), bobs_acct$fees)$balance
[^2]: Adapted from an example in Section 6.3 of Chambers, Extending R, CRC Press, 2016. For the sake of the example, ignore the fact that logic to handle fees does not belong in a function for deposits!
You don't mean to shoot yourself, but sometimes it happens, nonetheless:
x <- "An expensive object" save(x, file = "my-precious.rda") x <- "Oops! A bug or lapse has tarnished your expensive object" # Many computations later, you again save x, oblivious to the accident ... save(x, file = "my-precious.rda")
firmly
can safeguard you from such mishaps: implement a safety procedure
# Argument `gear` is a list with components: # fun: Function name # ns : Namespace of `fun` # chk: Formula that specify input checks hardhat <- function(gear, env = .GlobalEnv) { for (. in gear) { safe_fun <- firmly(getFromNamespace(.$fun, .$ns), .$chk) assign(.$fun, safe_fun, envir = env) } }
gather your safety gear
protection <- list( list( fun = "save", ns = "base", chk = list("Won't overwrite `file`" ~ file) ~ Negate(file.exists) ), list( fun = "load", ns = "base", chk = list("Won't load objects into current environment" ~ envir) ~ {!identical(., parent.frame(2))} ) )
then put it on
hardhat(protection)
Now save
and load
engage safety features that prevent you from inadvertently
destroying your data:
x <- "An expensive object" save(x, file = "my-precious.rda") x <- "Oops! A bug or lapse has tarnished your expensive object" #> Error: save(x, file = "my-precious.rda") #> Won't overwrite `file` save(x, file = "my-precious.rda") # Inspecting x, you notice it's changed, so you try to retrieve the original ... x #> [1] "Oops! A bug or lapse has tarnished your expensive object" load("my-precious.rda") #> Error: load(file = "my-precious.rda") #> Won't load objects into current environment # Keep calm and carry on loosely(load)("my-precious.rda") x #> [1] "An expensive object"
NB: Input validation is implemented by wrapping functions; thus, if the
arguments are valid, the underlying functions base::save
, base::load
are
called unmodified.
valaddin provides a collection of over 50 pre-made input checkers to
facilitate typical kinds of argument checks. These checkers are prefixed by
vld_
, for convenient browsing and look-up in editors and IDE's that support
name completion.
For example, to create a type-checked version of the function upper.tri
, which
returns an upper-triangular logical matrix, apply the checkers vld_matrix
,
vld_boolean
(here "boolean" is shorthand for "logical vector of length 1"):
upper_tri <- firmly(upper.tri, vld_matrix(~x), vld_boolean(~diag)) # upper.tri assumes you mean a vector to be a column matrix upper.tri(1:2) upper_tri(1:2) # But say you actually meant (1, 2) to be a diagonal matrix upper_tri(diag(1:2)) upper_tri(diag(1:2), diag = "true") upper_tri(diag(1:2), TRUE)
vld_true
Any input validation can be expressed as an assertion that "such and such must
be true"; to apply it as such, use vld_true
(or its complement, vld_false
).
For example, the above hardening of ifelse
can be redone as:
chk_length_type <- vld_true( "'yes', 'no' differ in length" ~ length(yes) == length(no), "'yes', 'no' differ in type" ~ typeof(yes) == typeof(no) ) ifelse_f <- firmly(ifelse, chk_length_type) z <- rep(1, 6) pos <- 1:5 neg <- -6:-1 ifelse_f(z > 0, as.character(pos), neg) ifelse_f(z > 0, c(pos, 6), neg) ifelse_f(z > 0, c(pos, 6L), neg)
localize
A check formula such as ~ is.numeric
(or "Not number" ~ is.numeric
, if you
want a custom error message) imposes its condition "globally":
difference <- firmly(function(x, y) x - y, ~ is.numeric) difference(3, 1) difference(as.POSIXct("2017-01-01", "UTC"), as.POSIXct("2016-01-01", "UTC"))
With localize
, you can concentrate a globally applied check formula to
specific expressions. The result is a reusable custom checker:
chk_numeric <- localize("Not numeric" ~ is.numeric) secant <- firmly(function(f, x, h) (f(x + h) - f(x)) / h, chk_numeric(~x, ~h)) secant(sin, 0, .1) secant(sin, "0", .1)
(In fact, chk_numeric
is equivalent to the pre-built checker vld_numeric
.)
Conversely, apply globalize
to impose your localized checker globally:
difference <- firmly(function(x, y) x - y, globalize(chk_numeric)) difference(3, 1) difference(as.POSIXct("2017-01-01", "UTC"), as.POSIXct("2016-01-01", "UTC"))
assertive,
assertthat, and
checkmate provide extensive collections
of predicate functions that you can use in conjunction with firmly
and
localize
.
ensurer and assertr provide ways to validate function values.
argufy takes a different approach to input validation, using roxygen comments to specify checks.
ensurer provides an experimental
replacement for function
that builds functions with type-validated
arguments.
typeCheck, together with Types for R, enable the creation of functions with type-validated arguments by means of special type annotations. This approach is orthogonal to that of valaddin: whereas valaddin specifies input checks as predicate functions with scope (predicates are primary), typeCheck specifies input checks as arguments with type (arguments are primary).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.