README.md

exemplar


Exemplar generates dependency-free validation functions to make sure one object looks like another (its exemplar). The package contains only one function, exemplar. Consider the validation function generated by using mtcars$wt (vehicle weight, in 1000 lbs) as an exemplar:

exemplar(mtcars$wt)

This prints the function below, which can be modified and used to check that new data meets the same conditions as mtcars$wt:

validate_mtcars_wt <- function(data) {
  stopifnot(exprs = {
    is.double(data)
    !any(is.na(data) | is.null(data))
    # Duplicate values were detected so this assertion has been disabled:
    # !any(duplicated(data))
    min(data, na.rm = TRUE) > 0 # all positive
    # Uncomment or modify the below range assertions if needed:
    # max(data, na.rm = TRUE) <= 5.424
    # 1.513 <= min(data, na.rm = TRUE)
    # Uncomment or modify the below deviance from mean assertions if needed.
    # The mean is 3.22 and the standard deviation is 0.98:
    # max(data, na.rm = TRUE) <= 3.22 + 4 * 0.98
    # 3.22 - 4 * 0.98 <= min(data, na.rm = TRUE)
  })
  invisible(TRUE)
}

The generated validation function, validate_mtcars_wt, checks that the data is a double vector, contains no missing values, and contains only positive values.

If all conditions are met, the function will invisibly return TRUE. Otherwise, it will error. The function can be defined with eval and parse:

eval(parse(text = exemplar(mtcars$wt)))
validate_mtcars_wt(c(mtcars$wt, NA))
Error in validate_mtcars_wt(c(mtcars$wt, NA)) : 
  !any(is.na(data) | is.null(data)) is not TRUE

Some checks are commented out. This is because either the exemplar itself does not meet the criteria (e.g. no duplicate values) or the checks are too specific to be enabled by default (e.g. range checks). The intention is that users will modify the validation functions to meet their needs before placing them in pipelines and scripts.
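As a sketch of the kind of modification described above, the function below is a hand-edited copy of validate_mtcars_wt with the range assertions uncommented. The name validate_wt_range and the choice of which checks to enable are illustrative assumptions, not output of exemplar.

```r
# Hand-edited variant of the generated validator (hypothetical name).
validate_wt_range <- function(data) {
  stopifnot(exprs = {
    is.double(data)
    !any(is.na(data) | is.null(data))
    min(data, na.rm = TRUE) > 0 # all positive
    # Range assertions, uncommented from the generated template:
    max(data, na.rm = TRUE) <= 5.424
    1.513 <= min(data, na.rm = TRUE)
  })
  invisible(TRUE)
}

# The exemplar itself passes silently.
validate_wt_range(mtcars$wt)

# A value above the enabled upper bound triggers an error, which is
# captured here as a message instead of stopping the script.
result <- tryCatch(
  validate_wt_range(c(mtcars$wt, 6)),
  error = function(e) conditionMessage(e)
)
print(result)
```

Wrapping the call in tryCatch is one option for pipelines that should log a failed check instead of halting. Calling the validator directly is enough when a hard stop is the desired behaviour.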

A common use case might be machine learning with a train/test data split. A validation function can be generated using the training data as the exemplar, and then applied to the test data.
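The train/test use case might look roughly like the sketch below. It uses only base R: the validator validate_train_wt is written by hand in the style of exemplar's output (with the package installed, exemplar(train$wt) could print a starting template instead). The function name, the fixed row split, and the choice of a 4 standard deviation bound are all assumptions made for illustration. Unlike generated validators, which hard-code their constants, this sketch reads the training mean and standard deviation from variables.

```r
# Deterministic split so the example is reproducible; a real split
# would usually be random.
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# Summary statistics of the training exemplar.
train_mean <- mean(train$wt)
train_sd   <- sd(train$wt)

# Hand-written validator (hypothetical name) in the style of exemplar output.
validate_train_wt <- function(data) {
  stopifnot(exprs = {
    is.double(data)
    !any(is.na(data))
    min(data, na.rm = TRUE) > 0 # all positive
    # Deviance-from-mean assertions based on the training data:
    max(data, na.rm = TRUE) <= train_mean + 4 * train_sd
    train_mean - 4 * train_sd <= min(data, na.rm = TRUE)
  })
  invisible(TRUE)
}

# The held-out rows resemble the training data, so this passes silently.
validate_train_wt(test$wt)

# Test data containing a missing value fails the NA assertion.
outcome <- tryCatch(
  validate_train_wt(c(test$wt, NA)),
  error = function(e) conditionMessage(e)
)
print(outcome)
```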

Entire data frames can be used as exemplars. Additionally, exemplar supports tidyselect selectors, which limit the validation functions to certain columns. The following will all work:

exemplar(mtcars) # will validate all columns
exemplar(mtcars, wt, mpg)
exemplar(mtcars, -cyl)
exemplar(mtcars, starts_with("d"))

The functions produced by exemplar require at least R 3.5 (due to improvements made to stopifnot) but otherwise have no dependencies. That is, exemplar generates functions that do not need exemplar or any other package to run.

Installation

You can install the development version of exemplar from GitHub with:

# install.packages("devtools")
devtools::install_github("mdneuzerling/exemplar")


mdneuzerling/exemplar documentation built on Jan. 13, 2024, 1:46 a.m.