library(validate)
library(checkmate)
suppressMessages(library(assertr))
suppressMessages(library(Rcssplot))
source("plot_consonance.R")

Introduction

The consonance package provides a framework for performing quality control on datasets and models. This vignette compares the performance (running time) of consonance tests to other approaches for testing data.

As a word of caution, the comparisons below include approaches implemented with base R (only) and some approaches using external packages checkmate, assertthat, assertr, and validate. However, the comparisons cannot be considered exhaustive or complete. Moreover, measurements are performed on a single dataset and cannot be taken as representative of all use cases.

Performance {#performance}

Code in this vignette relies on the following packages.

library(magrittr)
library(assertthat)
library(checkmate)
library(assertr)
library(consonance)
library(microbenchmark)

Dataset

For practical measurements of performance, let's use the anscombe dataset, which was also used in the primary vignette. We first reshape it into a single data frame with three columns: one (character) identifier column and two (numeric) coordinate columns.

anscombe_good <- rbind(data.frame(id = "A", x = anscombe$x1, y = anscombe$y1),
                       data.frame(id = "B", x = anscombe$x2, y = anscombe$y2),
                       data.frame(id = "C", x = anscombe$x3, y = anscombe$y3),
                       data.frame(id = "D", x = anscombe$x4, y = anscombe$y4))
anscombe_good$id <- as.character(anscombe_good$id)
rownames(anscombe_good) <- NULL
head(anscombe_good, 2)
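# expected output (anscombe ships with base R, so these values are deterministic):
#   id  x    y
# 1  A 10 8.04
# 2  A  8 6.95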

The range of the x-coordinate is a useful quantity for quality control (see the primary vignette). It will be helpful to save the minimum and maximum values.

x_min <- min(anscombe_good$x)
x_max <- max(anscombe_good$x)
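# (for the combined anscombe data, x_min is 4 and x_max is 19)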

We will also need a corrupted dataset. For this, we can set one of the coordinates outside the intended range.

anscombe_bad <- anscombe_good
anscombe_bad$x[2] <- x_max + 10
head(anscombe_bad, 2)

Pipelines

Let's create several pipelines for dataset validation. The requirements will be that each entry in the id column consists of at least one character and that the x-coordinates lie within the expected range.

Using the consonance package, we can first define a suite of tests, and then define a pipeline that applies the suite on data.

suite_consonance <-
  consonance_test("id single char", test_character,
                  min.chars=1, .var="id") +
  consonance_test("x in bounds", test_numeric,
                  lower=x_min, upper=x_max, .var="x")
use_consonance <- function(d) {
  d %>% validate(suite_consonance)
}

Using base R, we can define two implementations: one that does not perform any tests at all, and one that uses a series of stopifnot conditions.

use_none <- function(d) invisible(d)
use_base_R <- function(d) {
  stopifnot(is.character(d$id))
  stopifnot(all(nchar(d$id) > 0))
  stopifnot(is.numeric(d$x))
  xrange <- range(d$x)
  stopifnot(xrange[1] >= x_min & xrange[2] <= x_max)
  invisible(d)
}

Using the assertthat package, the implementation is similar to base R, but replaces stopifnot with package functions.

use_assertthat <- function(d) {
  assert_that(is.character(id), env=d)
  assert_that(all(nchar(id) > 0), env=d)
  assert_that(is.numeric(x), env=d)
  assert_that(min(x) >= x_min & max(x) <= x_max, env=d)
  invisible(d)
}

Using the checkmate package, assertions on one variable can include several criteria at once.

use_checkmate <- function(d) {
  assert_character(d$id, min.chars=1)
  assert_numeric(d$x, lower=x_min, upper=x_max)
  invisible(d)
}

Using the assertr package, we can define a chain of tests (which is similar to a suite) and invoke the chain on data.

suite_assertr <- . %>%
  chain_start %>%
  verify(is.character(id)) %>%
  verify(nchar(id) > 0) %>%
  verify(is.numeric(x)) %>%
  assert(within_bounds(x_min, x_max), x) %>%
  chain_end
use_assertr <- function(d) {
  d %>% suite_assertr %>% invisible
}

Using the validate package, we can define a custom validator (which is similar to a suite) and then confront the data with the validator.

suite_validate <- validator(is.character(id), nchar(id) > 0,
                            is.numeric(x), x >= x_min & x <= x_max)
use_validate <- function(d) {
  result <- confront(d, suite_validate)
  stopifnot(all(result))
  invisible(d)
}

Note: These are first-attempt implementations and they are not exactly equivalent. Some of the distinctions and their effect on performance are mentioned in the discussion at the end.

Outputs

Before measuring run times, let's run the pipelines on data to make sure they produce the expected outputs. The good dataset should lead to quiet execution.

use_consonance(anscombe_good)
use_none(anscombe_good)
use_base_R(anscombe_good)
use_assertthat(anscombe_good)
use_checkmate(anscombe_good)
use_assertr(anscombe_good)
use_validate(anscombe_good)

The bad dataset should lead to error messages.

# using consonance package
use_consonance(anscombe_bad)
# using alternative methods
use_base_R(anscombe_bad)
use_assertthat(anscombe_bad)
use_checkmate(anscombe_bad)
use_assertr(anscombe_bad)
use_validate(anscombe_bad)

Measurements

Let's measure the time necessary to perform the tests, starting with the good dataset.

runtimes_good <- microbenchmark(
  none = try({ use_none(anscombe_good) }),
  base_R = try({ use_base_R(anscombe_good) }),
  assertthat = try({ use_assertthat(anscombe_good) }),
  checkmate = try({ use_checkmate(anscombe_good) }),
  consonance = try({ use_consonance(anscombe_good) }),
  assertr = try({ use_assertr(anscombe_good) }),
  validate = try({ use_validate(anscombe_good) }),
  times = 500
)

Next, we measure using the dataset with a faulty data item.

runtimes_bad <- microbenchmark(
  none = try({ use_none(anscombe_bad) }),
  base_R = try({ use_base_R(anscombe_bad) }),
  assertthat = try({ use_assertthat(anscombe_bad) }),
  checkmate = try({ use_checkmate(anscombe_bad) }),
  consonance = try({ use_consonance(anscombe_bad) }),
  assertr = try({ use_assertr(anscombe_bad) }),
  validate = try({ use_validate(anscombe_bad) }),
  times = 500
)

The measurements are available in fine detail, but the main properties are visible from the median running times.
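Before plotting, a quick way to inspect the medians is via the benchmark summaries (a sketch; note that microbenchmark's summary chooses time units automatically):

summary(runtimes_good)[, c("expr", "median")]
summary(runtimes_bad)[, c("expr", "median")]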

# prep - get a relevant range for the plot x-axis
all.times <- c(split(runtimes_good$time, runtimes_good$expr),
               split(runtimes_bad$time, runtimes_bad$expr))
times.range <- c(0, max(sapply(all.times, quantile, probs=0.5)))
# draw the two sets of results
oldpar <- par(no.readonly=TRUE)
par(mfrow=c(1,2))
plot_runtimes(runtimes_good, xlim=times.range, corner="A",
              main="Good data - all criteria pass")
plot_runtimes(runtimes_bad, xlim=times.range, corner="B",
              main="Bad data - one criterion fails")
par(oldpar)

The bars show running time, so lower is better; both panels share the same x-axis scale. The left panel shows that processing a non-problematic dataset is fast compared to detecting errors (right panel). Both panels show that the infrastructure provided by consonance introduces overhead compared to base R, but its performance is competitive with assertr and validate.

Measurements with skipped tests

As a final exploration, let's consider performance in situations where the data is expected to be consonant. This can arise when the data has already been checked and repeated testing is redundant.

Implementing test-skipping in base R involves wrapping the tests inside an 'if' condition. The consonance package implements a dedicated argument, so this feature does not require new lines of code.

use_base_R_skip <- function(d, skip=FALSE) {
  if (!skip) use_base_R(d)
  invisible(d)
}
use_consonance_skip <- function(d, skip=FALSE, skip.action="log") {
  # note: this avoids the pipe operator, which makes the function more
  # comparable to base R, but less comparable to the original
  # use_consonance in the previous section
  validate(d, suite_consonance, skip=skip, skip.action=skip.action)
}

We can now collect measurements (only on the good dataset because we are assuming that test-skipping is justified).

runtimes_skip <- microbenchmark(
  none = use_none(anscombe_good),
  base_R = use_base_R_skip(anscombe_good, skip=FALSE),
  base_R_skip_quiet = use_base_R_skip(anscombe_good, skip=TRUE),
  consonance = use_consonance_skip(anscombe_good, skip=FALSE),
  consonance_skip_log = use_consonance_skip(anscombe_good, skip=TRUE,
                                            skip.action="log"),
  consonance_skip_quiet = use_consonance_skip(anscombe_good, skip=TRUE,
                                              skip.action="none"),
  times = 500
)
skip.times <- split(runtimes_skip$time, runtimes_skip$expr)
skip.range <- c(0, max(sapply(skip.times, quantile, probs=0.5)))
oldpar <- par(no.readonly=TRUE)
par(mfrow=c(1,2))
plot_runtimes(runtimes_skip, xlim=skip.range, corner="",
              main="Test skipping", Rcssclass="longnames")
par(oldpar)

Note that for larger datasets, the running times for all approaches that skip tests would remain constant and the running times for the approaches that perform tests ('base R' and 'consonance') would increase.
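To probe this claim on a concrete case, one could inflate the dataset and repeat the measurement. A minimal sketch, using an arbitrary 1000-fold replication of the good dataset:

# stack copies of the good dataset to simulate a larger input
anscombe_large <- do.call(rbind, rep(list(anscombe_good), 1000))
runtimes_large <- microbenchmark(
  base_R = use_base_R_skip(anscombe_large, skip=FALSE),
  base_R_skip = use_base_R_skip(anscombe_large, skip=TRUE),
  consonance = use_consonance_skip(anscombe_large, skip=FALSE),
  consonance_skip = use_consonance_skip(anscombe_large, skip=TRUE,
                                        skip.action="none"),
  times = 50
)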

Discussion {#discussion}

In the context of data quality control, run-time performance sits in an uncomfortable position. On the one hand, performance should not be a primary concern because a limited amount of time spent on quality control can prevent time-consuming debugging. On the other hand, if checks impose a noticeable slowdown, that can discourage adoption. The presented results show that the consonance package introduces some overhead compared to base R, but nonetheless executes faster than the other frameworks tested.

The presented measurements are guidelines only, and there are important caveats to interpreting the benchmark data. Besides the well-known difficulties in assessing the running time of code and the limited range of the benchmark conditions, it is important to emphasize that the tested pipelines actually provide distinct features.

First, the implementations labeled 'base R', 'assertthat', and 'checkmate' apply criteria sequentially and abort at the first error. In contrast, the 'consonance', 'assertr', and 'validate' implementations assess all criteria and report a summary of all failures. This bookkeeping is not a major factor in running time in the above examples because the failure in the bad dataset occurs in the last-defined criterion. But the bookkeeping explains why the more complex methods take longer to execute. In practical situations, reporting all problems at once can be a time-saver in the long run.
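This effect can be probed with a hypothetical variant of the bad dataset that corrupts the id column instead of x, so that the fail-fast implementations abort early while the summarizing implementations still evaluate every criterion:

# corrupt the first-tested column (id) rather than the last (x)
anscombe_bad_id <- anscombe_good
anscombe_bad_id$id[2] <- ""
microbenchmark(
  base_R = try(use_base_R(anscombe_bad_id), silent=TRUE),
  consonance = try(use_consonance(anscombe_bad_id), silent=TRUE),
  times = 100
)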

Second, the implementations differ in the flexibility and level of detail of their logs and failure messages. In this regard, assertr and validate provide a full report of the individual data records that do not meet the criteria. Thus, although they are slower, they provide the most detailed output.
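For example, with the validator defined above, the per-rule results can be inspected directly:

# confront returns a validation object; summary shows per-rule pass/fail counts
result_bad <- confront(anscombe_bad, suite_validate)
summary(result_bad)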

Overall, the consonance package offers a middle ground in terms of performance between simple testing and full-featured frameworks such as assertr and validate.

Appendix

sessionInfo()
