User's guide {#guide}

This section introduces more features of the consonance package. As in the overview sections, code examples require the following packages.

library(consonance)
library(checkmate)
library(magrittr)

For concreteness, we will use the anscombe dataset available with base R.

head(anscombe, 2)

The dataset holds four series of (x, y) coordinates, with 11 points each, in arbitrary units. It will be convenient to arrange the data into four separate data frames.

datasets <- list(A = data.frame(id = "A", x = anscombe$x1, y = anscombe$y1),
                 B = data.frame(id = "B", x = anscombe$x2, y = anscombe$y2),
                 C = data.frame(id = "C", x = anscombe$x3, y = anscombe$y3),
                 D = data.frame(id = "D", x = anscombe$x4, y = anscombe$y4))

What makes these datasets interesting is that they have very similar (Pearson) correlations, but quite distinct patterns.
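The correlation claim is easy to confirm with base R alone. The sketch below rebuilds the four series directly from the built-in anscombe data frame, so that it runs on its own, and computes the Pearson correlation of x and y for each series. All four values come out close to 0.816.

```r
# rebuild the four (x, y) series from the built-in anscombe data frame
series <- lapply(1:4, function(i) {
  data.frame(x = anscombe[[paste0("x", i)]], y = anscombe[[paste0("y", i)]])
})
names(series) <- c("A", "B", "C", "D")

# Pearson correlation between x and y within each series
correlations <- sapply(series, function(d) cor(d$x, d$y))
round(correlations, 3)
```

Nearly identical correlation coefficients therefore say little about the shape of the data, which the scatter plots make obvious.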

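oldpar <- par(mfrow=c(1,4))   # save only the parameters being changed
# plot_anscombe() is a plotting helper; its definition is not shown in this section
plot_anscombe(datasets$A)
plot_anscombe(datasets$B)
plot_anscombe(datasets$C)
plot_anscombe(datasets$D)
par(oldpar)                   # restore the previous layout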
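The plotting helper plot_anscombe is used here without being defined. For readers who want to reproduce the figure, a minimal base-R stand-in could look like the following. The name plot_anscombe_sketch, the shared axis limits, and the fitted least-squares line are assumptions about what such a helper might draw; this is not the vignette's own helper.

```r
# Hypothetical stand-in for plot_anscombe(); not the vignette's own helper.
# Draws one series as a scatter plot plus its least-squares line,
# using shared axis limits so that the four panels are comparable.
plot_anscombe_sketch <- function(d, xlim = c(3, 20), ylim = c(3, 13)) {
  plot(d$x, d$y, xlim = xlim, ylim = ylim, pch = 19,
       xlab = "x", ylab = "y", main = unique(d$id))
  abline(lm(y ~ x, data = d), col = "grey40")
  invisible(d)
}

# usage on series A, rebuilt from the built-in anscombe data frame
d_A <- data.frame(id = "A", x = anscombe$x1, y = anscombe$y1)
plot_anscombe_sketch(d_A)
```

The fixed limits (x in [3, 20], y in [3, 13]) cover all four series, so panels drawn side by side share the same scale.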

In this section, we will set up consonance suites for this data and for regression models.

Data consonance

Vectors

We saw in the overview how to check that data vectors are numeric.

suite_numeric_vec <-
  consonance_suite() +
  consonance_assert("numeric vector", assert_numeric, any.missing=FALSE)
# perform the assessment on dataset A
datasets$A$x %>% validate(suite_numeric_vec)
datasets$A$y %>% validate(suite_numeric_vec)

Data frames & data tables

When data consists of a data frame, data table, or list, tests can be carried out on individual columns or components. This behavior is triggered via the argument .var.

suite_numeric_df <-
  consonance_assert("numeric x", assert_numeric, any.missing=FALSE, .var="x") +
  consonance_assert("numeric y", assert_numeric, any.missing=FALSE, .var="y")

The argument any.missing in each assertion is passed on to the assertion function, assert_numeric. The argument .var begins with a . to avoid name clashes with arguments that may be relevant to the assertion function. It instructs the consonance package to carry out the assertion on the designated component.

Applying this suite to one of the anscombe datasets should execute quietly.

datasets$A %>% validate(suite_numeric_df)

To observe a failed test, we can apply the suite to a corrupted data frame.

temp_df <- datasets$A
temp_df$x[2] <- NA
temp_df %>% validate(suite_numeric_df)

The error messages indicate the corrupt data was detected, as intended.

Lists and other objects

The approach described above to test columns in data frames also applies to components in arbitrary objects or lists.

# a list object with miscellaneous components, including $x and $y
temp_list_A <- list(x=c(1, 2, 3), y=c(1, 2), comment="x and y numeric")
# testing temp_list_A should execute quietly
temp_list_A %>% validate(suite_numeric_df)
# another list object with a y component that is not numeric
temp_list_B <- list(x=c(1, 2), y=factor(c(1, 2)), comment="y is not numeric")
# testing temp_list_B should generate messages
temp_list_B %>% validate(suite_numeric_df)

Although temp_list_A and temp_list_B are not data frames, the suite suite_numeric_df applies its criteria to components x and y just as before. It is important to note, however, that the .var argument in the test constructors only works one level deep. Deep lookups cannot be specified via the constructor: for example, a test cannot be asked to extract component x from an object, then extract a sub-component x2, and check the properties of that sub-component. Implementing such logic requires a custom assertion.
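A minimal sketch of such a custom assertion, written in base R only. The function name assert_nested_numeric and the component names x and x2 are hypothetical. The function performs the two-level lookup itself and then follows the assert convention: it stops on failure and returns its input invisibly on success.

```r
# Hypothetical custom assertion: require obj[[outer]][[inner]] to be numeric
# without missing values; return the whole object invisibly on success.
assert_nested_numeric <- function(obj, outer = "x", inner = "x2") {
  component <- obj[[outer]]
  if (!is.list(component) || is.null(component[[inner]])) {
    stop("component ", outer, "$", inner, " not found")
  }
  value <- component[[inner]]
  if (!is.numeric(value) || anyNA(value)) {
    stop("component ", outer, "$", inner, " must be numeric without NAs")
  }
  invisible(obj)
}

nested_ok  <- list(x = list(x1 = "label", x2 = c(1.5, 2.5)))
nested_bad <- list(x = list(x1 = "label", x2 = c("a", "b")))
assert_nested_numeric(nested_ok)         # passes silently
try(assert_nested_numeric(nested_bad))   # signals an error
```

Because the function does its own lookup, it would presumably be added to a suite with consonance_assert and no .var argument, so that it receives the whole object.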

Custom assertions

We have seen that consonance assertions can be defined using assert_numeric and assert_named from the checkmate package. Indeed, checkmate provides dozens of functions that cover many often-used criteria, and all of them can be incorporated into a consonance suite. Nonetheless, there may arise situations where a ready-made assertion function is not available. In those scenarios it is possible to define a custom function.

As an example, let's suppose we want to require that a vector has at least three distinct values.

assert_n_unique <- function(x, n) {
  stopifnot(length(unique(x)) >= n)
  x
}

Then, we can incorporate this function into a test suite using n=3.

suite_3_unique <-
  consonance_suite() +
  consonance_assert("at least 3 distinct x", assert_n_unique, n=3, .var="x")

Next, we can apply this suite on the anscombe datasets.

datasets$A %>% validate(suite_3_unique)
datasets$D %>% validate(suite_3_unique)

Dataset A passes the assessment, but dataset D does not: its x-coordinates take only two distinct values (refer to the dataset visualizations above). This is not a purely conceptual exercise. If a step in the analysis relies on smooth.spline to fit a model, the x-coordinates need several distinct positions for spline knots; R's smooth.spline stops with an error when there are fewer than four, so n=4 would match that requirement exactly. A check of this kind detects in advance that smoothing splines will not be feasible for dataset D.
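The spline limitation can be observed directly in base R. The sketch below counts distinct x-coordinates in series A and D of the anscombe data and attempts a smoothing spline on each, catching the error for series D instead of letting it halt the script.

```r
# x- and y-coordinates of series A and D from the anscombe data frame
x_A <- anscombe$x1; y_A <- anscombe$y1
x_D <- anscombe$x4; y_D <- anscombe$y4

length(unique(x_A))   # 11 distinct positions
length(unique(x_D))   # only 2 distinct positions

# a smoothing spline can be fitted for A, but not for D
fit_A <- smooth.spline(x_A, y_A)
fit_D <- tryCatch(smooth.spline(x_D, y_D), error = function(e) e)
inherits(fit_D, "error")
conditionMessage(fit_D)
```

Running the consonance check before the model fit turns this late, somewhat cryptic failure into an early, explicitly labelled one.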

Another type of custom function is described in the separate vignette on consonance testing for models.

Terminology: assert, check, test, etc.

There are several related terms that appear in the literature on unit testing and argument checking: assert, check, test, expect, validate, verify, insist, and perhaps others. These terms are almost synonyms in everyday language; in the context of code, they are sometimes used inconsistently and at other times delineate precise behaviors.

The consonance package uses the word 'test' in a loose sense as well as in a precise sense. The loose sense appears in the phrase 'consonance testing' and in the name of a component of a suite object (e.g. suite$tests). There, the word conveys the application of criteria or rules to input data.

The consonance package also uses the verbs 'assert', 'check', and 'test' in a precise sense, following the convention described in the checkmate package. A function of each of the three types processes an object x and signals either a positive (PASS) or a negative (FAIL) outcome, as summarized below.

| Function type | Outcome: PASS | Outcome: FAIL |
| ------------- | ------------- | ------------- |
| assert | x or invisible(x) | stop() |
| check | TRUE | "string with error message" |
| test | TRUE | FALSE |
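The three conventions in the table are closely related: a check carries the most information, and both a test and an assertion can be derived from it. The following base-R sketch illustrates this with a toy criterion (all values positive). The helper names check_positive, as_test, and as_assert are hypothetical and are not part of consonance or checkmate.

```r
# check convention: TRUE on success, otherwise a string describing the problem
check_positive <- function(x) {
  if (is.numeric(x) && isTRUE(all(x > 0))) TRUE
  else "must contain only positive numbers"
}

# hypothetical helpers deriving the other two conventions from a check
as_test <- function(check) {
  function(x, ...) isTRUE(check(x, ...))
}
as_assert <- function(check) {
  function(x, ...) {
    res <- check(x, ...)
    if (!isTRUE(res)) stop(res, call. = FALSE)
    invisible(x)
  }
}

test_positive   <- as_test(check_positive)
assert_positive <- as_assert(check_positive)

check_positive(c(1, 2))          # TRUE
check_positive(c(-1, 2))         # "must contain only positive numbers"
test_positive(c(-1, 2))          # FALSE
try(assert_positive(c(-1, 2)))   # stops with the check's message
```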

Given this terminology, function assert_n_unique is an assertion because it returns its input or stops execution. An analogous behavior might be written as a test.

test_n_unique <- function(x, n) {
  length(unique(x)) >= n
}

This alternative function type can also be used within a consonance suite. However, it should be added via a different constructor.

suite_3_unique_alt <-
  consonance_suite() +
  consonance_test("at least 3 distinct x", test_n_unique, n=3, .var="x")

Note that this definition involves consonance_test rather than consonance_assert. For a check function, the relevant constructor is consonance_check.
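For completeness, a check-style counterpart to assert_n_unique and test_n_unique could look like the sketch below. The name check_n_unique and the wording of the message are our own choices; the function only follows the check convention from the table.

```r
# check-style counterpart: TRUE on success, otherwise a descriptive string
check_n_unique <- function(x, n) {
  n_found <- length(unique(x))
  if (n_found >= n) {
    return(TRUE)
  }
  paste0("expected at least ", n, " distinct values, found ", n_found)
}

check_n_unique(anscombe$x1, n = 3)   # TRUE
check_n_unique(anscombe$x4, n = 3)   # "expected at least 3 distinct values, found 2"
```

Following the pattern of consonance_test above, such a function would presumably be registered as consonance_check("at least 3 distinct x", check_n_unique, n=3, .var="x"). The advantage of a check over a test is that the failure message explains what went wrong.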

All three function types can be used together in a single consonance suite. However, some puzzling behaviors may arise if a function of one type is added with an inappropriate constructor. It is probably a good idea to pick one function style and stick with it.
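To see why mixing styles can be puzzling, consider a simplified, hypothetical runner that treats every function as an assertion: an error counts as FAIL and anything else as PASS. This is not a description of how consonance works internally, only an illustration of the mismatch. A test function returns FALSE instead of stopping, so such a runner silently reports success.

```r
# test-style function, same as in the text: returns TRUE/FALSE, never stops
test_n_unique <- function(x, n) {
  length(unique(x)) >= n
}

# hypothetical runner with assertion semantics: error -> FAIL, else PASS
run_as_assertion <- function(fun, x, ...) {
  tryCatch({
    fun(x, ...)
    "PASS"
  }, error = function(e) "FAIL")
}

# series D has only 2 distinct x values, so the criterion is not met ...
test_n_unique(anscombe$x4, n = 3)                    # FALSE
# ... yet an assertion-style runner reports success, because nothing stopped
run_as_assertion(test_n_unique, anscombe$x4, n = 3)  # "PASS"
```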

Apart from assert, check, and test, natural language also has other verbs with similar meanings. The testthat package uses 'expect', the assertthat package uses 'assert that', and the assertr package also uses 'verify' and 'insist'. Unfortunately, some of those constructs serve different purposes than what is provided by the consonance package, so they do not have direct analogs. To mitigate potential confusion, this vignette and other docs try to avoid using those alternative keywords.

Importance levels

We have seen that validate can generate error messages and halt execution via R's stop. In some situations, it may be appropriate to signal a potential problem but nonetheless continue execution. This can be achieved by setting importance levels using the argument .level.

As a practical example, let's look at the distribution of x-coordinates in the anscombe dataset.

# from the scatter plots, x-coordinates are in range [4, 19]
anscombe.x <- unlist(lapply(datasets, function(d) { d$x } ))
anscombe.qs <- quantile(anscombe.x, probs=c(0, 0.1, 0.9, 1))
anscombe.qs

anscombe.qs gives the minimum and maximum values of x, as well as the 10% and 90% quantiles. If we construct regression models (next section), it would be relevant to know whether a new value of x lies within this range. Let's make a suite that uses the 10% and 90% quantiles to signal a warning, and the min/max values to escalate to an error.
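Before encoding these thresholds in a suite, the intended three-way split can be prototyped in base R. The helper classify_x is hypothetical and only mirrors the reasoning above. With the pooled anscombe x-coordinates, the four thresholds are 4, 5, 13, and 19.

```r
# pool all x-coordinates from the four anscombe series
anscombe_x <- unlist(anscombe[c("x1", "x2", "x3", "x4")], use.names = FALSE)
qs <- quantile(anscombe_x, probs = c(0, 0.1, 0.9, 1))

# outside [min, max] -> "error"; outside the 10%-90% band -> "warning"; else "ok"
classify_x <- function(x, qs) {
  ifelse(x < qs[1] | x > qs[4], "error",
         ifelse(x < qs[2] | x > qs[3], "warning", "ok"))
}

classify_x(c(3, 4.5, 10, 14, 20), qs)
# "error" "warning" "ok" "warning" "error"
```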

suite_x <-
  consonance_assert("within range", assert_numeric,
                    lower=anscombe.qs[1], upper=anscombe.qs[4],
                    .var="x", .level="error") +
  consonance_assert("within inner range", assert_numeric,
                    lower=anscombe.qs[2], upper=anscombe.qs[3],
                    .var="x", .level="warning")

The first assertion uses .level="error", which is the default value and corresponds to the examples we have seen before. The second assertion uses .level="warning". Both act on a column .var="x" in a data frame. So let's create small data frames.

# a value of x outside the range should raise an error
data.frame(x=min(anscombe.x)-1) %>% validate(suite_x)
# a value of x on the outskirts of the distribution should raise a warning
data.frame(x=mean(anscombe.qs[1:2])) %>% validate(suite_x)

Note that the last example generates only warnings and allows execution to continue. It is also possible to run validate in a strict mode that stops execution even upon warnings. This is achieved by setting the argument level in validate.

# a value of x on the outskirts of the distribution should raise a warning.
# by setting level="warning", the test suite will raise an error
data.frame(x=mean(anscombe.qs[1:2])) %>% validate(suite_x, level="warning")
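One way to think about the interplay of .level (set per assertion) and level (set per call to validate) is as a severity threshold. The base-R sketch below is a hypothetical mental model, not consonance's implementation: a failure is escalated to an error when its severity reaches the threshold and is otherwise reported as a warning.

```r
# Hypothetical model: failures carry a .level; `level` is the threshold
# at or above which a failure stops execution.
severity <- c(warning = 1, error = 2)

signal_failure <- function(message, .level = "error", level = "error") {
  if (severity[[.level]] >= severity[[level]]) {
    stop(message, call. = FALSE)
  }
  warning(message, call. = FALSE)
  invisible(NULL)
}

# default threshold: a warning-level failure only warns
signal_failure("x in outer 10% band", .level = "warning")
# strict threshold: the same failure now stops execution
try(signal_failure("x in outer 10% band", .level = "warning", level = "warning"))
```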

Model consonance

The package enables attaching a consonance suite to any R object, for example to outputs from lm, glm, or other modeling frameworks. To demonstrate this, let's create a regression model for the second anscombe series.

B_lm <- lm(y~x, data=datasets$B)
B_lm

As the plot at the start of the section shows, a linear regression is a reasonable first approximation for the series, but the data is actually better modeled by a parabola. It would be a mistake to use the linear model outside the x-range used to fit the model. So let's create a suite of tests to capture this reasoning.

suite_lm <- consonance_suite() +
  consonance_assert("x range", assert_numeric, .var="x", .level="warning",
                    lower=min(datasets$B$x), upper=max(datasets$B$x))

We can now attach the suite to the model.

B_lm_2 <- attach_consonance(B_lm, suite_lm)

The suite can be previewed through the $consonance component of the object.

B_lm_2$consonance

We can now use the model to perform data consonance checks.

# prediction on x-coordinates within the model range
in_range <- data.frame(x=c(5, 10))
in_range %>% validate(B_lm_2) %>% predict(object=B_lm_2)
# prediction on x-coordinates outside of the model range
out_range <- data.frame(x=c(20, 30))
out_range %>% validate(B_lm_2) %>% predict(object=B_lm_2)

The out_range data frame generates warnings and produces y-coordinates that are quite far from what one might expect from a parabolic trend (refer to the figure).
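To quantify how far the linear extrapolation drifts, series B can be refit with a quadratic term and both models compared at the in-range and out-of-range x values used above. This base-R sketch is independent of consonance; the quadratic model is added here purely for comparison.

```r
# series B from the anscombe data frame
d_B <- data.frame(x = anscombe$x2, y = anscombe$y2)

fit_linear    <- lm(y ~ x, data = d_B)
fit_quadratic <- lm(y ~ x + I(x^2), data = d_B)

# compare predictions inside (5, 10) and outside (20, 30) the fitted x-range
new_x <- data.frame(x = c(5, 10, 20, 30))
comparison <- data.frame(
  x         = new_x$x,
  linear    = predict(fit_linear, newdata = new_x),
  quadratic = predict(fit_quadratic, newdata = new_x)
)
comparison$gap <- abs(comparison$linear - comparison$quadratic)
round(comparison, 2)
```

The two models stay within about one unit of each other at x = 5 and x = 10, but diverge sharply at x = 20 and x = 30, which is exactly the situation the warning-level assertion is meant to flag.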

Note that this example produces warning messages, but it nonetheless displays a prediction. This is because the assertion in suite_lm was defined with .level="warning". It would also be reasonable to set .level="error" (the default). validate would then halt execution before any prediction is made.

Logging

By default, validate does not generate output when evaluated on consonant data and outputs messages upon problems. The level of logging can be adjusted by setting logging.level to "INFO", "WARN", or "ERROR". There are several mechanisms to implement this adjustment.

Setting the logging level in the suite constructor affects all evaluations.

# create a suite that will generate a lot of log messages
suite_verbose <-
  consonance_suite(logging.level="INFO") +
  consonance_assert("character vector", assert_character)
# simple evaluations will generate log messages
validate(c("a", "b"), suite_verbose)

Another way to toggle the logging level is within validate.

# set up a canonical suite
suite_standard <-
  consonance_suite() +
  consonance_assert("character vector", assert_character)
# by default, successful runs will not generate messages
validate(c("a", "b"), suite_standard)
# but we can request more information
validate(c("a", "b"), suite_standard, logging.level="INFO")
# logging goes back to normal without the explicit argument
validate(c("a", "b"), suite_standard)

Instead of writing messages to the console, the logger can write to a file. This is achieved with the argument log.file, either in the suite constructor or within validate. For brevity, an example of the latter is as follows.

# the verbose suite would normally display messages in the console
validate(c("a", "b"), suite_verbose, log.file="my-log.log")

The mechanisms above require specifying logging settings in the constructor or at each run. In some cases, however, it may be desirable to adjust the logging level of an existing suite so that a new setting becomes its default behavior. This can be achieved with a new constructor and the composition operator.

# create a new empty suite with desired logging settings,
# then add the consonance tests from an old suite into the new one
suite_normal <- consonance_suite() + suite_verbose
# simple evaluations will no longer generate log messages
validate(c("a", "b"), suite_normal)
# simple evaluations with the verbose suite still generate log messages
validate(c("a", "b"), suite_verbose)

Note: to change logging settings for a suite that is attached to a model, create a new suite and attach it to the same model.



tkonopka/consonance documentation built on Oct. 9, 2020, 2:09 p.m.