This section introduces more features of the consonance
package.
As in the overview sections, code examples require the following packages.
library(consonance) library(checkmate) library(magrittr)
For concreteness, we will use the anscombe
dataset available with base R.
head(anscombe, 2)
The dataset holds four pairs of (x, y) coordinates in arbitrary units. It will be convenient to arrange the data into four separate data frames.
datasets <- list(A = data.frame(id = "A", x = anscombe$x1, y = anscombe$y1), B = data.frame(id = "B", x = anscombe$x2, y = anscombe$y2), C = data.frame(id = "C", x = anscombe$x3, y = anscombe$y3), D = data.frame(id = "D", x = anscombe$x4, y = anscombe$y4))
What makes these datasets interesting is that they have very similar (Pearson) correlations, but quite distinct patterns.
oldpar <- par() par(mfrow=c(1,4)) plot_anscombe(datasets$A) plot_anscombe(datasets$B) plot_anscombe(datasets$C) plot_anscombe(datasets$D) newpar <- par(oldpar)
In this section, we will set up consonance suites for this data and for regression models.
We saw in the overview how to check that data vectors are numeric.
suite_numeric_vec <- consonance_suite() + consonance_assert("numeric vector", assert_numeric, any.missing=FALSE) # perform the assessment on dataset A datasets$A$x %>% validate(suite_numeric_vec) datasets$A$y %>% validate(suite_numeric_vec)
When data consists of a data frame, data table, or list, test can be carried
out on individual columns / components. This behavior can be triggered via
argument .var
.
suite_numeric_df <- consonance_assert("numeric x", assert_numeric, any.missing=FALSE, .var="x") + consonance_assert("numeric y", assert_numeric, any.missing=FALSE, .var="y")
The argument any.missing
in each assertion is understood to be passed to
the assertion function, assert_numeric
. Argument .var
begins with
a .
in order to avoid name-clashing with arguments that may be relevant to the
assertion function. It instructs package consonance
to carry out the assertion
on the designated component.
Applying this suite on one of the anscombe
datasets should execute quietly.
datasets$A %>% validate(suite_numeric_df)
To observe a failed test, we can apply the suite on a corrupted data frame.
temp_df <- datasets$A temp_df$x[2] <- NA temp_df %>% validate(suite_numeric_df)
The error messages indicate the corrupt data was detected, as intended.
The approach described above to test columns in data frames also applies to components in arbitrary objects or lists.
# a list object with miscellaneous components, including $x and $y temp_list_A <- list(x=c(1, 2, 3), y=c(1, 2), comment="x and y numeric") # testing temp_list_A should execute quietly temp_list_A %>% validate(suite_numeric_df) # another list object with a y component that is not numeric temp_list_B <- list(x=c(1, 2), y=factor(c(1, 2)), comment="y is not numeric") # testing temp_list_B should generate messages temp_list_B %>% validate(suite_numeric_df)
Although temp_list_A
and temp_list_B
are not data frames, the suite
of tests suite_numeric_df
applies the criteria to components x
and y
just as before. But it is important to disclose
that the .var
argument in the test constructors only work one-level
deep. It is not possible to specify deep lookups via the constructor, i.e.
it is not possible to request that a test extract component x from
an object, then extract a sub-component x2, and check properties of the
sub-component. To implement such logic, it is necessary to define a
custom assertion.
We have seen that consonance assertions can be defined using assert_numeric
and assert_named
from the checkmate
package. Indeed, checkmate
provides
dozens of functions that cover many often-used criteria, and all of them
can be incorporated into a consonance suite. Nonetheless, there may
arise situations where a ready-made assertion function is not available.
In those scenarios it is possible to define a custom function.
As an example, let's suppose we want to require that a vector has at least three distinct values.
assert_n_unique <- function(x, n) { stopifnot(length(unique(x)) >= n) x }
Then, we can incorporate this function into a test suite using n=3
.
suite_3_unique <- consonance_suite() + consonance_assert("at least 3 distinct x", assert_n_unique, n=3, .var="x")
Next, we can apply this suite on the anscombe
datasets.
datasets$A %>% validate(suite_3_unique) datasets$D %>% validate(suite_3_unique)
Dataset A passes the assessment, but dataset D does not (refer to the
dataset visualizations above). This is not a purely conceptual exercise: if a
step in the analysis relies on smooth.spline
to fit a model, the
x-coordinates will need to have at least three unique positions for spline
nodes. This procedure detects that smoothing splines will not be feasible for
dataset D.
Another type of custom function is described in the separate vignette on consonance testing for models.
There are several related term that appear in the literature on unit testing and argument checking: assert, check, test, expect, validate, verify, insist, ad perhaps others. These terms are almost synonyms in everyday language, and in the context of code they are sometimes used inconsistently and at other times delineate precise behaviors.
The consonance
package uses the word 'test' in a loose sense as well as in
a precise sense. The loose sense is used in the 'consonance testing', in the
label for a component in a suite object (e.g. suite$tests
). The word is meant to convey applying criteria/rules on input data.
The consonance
package also uses the verbs 'assert', 'check', and 'test' in
a precise sense, following the convention described in package checkmate
.
Given a function f(x)
, the three actions process an object x
and signal
a positive (PASS) or a negative (FAIL) outcome.
| Function type | outcome: PASS | outcome: FAIL |
| --------------| ---- | ---- |
| assert | x
or invisible(x)
| stop()
|
| check | TRUE
| "string with error message"
|
| test | TRUE
| FALSE
|
Given this terminology, function assert_n_unique
is an assertion because it
returns its input or stops execution. An analogous behavior might be written
as a test.
test_n_unique <- function(x, n) { length(unique(x)) >= n }
This alternative function type can also be used within a consonance suite. However, it should be added via a different constructor.
suite_3_unique_alt <- consonance_suite() + consonance_test("at least 3 distinct x", test_n_unique, n=3, .var="x")
Note that this definition involves consonance_test
rather than
consonance_assert
. For a check function, the relevant constructor is
consonance_check
.
All three function types can be used together in a single consonance suite. However, some puzzling behaviors may arise if a function of one type is added with an inappropriate constructor. It is probably a good idea to pick one function style and stick with it.
Apart from assert, check, and test, natural language also has other verbs
with similar meanings. The testthat
package uses 'expect', the assertthat
package uses 'assert that', and the assertr
package also uses 'verify' and
'insist'. Unfortunately, some of those constructs serve different purposes
than what is provided by the consonance
package, so they do not have direct
analogs. To mitigate potential confusion, this vignette and other docs try to
avoid using those alternative keywords.
We have seen that test_consonanace
can generate error messages and halt
execution via R's stop
. In some situations, it may be appropriate to
signal a potential problem but to nonetheless continue execution.
This can be achieved by setting importance levels using argument .level
.
As a practical example, let's look at the distribution of x-coordinates
in the anscombe
dataset.
# from the scatter plots, x-coordinates are in range [4, 19] anscombe.x <- unlist(lapply(datasets, function(d) { d$x } )) anscombe.qs <- quantile(anscombe.x, p=c(0, 0.1, 0.9, 1)) anscombe.qs
anscombe.qs
gives the minimum and maximum values for x, and the
10\% and 90\% quantiles as well. If we construct regression models (next
section), it would for relevant to know if a new value for x lies in this
range. Let's make a suite that uses the 10\% and 90\%
quantiles to signal a warning, and the min/max values to escalate to an error.
suite_x <- consonance_assert("within range", assert_numeric, lower=anscombe.qs[1], upper=anscombe.qs[4], .var="x", .level="error") + consonance_assert("within inner range", assert_numeric, lower=anscombe.qs[2], upper=anscombe.qs[3], .var="x", .level="warning")
The first assertion uses .level="error"
, which is the default value and
corresponds to the examples we have seen before. The second assertion uses
.level="warning"
. Both act on a column .var="x"
in a data frame. So
let's create small data frames.
# a value of x outside the range should here raise an error data.frame(x=min(anscombe.x)-1) %>% validate(suite_x) # a value of x on the outskirts of the distribution should raise a warning data.frame(x=mean(anscombe.qs[1:2])) %>% validate(suite_x)
Note that the last example generates only warnings and allows execution
to continue. It is also possible to run validate
in a very strict
mode and stop execution even upon warnings. This can be achieved by setting
level
.
# a value of x on the outskirts of the distribution should raise a warning. # by setting level="warning", the test suite will raise an error data.frame(x=mean(anscombe.qs[1:2])) %>% validate(suite_x, level="warning")
The package enables attaching a consonance suite to any R object, for example
to outputs from lm
, glm
, or other modeling frameworks. To demonstrate
this, let's create a regression model for the second anscombe
series.
B_lm <- lm(y~x, data=datasets$B) B_lm
As the plot at the start of the section shows, a linear regression is a reasonable first approximation for the series, but the data is actually better modeled by a parabola. It would be a mistake to use the linear model outside the x-range used to fit the model. So let's create a suite of tests to capture this reasoning.
suite_lm <- consonance_suite() + consonance_assert("x range", assert_numeric, .var="x", .level="warning", lower=min(datasets$B$x), upper=max(datasets$B$x))
We can now attach the suite to the model.
B_lm_2 <- attach_consonance(B_lm, suite_lm)
The suite can be previewed through the $consonance
component of the object.
B_lm_2$consonance
We can now use the model to perform data consonance checks.
# prediction on x-coordinates within the model range in_range <- data.frame(x=c(5, 10)) in_range %>% validate(B_lm_2) %>% predict(object=B_lm_2) # prediction on x-coordinates outside of the model range out_range <- data.frame(x=c(20, 30)) out_range %>% validate(B_lm_2) %>% predict(object=B_lm_2)
The second dataset generates warnings and produces y-coordinates that are quite far from what one might guess from a parabolic trend (refer to the figure).
Note that this example produces warning messages, but it nonetheless displays
a prediction. This is because the assertion in suite_lm
was defined with
.level="warning"
. It would also be reasonable to set .level="error"
(the default). validate
would then halt execution before any
prediction is made.
By default, validate
does not generate output when
evaluated on consonant data and outputs messages upon problems. The
level of logging can be adjusted by setting logging.level
to "INFO"
,
"WARN"
, or "ERROR"
. There are several mechanisms to implement
this adjustment.
Setting the logging level in the suite constructor affects all evaluations.
# create a suite that will generate a lot of log messages suite_verbose <- consonance_suite(logging.level="INFO") + consonance_assert("character vector", assert_character) # simple evaluations will generate log messages validate(c("a", "b"), suite_verbose)
Another way to toggle the logging level is within validate
.
# set up a canonical suite suite_standard <- consonance_suite() + consonance_assert("character vector", assert_character) # by default, successful runs will not generate messages validate(c("a", "b"), suite_standard) # but we can request more information validate(c("a", "b"), suite_standard, logging.level="INFO") # logging goes back to normal without the explicit argument validate(c("a", "b"), suite_standard)
Instead of writing messages to the console, the logger can output into a file
instead. This is achieved with argument log.file
, either in the suite
constructor or within validate
. For brevity, an example of the
latter is as follows.
# the verbose suite would normally display messages in the console validate(c("a", "b"), suite_verbose, log.file="my-log.log")
These mechanisms above require setting the logging settings in the constructor, or each time during run-time. But in some cases it may be desirable to adjust the logging level for an existing suite so that a new settings becomes the new default behavior. This can be achieved with a new constructor and the composition operator.
# create a new empty suite with desired logging settings, # then add consonance test from an old suite into the new one suite_normal <- consonance_suite() + suite_verbose # simple evaluations will no longer generate log messages validate(c("a", "b"), suite_normal) # simple evaluations with the versbose suite still generate log messages validate(c("a", "b"), suite_verbose)
Note: to change logging settings for a suite that is attached to a model, create a new suite and attach it to the same model.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.