extras/tests.md

I am collecting here some notes on testing in R.

There seems to be a general (false) impression among developers outside of R-core that, to run tests, R package developers need a test management system such as RUnit or testthat, and a further false impression that testthat is the only R test management system. Neither is true: R itself has a capable testing facility in "R CMD check" (a command that triggers R's checks from outside of any given integrated development environment).

By a combination of skimming the R manuals ( https://cran.r-project.org/manuals.html ) and running a few experiments, I came up with a description of how R testing actually works, and I have adapted the available tools to fit my currently preferred workflow. This may not be your preferred workflow, but I give my reasons below.

A glimpse of the R test ecosystem

1) During "R CMD check", R runs all .R files (and .r files) in the tests directory. It counts a test as a failure if the test raises an exception (for example, calls stop()), or if the text output does not match what is already in a .Rout.save file in the same directory.
2) The contents of the tests directory are written into source-distribution packages, but not into binary-distribution packages.
3) The contents of the inst directory are copied into the root level of package distributions.
4) RUnit (released June 2004) itself collects test suites from directories and then runs them, recording user assertions in a JUnit-inspired report. The idea is that once you have a bunch of tests you really want to track them in some way.
5) testthat (released November 2009) self-describes as integrating into a workflow. It runs tests found in the tests/testthat sub-directory (found relative to the package source, not relative to an installed package) and tracks user assertions. The related devtools/usethis packages both write a canonical test-controlling file into the tests directory (allowing testthat to be triggered by "R CMD check") and can also run tests directly.
6) unitizer (released April 2017) bases its tests on comparisons of objects, rather than comparing text or requiring user assertions. It also aids in producing and updating reference objects.
7) tinytest (pre-release) decouples the idea of test failures from exceptions.
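As a minimal sketch of point 1, a plain script such as the following could live in the tests directory and be run by "R CMD check" with no test framework at all. The file name and the add_one() function are hypothetical stand-ins; in a real package the script would load the package instead of defining the function itself.

```r
# tests/test_basic.R (hypothetical file name)
# "R CMD check" runs this script; any uncaught error marks the test as failed.

# Stand-in for a function the package would normally export (assumption).
add_one <- function(x) x + 1

# Plain assertions: stopifnot() raises an error if any condition is not TRUE.
stopifnot(add_one(1) == 2)
stopifnot(identical(add_one(c(1, 2)), c(2, 3)))

# Check that bad input raises an error, without letting that error escape.
res <- tryCatch(add_one("a"), error = function(e) "error")
stopifnot(identical(res, "error"))
```

If a matching test_basic.Rout.save file were placed next to this script, the printed output of the run would also be compared against it.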

The different types of tests

There are many reasons for testing, and different ways that tests are used. Much confusion stems from a failure to separate the different motivations behind testing. Some categories of tests include:

1) Acceptance unit tests. These are tests that must succeed for a package to be considered usable. Failing these tests can cause the package to be rejected by CRAN or by end-users.
2) Weak integration tests. Integration tests are different from unit tests, but there is some overlap. For packages that are tightly coupled, a wrong version of one package can cause a related package to fail. In this case it makes a lot of sense to expose the tests to the users, so they can check the compatibility of their installed package suites.
3) Tests that represent development goals. These tests can come from test-driven development, or be reproducible errors incorporated from submitted issues. These tests may be in a failing state for some time. They are more private to the package developer and should not be distributed to CRAN or to the end users.

Confusing these (and additional) categories of use, or assuming there is only one use of tests, is the source of many arguments over proper testing procedures and/or appropriate test systems. In fact, it is useful to discuss the currently available test systems in R in light of (at least) the testing scenarios we've just described:

The "R CMD check" mechanism seems optimized to support case 1. RUnit directly supports case 3; the wrapr adapter for RUnit (which we will discuss below) is designed to support cases 1 and 2. testthat::test_check() supports case 1, and testthat::test_dir() supports case 3. unitizer and tinytest seem to emphasize cases 3 and 1.
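To make the testthat::test_dir() route concrete, here is a small sketch that writes one test file into a temporary directory and runs it directly, with no package build or "R CMD check" involved. The directory and file names are assumptions for illustration, and the block does nothing if testthat is not installed. testthat::test_check() is not shown, since it expects an installed package to test.

```r
# Sketch: run a directory of testthat tests directly (case 3 above).
# Guarded so the script is a no-op when testthat is not installed.
if (requireNamespace("testthat", quietly = TRUE)) {
  # Hypothetical location for the example tests.
  test_path <- file.path(tempdir(), "example_tests")
  dir.create(test_path, showWarnings = FALSE)

  # Write a single passing test file; test_dir() picks up files named test*.R.
  writeLines(
    c('testthat::test_that("addition works", {',
      '  testthat::expect_equal(1 + 1, 2)',
      '})'),
    file.path(test_path, "test-addition.R")
  )

  # Run every test file found in the directory and keep the results object.
  results <- testthat::test_dir(test_path)
}
```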

Observations (based on above)

1) R package developers do not need to use a test system such as RUnit or testthat to run tests. The data.table package is a great example of this: a core package running thousands of tests, without needing an external testing package.
2) If you wish to allow end-users to run tests for binary-distributed packages, the package developer must place them somewhere other than in tests. My suggestion is to put them in inst/unit_tests, which will get installed at the top level of packages and is findable with the system.file() command.
3) Package developers need the ability to run tests both from their sources (which RUnit and testthat both supply) and from installed copies of their package (which RUnit supplies, as RUnit is path oriented rather than package oriented, and testthat supplies through the test_dir() command).
4) The same package may be distributed to users in either binary or source form. A user may receive a binary package from CRAN if they are on a non-Unix system and using a current (or near-current) version of R. They will receive a source version if they are running an obsolete version of R, or if the package has not yet been built by CRAN for their version of R. Because tests in the tests directory are present in source versions of packages and not in binary versions, the user may or may not get tests. This seems like needless variation, and any needless variation is a possible source of confusion and errors. In my opinion the user should never get tests, or always get tests. Since there is no way to strip CRAN acceptance tests out after CRAN submission, I suggest the user always get tests.
5) Tests need to be in a canonical place that can be found both by the user and by the test runner. Relative paths can cause problems, so to reliably run tests we need an anchor point other than R's current working directory. One such anchor point is the installed package directory structure, which can be searched with system.file(). This argues for tests being part of the package distribution. (Another way is to find paths relative to the test runner source code. Such solutions can be problematic, though the "getSrcDirectory(function(){})" solution is possibly a good fit.)
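Observations 2 and 5 can be sketched in base R as follows. This is an illustration under stated assumptions, not the adapter discussed in this note: find_installed_tests() and run_test_files() are hypothetical helper names, and "unit_tests" is assumed to be the installed form of inst/unit_tests.

```r
# Sketch: find and run test scripts shipped with an installed package.
# Both functions are hypothetical helpers, not part of any package.

find_installed_tests <- function(pkg, subdir = "unit_tests") {
  # system.file() returns "" when the package or sub-directory is missing.
  test_dir <- system.file(subdir, package = pkg)
  if (!nzchar(test_dir)) {
    return(character(0))
  }
  list.files(test_dir, pattern = "\\.[rR]$", full.names = TRUE)
}

run_test_files <- function(files) {
  failed <- character(0)
  for (f in files) {
    # Source each file in its own environment; an error counts as a failure.
    ok <- tryCatch({
      sys.source(f, envir = new.env(parent = globalenv()))
      TRUE
    }, error = function(e) {
      message("FAILED: ", basename(f), ": ", conditionMessage(e))
      FALSE
    })
    if (!ok) {
      failed <- c(failed, f)
    }
  }
  failed
}

# Usage with a hypothetical package name:
# failed <- run_test_files(find_installed_tests("mypackage"))
# if (length(failed) > 0) stop("some tests failed")
```

Because the lookup is anchored on the installed package rather than the working directory, the same call works for an end-user and for a small driver script placed in the tests directory.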

Critique

My advice

For your package think a bit on what you want from testing, instead of uncritically following popular procedures. As in all cases: how R actually works is described in the manuals (https://cran.r-project.org/manuals.html), and may not be what you heard on the street.

My current R test setup

To conveniently provide test interfaces both to R CMD check and to end-users simultaneously, I now do the following.



WinVector/wrapr documentation built on Aug. 29, 2023, 4:51 a.m.