knitr::opts_chunk$set(echo = FALSE, message = FALSE, cache = TRUE)
library(practice)
library(plyr)
library(dplyr)
library(tidyr)
library(ggplot2)
There is no single list of "best practices" for writing high-quality software, but there are some general traits that such software possesses. McConnell points to many of them in Code Complete \citep{codecomplete} and divides them into "external" traits, which face the user, and "internal" traits, which face the developer.
From the perspective of the user, software should be accurate, fast-running and easy to use. From the perspective of the developer, the internal code should be maintainable, portable and lend itself to being tested. We can point to specific conventions or expectations that are built on these traits.
We would add one final trait not covered by McConnell, perhaps specific to the R community's emphasis on a community of developers rather than any individual developer: the ability of people to move from being external users to being internal developers. In other words, the software should be designed and built so that the barrier to upstream bug reports and patches from individual users is very low.
Now that we have identified these "best practices" and "traits", how do we test for them in R packages?
For unit testing, we can take the content of a package and its metadata (the DESCRIPTION file) and use our knowledge of how the different frameworks (testthat and RUnit) make themselves known. In the case of testthat, a "tests" directory is created in the package source code, with a "testthat" folder underneath it. In the case of RUnit, tests can appear in a dedicated directory or be scattered throughout the package code, but must ultimately include either an explicit call to load the RUnit package or an implicit one that uses :: to refer to exported objects from RUnit's namespace. In both cases the framework may be listed in the DESCRIPTION file, but this is not guaranteed.
In the case of tests that do not use a package framework, there is no concrete way of automatically identifying if tests exist, but a general convention is to create a "tests" directory. Accordingly we adopted the following heuristics to identify the presence of tests, and what framework (if any) they followed:
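The checks described above might be coded along the following lines. This is a minimal sketch rather than the implementation used for the paper: the function name `detect_test_framework`, the four return labels, and the assumption that `pkg_dir` points at an unpacked package source tree are all ours, chosen for illustration.

```r
# Hypothetical helper: classify the testing approach of an unpacked package.
# `pkg_dir` is the path to the package source root (an assumed layout).
detect_test_framework <- function(pkg_dir) {
  # testthat convention: a tests/testthat/ directory
  if (dir.exists(file.path(pkg_dir, "tests", "testthat"))) {
    return("testthat")
  }

  # RUnit: look for library(RUnit), require(RUnit) or RUnit:: in any R file
  r_files <- list.files(pkg_dir, pattern = "\\.[Rr]$", recursive = TRUE,
                        full.names = TRUE)
  runit_pattern <- paste0(
    "(library|require)[[:space:]]*\\([[:space:]]*[\"']?RUnit[\"']?",
    "[[:space:]]*\\)|RUnit::"
  )
  uses_runit <- any(vapply(r_files, function(f) {
    any(grepl(runit_pattern, readLines(f, warn = FALSE)))
  }, logical(1)))
  if (uses_runit) {
    return("RUnit")
  }

  # No framework found: fall back on the convention of a tests/ directory
  if (dir.exists(file.path(pkg_dir, "tests"))) {
    return("other")
  }
  "none"
}
```

A text search of this kind will miss idiosyncratic set-ups, such as tests kept outside the package, so its "none" label is best read as "no tests detected" rather than "no tests exist".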
The predictability of a package is, as noted, not the easiest thing to evaluate, but we can tease out some information by looking at several characteristics. One obvious heuristic for user-facing predictability is the presence of semantic versioning, a versioning system that distinguishes backwards-compatible bugfixes, backwards-compatible new features, and "breaking changes" that create an incompatibility between versions. The presence of this versioning system, or something analogous, lets the user tell at a glance, on an update, whether the changes between versions require action on their end and whether the package's behaviour has changed.
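To make the three-part distinction concrete, the sketch below tests whether a version string has the plain MAJOR.MINOR.PATCH shape and reports which kind of change an update implies. Both function names are hypothetical, and the pattern is a deliberate simplification: it ignores pre-release suffixes and treats R-style versions such as 1.0-5 as non-semantic. It is not necessarily the check used to build the dataset.

```r
# Hypothetical check: does a version string look like MAJOR.MINOR.PATCH?
looks_semantic <- function(version) {
  grepl("^[0-9]+\\.[0-9]+\\.[0-9]+$", version)
}

# Hypothetical helper: classify an update by the first component that differs.
# Assumes both arguments already passed looks_semantic().
change_type <- function(old, new) {
  old_parts <- as.integer(strsplit(old, ".", fixed = TRUE)[[1]])
  new_parts <- as.integer(strsplit(new, ".", fixed = TRUE)[[1]])
  changed <- which(old_parts != new_parts)
  if (length(changed) == 0) {
    return("none")
  }
  # Position 1 = major (breaking), 2 = minor (feature), 3 = patch (bugfix)
  c("breaking", "feature", "bugfix")[changed[1]]
}
```

For example, `change_type("1.2.3", "1.3.0")` reports a feature release, while `change_type("1.2.3", "2.0.0")` reports a breaking one.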
Using the above heuristics, we separately retrieved the metadata and the source code of each package on CRAN as of 03:00:01 on 27 April 2015, which came to 6,551 packages in total. Metadata was retrieved using the meta-cran service, and the source code by downloading each package's source from CRAN. Each package was then checked using each of the heuristics described in the section above (Testing for Best Practices). The resulting dataset can be found in the R package that accompanies this paper.
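As a rough illustration of the source-download step, the sketch below builds the URL of a source tarball under the standard `src/contrib` layout of a CRAN mirror and unpacks it. The function names and the default mirror are assumptions for illustration, not the retrieval code used for the paper; note also that only the current version of a package lives directly under `src/contrib`, so this would not by itself reproduce the April 2015 snapshot.

```r
# Build the URL of a package's source tarball under a CRAN-style repository.
# `repo` is an assumed mirror; CRAN mirrors share the src/contrib layout.
cran_source_url <- function(pkg, version,
                            repo = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/%s_%s.tar.gz", repo, pkg, version)
}

# Hypothetical helper: download and unpack one package (needs network access).
fetch_source <- function(pkg, version, dest = tempdir()) {
  tarball <- file.path(dest, sprintf("%s_%s.tar.gz", pkg, version))
  utils::download.file(cran_source_url(pkg, version), tarball, mode = "wb")
  utils::untar(tarball, exdir = dest)
  # A source tarball unpacks into a directory named after the package
  file.path(dest, pkg)
}
```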
Some figures and analyses, like Figure \ref{fig:CRAN_vignettes}.
use_vignettes <- sum(CRANpractices$vignette_format != "None")
use_vignettes_percent <- 100 * use_vignettes / nrow(CRANpractices)
\begin{figure}
# Count packages per vignette format and builder, in long form for faceting
vignette_count <- CRANpractices %>%
  count(vignette_format, vignette_builder) %>%
  ungroup() %>%
  gather(metric, choice, -n) %>%
  mutate(metric = revalue(metric, c(vignette_format = "Vignette Format",
                                    vignette_builder = "Vignette Builder"))) %>%
  filter(choice != "None") %>%
  mutate(choice = reorder(choice, n, function(x) -mean(x)))
# One panel per metric, with bars ordered by decreasing count
ggplot(vignette_count, aes(choice, n)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ metric, scales = "free", ncol = 2) +
  xlab("Choice") +
  ylab("Number of packages") +
  theme_bw(base_size = 10) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
\caption{Distribution of the choice of vignette builder and format, among the `r round(use_vignettes_percent, 1)`\% of CRAN packages that use vignettes. \label{fig:CRAN_vignettes}}
\end{figure}
To cover the possibility that these heuristics fail (non-framework unit tests can be fairly idiosyncratic), we also hand-coded 50 packages identified as having no tests. Of those 50, 3 had non-framework unit tests, each using a different approach: one stored its unit tests outside the package, for example, while another required the package to be rebuilt with custom variable flags for the tests to take effect.
One fallback in the absence of unit tests is the examples found within R documentation, which, as well as documenting usage, provide early detection of bugs for CRAN and for the package's developers: when a package is rebuilt and checked, the examples are run, and errors are thrown if they do not complete. We noticed several packages without examples, and several more whose examples were marked with `\dontrun` tags. We hypothesise, from the developers' comments and from our own experience, that this is done to comply with the CRAN policy requiring examples and tests to run within a specific time limit. For complex code or computations, this rules out examples that are actually run and, by extension, rules out examples as a fallback for unit tests.
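A simple way to flag such packages is to scan each Rd file for an `\examples` section and for `\dontrun` within it. The sketch below assumes the file has already been read into a character vector of lines; the function name and its three labels are hypothetical, and the text match is only approximate, since it does not check that the `\dontrun` block actually sits inside the examples section or how much of the examples it covers.

```r
# Hypothetical helper: summarise the examples in one Rd file's text.
# `rd_lines` is a character vector of lines, e.g. readLines("man/foo.Rd").
example_status <- function(rd_lines) {
  has_examples <- any(grepl("\\examples{", rd_lines, fixed = TRUE))
  has_dontrun  <- any(grepl("\\dontrun{", rd_lines, fixed = TRUE))
  if (!has_examples) {
    "no examples"
  } else if (has_dontrun) {
    # At least part of the examples is skipped during checks
    "examples with dontrun"
  } else {
    "examples run on check"
  }
}
```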
\bibliography{RJreferences}