Create research compendia
In rang: Reconstructing Reproducible R Computational Environments

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

There has been several implementations of research compendium in the R ecosystem already: rrtools, rcompendium, template, manuscriptPackage, and ProjectTemplate. The idea of use_rang() is not to create a new format. Instead, use_rang() can be used to enhance your current research compendium.

If you would like to create a new research compendium, you can either use any of the aforementioned formats; or to use create_turing() to create a structure suggested by The Turing Way. However, the idea is that use_rang(), a usethis-style function, is a general function that can work well with any structure. Just like rang in general, use_rang() is designed with interoperability in mind.

Case 1: Create a Turing-style research compendium

create_turing() can be used to create a general research compendium structure. The function generates an example structure like this:

.
├── bibliography.bib
├── CITATION
├── code
│   ├── 00_preprocess.R
│   └── 01_visualization.R
├── data_clean
├── data_raw
│   └── penguins_raw.csv
├── figures
├── .here
├── inst
│   └── rang
│       └── update.R
├── Makefile
└── paper.Rmd

More information about this can be found in the Turing Way. But in general:

Raw data should be in the directory data_raw
Scripts should be in the directory code, preferably named in the execution order.
The scripts should generate intermediate data files in data_intermediate, figures in figures.
After the code execution, a file for the manuscript is written with literate programming techniques. In this case, Paper.Rmd.

The special part is inst/rang/update.R. Running this script does the following things:

It scans the current directory for all R packages used.
It creates the infrastructure for building the Docker image to run the code.
It caches all R packages.

As written in the file, you should edit this script to cater for your own needs. You might also need to run this multiple time during the project lifecycle. You can also use the Makefile included to pull off some of the tasks. For example, you can run make update to run inst/rang/update.R. We highly recommend using GNU Make.

The first step is to run inst/rang/update.R. You can either run it by Rscript inst/rang/update.R or make update. It will determine the snapshot date, scan the current directory for R dependencies, determine the dependency graph, generate Dockerfile, and cache R packages.

After running it, you should have Dockerfile at the root level. In inst/rang, you should have rang.R and cache. Now, you can build the Docker image. We recommend using GNU Make and type make build (or docker build -t yourprojectimg .). And launch the Docker container (make launch or docker run --rm --name "yourprojectcontainer" -ti yourprojectimg). Another idea is to launch a Bash shell (make bash or docker run --rm --name "yourprojectcontainer" --entrypoint bash -ti yourprojectimg). Let's assume you take this approach.

Inside the container, you will get all your files. And that container should have all the dependencies installed and you can run all the scripts right away. Let's say

Rscript code/00_preprocess.R 
Rscript code/01_visualization.R 
Rscript -e "rmarkdown::render('paper.Rmd')"

You can copy any artefact generated inside the container from another shell instance.

docker cp yourprojectcontainer:/paper.pdf ./

Other ideas

Add a readme: usethis::use_readme()
Add a license: usethis::use_mit_license()
Add the steps to run the code in the Makefile, but you'll need to rerun make build.
Export the Docker image: make export and restore it make restore

Case 2: Enhance an existing research compendium

Oser et al. shared their data as a zip file on OSF. You can obtain a copy using osfr.

Rscript -e "osfr::osf_download(osfr::osf_retrieve_file('https://osf.io/y7cg5'))"
unzip meta-analysis\ replication\ files.zip
cd meta-analysis

Suppose you want to use Apptainer to reproduce this research. At the root level of this compendium, run:

Rscript -e "rang::use_rang(apptainer = TRUE)"

This compendium is slightly more tricky because we know that there is one undeclared GitHub package. You need to edit inst/rang/update.R yourself. In this case, you also want to fix the snapshot_date. Also, you know that "texlive" is not needed.

pkgs <- as_pkgrefs(here::here())
pkgs[pkgs == "cran::dmetar"] <- "MathiasHarrer/dmetar"

rang <- resolve(pkgs,
                snapshot_date = "2021-08-11",
                verbose = TRUE)

apptainerize(rang, output_dir = here::here(), verbose = TRUE, cache = TRUE,
             post_installation_steps = c(recipes[["make"]], recipes[["clean"]]),
             insert_readme = FALSE,
             copy_all = TRUE,
             cran_mirror = cran_mirror)

You can also edit Makefile to give the project a handle. Maybe "oser" is a good handle.

handle=oser
.PHONY: update build launch bash daemon stop export

Similar to above, we first run make build to build the Apptainer image. As the handle is "oser", it generates an Apptainer image called "oserimg.sif".

Similar to above, you can now launch a bash shell and render the RMarkdown file.

make bash
Rscript -e "rmarkdown::render('README.Rmd', output_file = 'output.html')"
exit

Upon you exit, you have "output.html" in your host machine. You don't need to transfer the file from the container. Please note that this feature is handy but can also have a negative impact to reproducibility.

What to share?

It is important to know that there are at least two levels of reproducibility: 1) Whether your computational environment can be reproducibly reconstructed, and 2) Whether your analysis is reproducibility. The discussion of reproducibility usually conflates the two. We want to focus on the 2nd goal.

If your goal is to ensure other researchers can have a compatible computational environment that can (re)run your code, The Turing Way recommends that one should share the research compendium and the container images, not just the recipes e.g. Dockerfile or container.def. There are many moving parts during the reconstruction, e.g. whether the source Docker image is available and usable. As long as Docker or Apptainer support the same image format (or allow upgrade the current format), sharing the images is the most future proof method.