Make-like build management, reimagined for R.
See below for installation instructions.
"make", when it works, is wonderful. Being able to change part of a complicated system and then re-make, updating only the parts of the system that have changed, is great. While it gets some use elsewhere, it is very heavily tailored towards building software. While make can be used to create reproducible research workflows (e.g. here and here), it is a challenge.
The idea here is to re-imagine a set of ideas from make but built for R. Rather than having a series of calls to different instances of R (as happens if you run make on R scripts), the idea is to define pieces of a pipeline within an R session. Rather than being language agnostic (like make must be), remake is unapologetically R focussed.
Note: This package is under heavy development (as of May 2015), so things may change under you if you start using this now. However, the core format seems to be working on some nontrivial cases that we are using in our own work. At the same time, if you're willing to have things change around a bit, feel free to start using this and post issues with problems, friction or ideas, and the package will come to reflect your workflow more.
Note: As of version 0.2.0 the database format has changed. This will require rebuilding your project. The change corresponds to adding the dependency on storr. Everything else should remain unchanged, though.
You describe the beginning, intermediate and end points of your analysis, and how they flow together. There might be very few steps or very many, but remake will take care of stepping through the analysis in a correct order (there can be more than one correct order!).
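To make "a correct order" concrete, here is a minimal sketch of a depth-first topological sort in base R. The target names and the `build_order` helper are hypothetical and chosen for illustration; this is not remake's actual implementation, only one way such an ordering could be computed.

```r
# Hypothetical dependency graph: each target lists the targets it depends on.
deps <- list(
  all       = "plot.pdf",
  plot.pdf  = "processed",
  processed = "data.csv",
  data.csv  = character(0),
  report.md = "processed"
)

# Depth-first topological sort: a target is emitted only after everything it
# depends on has been emitted. Illustrative only, not remake's own code.
build_order <- function(deps) {
  done <- character(0)
  visit <- function(target, stack = character(0)) {
    if (target %in% stack) stop("circular dependency involving ", target)
    if (target %in% done) return(invisible(NULL))
    for (d in deps[[target]]) visit(d, c(stack, target))
    done <<- c(done, target)
  }
  for (target in names(deps)) visit(target)
  done
}

build_order(deps)
```

Any ordering in which each target appears after its dependencies is equally valid, which is why more than one correct order can exist.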
Here's a very simple analysis pipeline that illustrates the basic idea:
The remakefile that describes this pipeline might look like this:
    sources:
      - code.R

    targets:
      all:
        depends: plot.pdf

      data.csv:
        command: download_data(target_name)

      processed:
        command: process_data("data.csv")

      plot.pdf:
        command: myplot(processed)
        plot: true

      report.md:
        depends: processed
        knitr: true
You still need to write functions that carry out each step; that might look something like this, but it would define the functions download_data, process_data and myplot. remake can then be run from within R:
    remake::make()
    # [ BUILD ] data.csv   |  download_data("data.csv")
    # [ BUILD ] processed  |  processed <- process_data("data.csv")
    # [ BUILD ] plot.pdf   |  myplot(processed) # ==> plot.pdf
    # [ ] all
The "BUILD" next to each target indicates that it is being run (which may take some time for a complicated step), and after the pipe a call is printed that indicates what is happening (this is a small white lie). Running remake again:
    remake::make()
    # [ OK ] data.csv
    # [ OK ] processed
    # [ OK ] plot.pdf
    # [ ] all
Everything is up to date, so remake just skips over things.
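For reference, here is a minimal sketch of what code.R might contain. Only the three function names come from the remakefile; the function bodies, including the use of the built-in mtcars dataset in place of a real download, are assumptions made purely for illustration.

```r
# Hypothetical code.R: the function names match the remakefile, but the
# bodies are illustrative assumptions, not the original example code.

# Stand-in for a real download: write a built-in dataset to `dest`.
download_data <- function(dest) {
  write.csv(mtcars, dest, row.names = FALSE)
}

# Read the raw file and keep only the rows and columns of interest.
process_data <- function(filename) {
  d <- read.csv(filename, stringsAsFactors = FALSE)
  d[d$cyl == 4, c("mpg", "wt")]
}

# Draw the figure on whatever graphics device is currently open.
myplot <- function(d) {
  plot(mpg ~ wt, data = d)
}
```

Note that myplot does not open or close a graphics device itself; in this sketch that is left to the caller, consistent with the plot: true setting in the remakefile.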
There are also special knitr targets:
    remake::make("report.md")
    # [ OK ] data.csv
    # [ OK ] processed
    # [ ] report.Rmd
    # [ KNIT ] report.md  |  knitr::knit("report.Rmd", "report.md")
This arranges for the target processed, on which this depends (see the remakefile), to be passed through to knitr, along with all the functions defined in code.R, and builds the report report.md from the knitr source report.Rmd (the source is here). Note that because processed was already up to date, remake skips rebuilding it.
remake can also be run from the command line (outside of R), to make it easy to include as part of a bigger pipeline, perhaps using make! (I do this in my own use of remake).
Rather than require that you buy in to some all-singing, all-dancing workflow tool, remake tries to be agnostic about how you work: there are no special functions within your code that you need to use. You can also create a linear version of your analysis at any time:
    remake::make_script()
    # source("code.R")
    # download_data("data.csv")
    # processed <- process_data("data.csv")
    # pdf("plot.pdf")
    # myplot(processed)
    # dev.off()
remake determines whether any dependencies have changed when running your analysis. So if a downloaded data file changes, everything that depends on it will be rebuilt when needed.
remake uses a hash (like a digital fingerprint) of the file or object to determine if the contents have really changed. So inconsequential changes will be ignored.
remake also checks if the functions used as rules (or called from those functions) have changed and will rebuild if these have changed (for the rationale here, see here).
Because remake keeps track of which files and objects it created, it can automatically clean up after itself. This makes it easy to rerun the entire analysis beginning-to-end.
Support for knitr as special targets.
Support for .gitignore files, to prevent accidentally committing large output to your repository.
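The hash-based change detection described above can be illustrated with a short sketch. It uses tools::md5sum(), which ships with R, as a stand-in; remake itself lists digest as a dependency, and its internals may well differ from this.

```r
# Illustration only: compare file contents by hash instead of by timestamp.
path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), path)
hash_before <- unname(tools::md5sum(path))

# Rewrite the file with identical contents: the modification time may change,
# but the hash does not, so nothing downstream would need rebuilding.
writeLines(c("x,y", "1,2"), path)
hash_same <- unname(tools::md5sum(path))

# A real change to the contents produces a different hash.
writeLines(c("x,y", "1,3"), path)
hash_changed <- unname(tools::md5sum(path))

identical(hash_before, hash_same)     # TRUE
identical(hash_before, hash_changed)  # FALSE
```

This is the sense in which inconsequential changes are ignored: a file that is touched or regenerated with the same contents keeps the same fingerprint.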
Some tutorials on using remake with different datasets.
Install using devtools:
If you don't have devtools installed you will see an error "there is no package called 'devtools'"; if that happens, install devtools with install.packages("devtools").
remake depends on several R packages, all of which can be installed from CRAN. The required packages are:
R6 for holding things together
yaml for reading the configuration
digest for efficiently hashing objects
crayon for coloured output on the terminal (not in RStudio or Rgui)
optparse for a command line version (run from outside of an R session)
install.packages(c("R6", "yaml", "digest", "crayon", "optparse"))
We also depend on storr for object storage: