Cascadia R Conference 2019 Update: the slides from Tiernan Martin’s talk can be downloaded here: drakepkg-slides-cascadiarconf2019.pdf
The goal of drakepkg is to demonstrate how a drake workflow can be organized as an R package.
Why do this? Because the package system in R provides a widely-adopted
method of structuring, documenting, testing, and sharing R code. While
most R packages are general purpose, this approach applies the same
framework to a specific workflow (or set of workflows). It increases the
reproducibility of a complex workflow without requiring users to
recreate the workflow’s environment with a container image (although
that approach is compatible with drakepkg; see januz/drakepkg).
The drakepkg package is experimental in nature and currently requires some inconvenient steps (see the drake manual, section 7.4, "Workflows as R packages"); please use caution when applying this approach to your own work.
You can install the released version of drakepkg from its GitHub repository with:

```r
devtools::install_github("tiernanmartin/drakepkg")
```
The following table shows how each feature of a drake workflow is made accessible within an R package:
| drake                     | R Package |
| :------------------------ | :-------- |
| plans, commands           | functions (`R/*.R`) |
| targets                   | stored in the cache (`.drake/`) |
| input files, output files | internal data (`inst/intdata/*`), external data (`inst/extdata/*`), images and documents (`inst/documents/*`) |
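As a rough illustration of the last row: files kept under a package's `inst/` directory are installed at the top level of the package, where base R's `system.file()` can locate them. The sketch below is not drakepkg's implementation. `copy_pkg_dir()` is a hypothetical helper, and the demonstration copies the `Meta` directory of the stats package only because every R installation has one.

```r
# Hypothetical helper (not part of drakepkg): copy a directory that ships
# with an installed package into a destination directory.
copy_pkg_dir <- function(pkg, subdir, to = getwd()) {
  # system.file() returns "" when the requested path does not exist
  src <- system.file(subdir, package = pkg)
  if (!nzchar(src)) {
    stop("No '", subdir, "' directory found in package '", pkg, "'")
  }
  # With recursive = TRUE the whole directory is copied into `to`
  file.copy(src, to, recursive = TRUE)
  invisible(file.path(to, basename(src)))
}

# Demonstration using a directory that every R installation has
dest <- tempfile("workflow-")
dir.create(dest)
copied <- copy_pkg_dir("stats", "Meta", to = dest)
list.files(copied)
```

For a package laid out like the table above, a call such as `copy_pkg_dir("drakepkg", "extdata")` would be the analogous step, assuming the package is installed.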
The package comes with two example drake plans, both of which are loosely based on the main example included in the drake package:

- `drakepkg::get_example_plan_simple()`
- `drakepkg::get_example_plan_external()`
The first plan looks like this:
```r
library(drake)
library(drakepkg)

get_example_plan_simple()
#> # A tibble: 5 x 2
#>   target     command
#>   <chr>      <expr>
#> 1 raw_data   readxl::read_excel(file_in("intdata/iris-internal.xlsx")) ~
#> 2 ready_data dplyr::mutate(raw_data, Species = forcats::fct_inorder(Specie~
#> 3 hist       create_plot(ready_data) ~
#> 4 fit        lm(Sepal.Width ~ Petal.Width + Species, ready_data) ~
#> 5 report     write_html_report(hist, fit, knitr_in("documents/report-simpl~
```
Several commands used in the plan (e.g., `create_plot()`, `write_html_report()`) are included as part of the drakepkg R package, as is the plan itself; the documentation for each of these functions can be accessed using R's `help()` function (for example, `help(get_example_plan_simple)`).
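To make the idea concrete, here is a minimal sketch of how a plan and one of its commands could be written as ordinary functions in a package's `R/` directory. The names `summarise_widths()` and `get_my_plan()` are hypothetical and are not taken from drakepkg. The sketch assumes the drake package is installed, and it uses the built-in `iris` data rather than a file.

```r
library(drake)

# Hypothetical command: a regular function that a plan can call.
summarise_widths <- function(data) {
  # Compute histogram counts without drawing a plot
  hist(data$Sepal.Width, plot = FALSE)
}

# Hypothetical plan function: it returns the plan when called,
# so the plan is exported and documented like any other function.
get_my_plan <- function() {
  drake_plan(
    raw_data = datasets::iris,
    width_hist = summarise_widths(raw_data),
    fit = lm(Sepal.Width ~ Petal.Width + Species, raw_data)
  )
}

plan <- get_my_plan()
plan$target
```

Wrapping the plan in a function means it can carry its own help page, which is what makes a call like `help(get_example_plan_simple)` possible.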
Once you have installed and loaded drakepkg, you can reproduce the introductory plan's workflow by performing the following steps:
1. Copy the source code files into your working directory with the `copy_drakepkg_files()` function
2. View the plan (`get_example_plan_simple()`) and then make it (`make(get_example_plan_simple())`)
3. Examine the plan's targets with drake functions like `readd()` or `loadd()`
4. View the rendered report in the `documents/` directory

```r
# Step 1: copy the source code files into the working directory
copy_drakepkg_files()

# Step 2A: view the example plan
get_example_plan_simple()
#> # A tibble: 5 x 2
#>   target     command
#>   <chr>      <expr>
#> 1 raw_data   readxl::read_excel(file_in("intdata/iris-internal.xlsx")) ~
#> 2 ready_data dplyr::mutate(raw_data, Species = forcats::fct_inorder(Specie~
#> 3 hist       create_plot(ready_data) ~
#> 4 fit        lm(Sepal.Width ~ Petal.Width + Species, ready_data) ~
#> 5 report     write_html_report(hist, fit, knitr_in("documents/report-simpl~

# Step 2B: make the example plan
make(get_example_plan_simple())
#> All targets are already up to date.

# Step 3: examine the plan's targets
readd(fit)
#>
#> Call:
#> lm(formula = Sepal.Width ~ Petal.Width + Species, data = ready_data)
#>
#> Coefficients:
#>       (Intercept)        Petal.Width  Speciesversicolor
#>             3.236              0.781             -1.501
#>  Speciesvirginica
#>            -1.844

readd(hist)
```
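The walkthrough uses `readd()`, which returns a target's stored value. `loadd()` instead assigns targets into the calling environment under their own names. The self-contained sketch below shows the difference under some assumptions: it uses a small stand-in plan rather than one of drakepkg's, and it keeps the cache in a temporary directory with `new_cache()` so that no `.drake/` folder is written to the working directory.

```r
library(drake)

# Stand-in plan (not one of drakepkg's example plans)
plan <- drake_plan(
  raw_data = datasets::iris,
  fit = lm(Sepal.Width ~ Petal.Width + Species, raw_data)
)

# Keep the cache out of the working directory
cache <- new_cache(path = tempfile("drake-cache-"))
make(plan, cache = cache)

# readd() returns the stored value, which can be bound to any name
fit_value <- readd(fit, cache = cache)

# loadd() creates a variable named after the target in this environment
loadd(raw_data, cache = cache)

class(fit_value)
nrow(raw_data)
```

`readd()` suits one-off inspection, while `loadd()` is convenient when several targets are needed at once for interactive work.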
This example and others are available in the package vignette (`vignette('drakepkg')`).