knitr::opts_chunk$set(
  collapse = TRUE,
  fig.path = "man/figures/vignette-",
  comment = "#>",
  dpi = 120
)
options(
  drake_make_menu = FALSE,
  drake_clean_menu = FALSE
)

The goal of drakepkg is to demonstrate how a drake workflow can be organized as an R package. Users who are not yet familiar with drake should review the drake User Manual before continuing with this vignette.

The following examples illustrate how drake workflows can be reproduced when they are included in an R package.

Simple Plan

This example borrows the main example from the drake package documentation and recreates it within an R package.

The plan is included in drakepkg as a function:

library(drakepkg) # devtools::install_github("tiernanmartin/drakepkg")

get_example_plan_simple()
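
The definition of get_example_plan_simple() is not reproduced in this vignette. As a minimal sketch of the general pattern, assuming drake is installed, a plan can be wrapped in a function that returns the result of drake::drake_plan(). The function name get_plan_sketch() and its targets are hypothetical and chosen only for illustration:

```r
library(drake)

# Hypothetical stand-in for a packaged plan function (not drakepkg's code).
get_plan_sketch <- function() {
  drake_plan(
    raw = datasets::iris,
    fit = lm(Sepal.Width ~ Petal.Width + Species, data = raw),
    n_obs = nrow(raw)
  )
}

# Calling the function returns the plan as a data frame,
# with one row per target and the command that builds it.
plan <- get_plan_sketch()
plan$target
```

Because the plan lives in a function, installing the package is enough to make the same plan available on another machine.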

Here are the steps needed to reproduce this plan:

  1. Create a new RStudio Project or navigate to an empty working directory (not required but strongly recommended)
  2. Copy the package's directories and source code files into your working directory: copy_drakepkg_files()
  3. Make the plan: make(get_example_plan_simple())
  4. Access the plan's targets using drake functions like readd() or loadd()
  5. View the HTML documents produced by the workflow (see the documents/ directory)

The first step is optional but strongly recommended; it is generally accepted as a best practice that data analysis projects should be self-contained.

The second step is an important one. Most drake plans interact with the user's file system at some point, typically to read inputs or write outputs. drakepkg's inst/ directory contains the files and directories that are needed to successfully make get_example_plan_simple(). The copy_drakepkg_files() function copies the following directories from drakepkg into the user's working directory:

copy_drakepkg_files()
.
├── documents
├── extdata 
└── intdata
    ├── R
    │   └── make-iris-internal.R
    └── iris-internal.xlsx 
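
How copy_drakepkg_files() is implemented is not shown here. One way such a helper could work, sketched with base R only, is to copy selected sub-directories of a source directory into the working directory. The function copy_dirs_sketch() and its arguments are hypothetical, and the example uses temporary directories in place of the installed package and the project:

```r
# Hypothetical helper (not drakepkg's implementation): copy the named
# sub-directories of `from` into the existing directory `to`.
copy_dirs_sketch <- function(from, to = getwd(),
                             dirs = c("documents", "extdata", "intdata")) {
  src <- file.path(from, dirs)
  src <- src[dir.exists(src)]  # skip sub-directories that are absent
  copied <- file.copy(src, to, recursive = TRUE, overwrite = FALSE)
  invisible(stats::setNames(copied, basename(src)))
}

# Temporary directories stand in for the package and the project.
pkg_dir <- tempfile("pkg")
dir.create(file.path(pkg_dir, "intdata", "R"), recursive = TRUE)
writeLines("# placeholder",
           file.path(pkg_dir, "intdata", "R", "make-iris-internal.R"))
project_dir <- tempfile("project")
dir.create(project_dir)

copy_dirs_sketch(pkg_dir, project_dir)
list.files(project_dir, recursive = TRUE)
```

Files under a package's inst/ directory are installed at the top level of the package, so system.file(package = "drakepkg") would be a natural value for `from` in a real call.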

The third step is to make the plan:

clean(destroy = TRUE) 
make(get_example_plan_simple())

The workflow's dependency graph can be displayed using drake::vis_drake_graph():

get_example_plan_simple() %>% 
  drake_config() %>% 
  vis_drake_graph()

The final output of the plan above is the report target, but any target can be accessed with drake functions like loadd() or readd().

# retrieve a target from the drake cache and inspect it
loadd(fit)
summary(fit) 

# inspect a target without storing it in the local environment
readd(hist) 
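
The difference between the two functions can be seen in a small self-contained sketch. It assumes drake is installed and uses a throwaway plan and a temporary cache (both hypothetical, not part of drakepkg) so that it does not touch the project's own cache:

```r
library(drake)

# Throwaway plan and cache, for illustration only.
plan <- drake_plan(
  raw = datasets::iris,
  n_obs = nrow(raw)
)
cache <- new_cache(path = tempfile("drake-cache"))
make(plan, cache = cache)

# readd() returns the target's value; nothing is assigned unless you do it.
n <- readd(n_obs, cache = cache)

# loadd() assigns the target into the calling environment under its own name.
loadd(raw, cache = cache)
dim(raw)
```

readd() suits quick inspection of a single value, while loadd() is convenient when several targets are needed for interactive work.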

Plan With External Data

The second example builds on the first by introducing external data. The drake cache automatically stores a copy of each target in a plan, but when the plan accesses data from an external source, it's a good idea to store a local copy of that data in addition to the cached copy.

The following plan downloads the iris dataset from a GitHub repository and stores it in the extdata directory in the user's working directory, like so:

.
├── documents
├── extdata
│   └── iris-external.xlsx <-- file downloaded in the plan is stored here
└── intdata
    ├── R
    │   └── make-iris-internal.R
    └── iris-internal.xlsx 

Here is the plan:

clean(destroy = TRUE) 
make(get_example_plan_external())
get_example_plan_external() %>% 
  drake_config() %>% 
  vis_drake_graph()
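
The definition of get_example_plan_external() is not shown in this vignette. As a rough sketch of the "local copy plus cached copy" pattern, assuming drake is installed, one target can write the file and mark it with file_out(), and a second target can read it back through file_in(). The plan below is hypothetical: write.csv() stands in for the download step so that the sketch does not depend on a real URL, and the file name is a placeholder:

```r
library(drake)

# Hypothetical plan (not drakepkg's). Commands are stored, not run,
# until the plan is passed to make().
plan <- drake_plan(
  # Stand-in for the download: writes the local copy to extdata/.
  saved_copy = write.csv(
    datasets::iris,
    file_out("extdata/iris-external.csv"),
    row.names = FALSE
  ),
  # Reads the local copy; the value is also stored in the drake cache.
  external = read.csv(file_in("extdata/iris-external.csv"))
)
plan
```

Marking the path with file_out() and file_in() lets drake treat the file as a dependency linking the two targets, so the reading target is built after the file is written. Making this plan would require the extdata directory to exist in the working directory.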

The visualization below shows that the new "iris" data is actually just random numbers:

readd(hist)

Plan With Open Science Framework Compendium

(work in progress)



tiernanmartin/drakepkg documentation built on March 11, 2020, 3:11 a.m.