Hasty mode for the drake R package

Hasty mode is accelerated execution with all of drake's storage and reproducibility guarantees stripped away. For experimentation only. Use at your own risk.

Drawbacks

  1. DRAKE NO LONGER PROVIDES EVIDENCE THAT YOUR WORKFLOW IS TRUSTWORTHY OR REPRODUCIBLE. THE CORE SCIENTIFIC CLAIMS ARE NO LONGER VALID.
  2. By default, the cache is not used, so
    1. You need to write code to store your own targets (in your targets' commands or config$hasty_build()), and
    2. knitr/rmarkdown reports with calls to loadd()/readd() will no longer work properly as pieces of the pipeline.
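
As a minimal sketch of the first point, a command can save its own result before returning it. The helper keep() below is hypothetical (it is not part of drake or drake.hasty), and writing .rds files to tempdir() is only an assumption for illustration.

```r
# Hypothetical helper (not part of drake or drake.hasty):
# save a value to <dir>/<name>.rds and return it unchanged,
# so it can wrap the body of any command.
keep <- function(value, name, dir = tempdir()) {
  saveRDS(value, file.path(dir, paste0(name, ".rds")))
  value
}

# Plain R stand-ins for two target commands.
x <- keep(rnorm(100), "x")
y <- keep(mean(x), "y")

# Later, or in another session, read a stored result back from disk.
restored <- readRDS(file.path(tempdir(), "y.rds"))
```

In a plan, the same idea might look like drake_plan(x = keep(rnorm(100), "x"), y = keep(mean(x), "y")), assuming keep() is visible in the environment where the commands run.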

Advantages

  1. Hasty mode is a sandbox. By supplying a hasty_build function to your drake_config() object, you can experiment with different ways to process targets.
  2. There is no overhead from storing and checking targets, so hasty mode runs much faster than drake's standard modes.
  3. You still have scheduling and dependency management. drake still builds the correct targets in the correct order, waiting for dependencies to finish before advancing downstream.
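
To illustrate what "the correct order" means in the last point, here is a toy sketch in base R. It is not drake's scheduler: the dependency lists are written by hand (an assumption for illustration, since drake detects them from the commands), and everything runs sequentially.

```r
# Toy commands, deliberately listed out of order.
commands <- list(
  y = quote(mean(x)),
  z = quote(median(x)),
  x = quote(c(3, 1, 2))
)

# Hand-written dependencies: y and z both need x.
deps <- list(x = character(0), y = "x", z = "x")

envir <- new.env()          # where target values live
order_built <- character(0) # records the build order
remaining <- names(commands)

while (length(remaining) > 0) {
  # A target is ready once all of its dependencies have been built.
  is_ready <- vapply(
    remaining,
    function(tgt) all(deps[[tgt]] %in% order_built),
    logical(1)
  )
  ready <- remaining[is_ready]
  if (length(ready) == 0) stop("circular dependency")
  for (target in ready) {
    assign(target, eval(commands[[target]], envir = envir), envir = envir)
    order_built <- c(order_built, target)
  }
  remaining <- setdiff(remaining, ready)
}

order_built
```

Here x is built first even though it is listed last, and y and z become ready only once x exists.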

Installation

library(remotes)
install_github("ropensci/drake")
install_github("wlandau/drake.hasty")

Basic usage

We begin with a drake project.

library(drake.hasty)
plan <- drake_plan(x = rnorm(100), y = mean(x), z = median(x))

plan

First, create a drake_config() object from your workflow.

config <- drake_config(plan)

hasty_make() really only needs the plan, schedule, and envir elements of config, so feel free to build that list yourself.

config <- list(
  plan = config$plan,
  schedule = config$schedule,
  envir = config$envir
)

Then run the project.

hasty_make(config = config)

By default, hasty mode does no caching or checking, so targets are never considered up to date. Calling hasty_make() again rebuilds all of them.

hasty_make(config = config)

Parallel and distributed computing

If you have the clustermq package installed, you can use parallel and distributed computing.

# Use 2 persistent workers.
config$jobs <- 2

# See https://github.com/mschubert/clustermq for more options.
options(clustermq.scheduler = "multicore")

hasty_make(config = config)

Custom build functions

You can customize how each target gets built. By default, hasty_build_default() is used.

hasty_build_default

There is also a second built-in function, hasty_build_store(), which additionally stores each target in drake's cache.

hasty_build_store

To use it, simply add the build function and a storr cache to config and run hasty_make().

config$hasty_build <- hasty_build_store
config$cache <- storr::storr_rds(tempfile())
hasty_make(config = config)

Now you can read targets from the cache.

readd(z, cache = config$cache)

Similarly, you can write your own custom build functions for config$hasty_build.
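
As a rough sketch, one way to write a custom build function is to wrap an existing one. The code below assumes that a build function is called as build(target, config) and returns the target's value. That signature is an assumption (print hasty_build_default to check it), and with_timing(), toy_build(), and toy_config are hypothetical names used so the example runs without drake installed.

```r
# Hypothetical wrapper: add timing messages to any build function.
# ASSUMPTION: build functions are called as build(target, config)
# and return the value of the target.
with_timing <- function(build) {
  force(build)
  function(target, config) {
    start <- proc.time()[["elapsed"]]
    value <- build(target, config)
    elapsed <- proc.time()[["elapsed"]] - start
    message(sprintf("target %s built in %.3f s", target, elapsed))
    value
  }
}

# Stand-in build function and config (not drake objects), for illustration.
toy_build <- function(target, config) {
  eval(config$commands[[target]], envir = config$envir)
}
toy_config <- list(
  commands = list(x = quote(1 + 1)),
  envir = new.env()
)

timed_build <- with_timing(toy_build)
result <- timed_build("x", toy_config)
```

If the assumed signature holds, the analogous call in a real project would be config$hasty_build <- with_timing(hasty_build_default), followed by hasty_make(config = config).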

# Clean up: destroy drake's default cache (the .drake/ folder), if present.
drake::clean(destroy = TRUE)


wlandau/drake.hasty documentation built on May 23, 2019, 5:08 p.m.