drake_plan | R Documentation |
Create a drake plan for the plan argument of make().
A drake plan is a data frame with columns "target" and "command". Each target is an R object produced in your workflow, and each command is the R code to produce it.
drake_plan(
...,
list = NULL,
file_targets = NULL,
strings_in_dots = NULL,
tidy_evaluation = NULL,
transform = TRUE,
trace = FALSE,
envir = parent.frame(),
tidy_eval = TRUE,
max_expand = NULL
)
... |
A collection of symbols/targets with commands assigned to them. See the examples for details. |
list |
Deprecated. |
file_targets |
Deprecated. |
strings_in_dots |
Deprecated. |
tidy_evaluation |
Deprecated. Use |
transform |
Logical, whether to transform the plan
into a larger plan with more targets.
Requires the |
trace |
Logical, whether to add columns to show what happens during target transformations. |
envir |
Environment for tidy evaluation. |
tidy_eval |
Logical, whether to use tidy evaluation
(e.g. unquoting/ |
max_expand |
Positive integer, optional.
|
Besides "target" and "command", drake_plan() understands a special set of optional columns. For details, visit https://books.ropensci.org/drake/plans.html#special-custom-columns-in-your-plan
A data frame of targets, commands, and optional custom columns.
drake_plan() creates a special data frame. At minimum, that data frame must have columns target and command with the target names and the R code chunks to build them, respectively.
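As a minimal sketch of that structure (assuming the drake package is installed; the target names raw and model are purely illustrative), a two-step plan can be built and inspected like this:

```r
library(drake)

# Hypothetical two-step workflow: `raw` and `model` are illustrative names.
plan <- drake_plan(
  raw = head(mtcars),
  model = lm(mpg ~ wt, data = raw)
)

# The plan is a data frame with one row per target.
is.data.frame(plan)
colnames(plan) # includes "target" and "command"
plan$target    # target names, in the order they were declared
```

Nothing is computed at this point. The commands are only recorded, and they run later when the plan is passed to make().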
You can add custom columns yourself, either with target() (e.g. drake_plan(y = target(f(x), transform = map(x = c(1, 2)), format = "fst"))) or by appending columns post-hoc (e.g. plan$col <- vals).
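The two ways of adding custom columns can be sketched as follows. The column names note and owner are hypothetical and have no special meaning to drake.

```r
library(drake)

# Custom column set inside target(); `note` is a hypothetical column name.
plan <- drake_plan(
  x = target(sqrt(4), note = "set inside target()"),
  y = x + 1
)

# Custom column appended afterwards; `owner` is also hypothetical.
plan$owner <- c("alice", "bob")

colnames(plan) # "target" and "command" plus the two custom columns
```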
Some of these custom columns are special. They are optional, but drake looks for them at various points in the workflow.
transform: a call to map(), split(), cross(), or combine() to create and manipulate large collections of targets. Details: https://books.ropensci.org/drake/plans.html#large-plans
format: set a storage format to save big targets more efficiently. See the "Formats" section of this help file for more details.
trigger: rule to decide whether a target needs to run. It is recommended that you define this one with target(). Details: https://books.ropensci.org/drake/triggers.html
hpc: logical values (TRUE/FALSE/NA) indicating whether to send each target to parallel workers. Visit https://books.ropensci.org/drake/hpc.html#selectivity to learn more.
resources: target-specific lists of resources for a computing cluster. See https://books.ropensci.org/drake/hpc.html#advanced-options for details.
caching: overrides the caching argument of make() for each target individually. Possible values:
  "main": tell the main process to store the target in the cache.
  "worker": tell the HPC worker to store the target in the cache.
  NA: default to the caching argument of make().
elapsed and cpu: number of seconds to wait for the target to build before timing out (elapsed for elapsed time and cpu for CPU time).
retries: number of times to retry building a target in the event of an error.
seed: an optional pseudo-random number generator (RNG) seed for each target. drake usually comes up with its own unique reproducible target-specific seeds using the global seed (the seed argument to make() and drake_config()) and the target names, but you can overwrite these automatic seeds. NA entries default back to drake's automatic seeds.
max_expand: for dynamic branching only. Same as the max_expand argument of make(), but on a target-by-target basis. Limits the number of sub-targets created for a given target.
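A short sketch of how a few of these special columns might be set through target() is shown below. The values chosen for seed, retries, and elapsed are arbitrary and only illustrate the syntax.

```r
library(drake)

plan <- drake_plan(
  data = target(
    rnorm(10),
    trigger = trigger(condition = TRUE), # rebuild rule for this target
    seed = 123,                          # target-specific RNG seed
    retries = 2,                         # retry twice on error
    elapsed = 60                         # elapsed-time timeout in seconds
  ),
  summary_stats = summary(data)
)

# Each field supplied to target() shows up as a column of the plan.
colnames(plan)
```

Targets that do not set a given field, such as summary_stats here, fall back to the defaults described above.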
Specialized target formats increase efficiency and flexibility. Some allow you to save specialized objects like keras models, while others increase the speed while conserving storage and memory. You can declare target-specific formats in the plan (e.g. drake_plan(x = target(big_data_frame, format = "fst"))) or supply a global default format for all targets in make(). Either way, most formats have specialized installation requirements (e.g. R packages) that are not installed with drake by default. You will need to install them separately yourself.
Available formats:
"file": Dynamic files. To use this format, simply create local files and directories yourself and then return a character vector of paths as the target's value. Then, drake will watch for changes to those files in subsequent calls to make(). This is a more flexible alternative to file_in() and file_out(), and it is compatible with dynamic branching. See https://github.com/ropensci/drake/pull/1178 for an example.
"fst": save big data frames fast. Requires the fst package. Note: this format strips non-data-frame attributes such as the
"fst_tbl": Like "fst", but for tibble objects. Requires the fst and tibble packages. Strips away non-data-frame non-tibble attributes.
"fst_dt": Like the "fst" format, but for data.table objects. Requires the fst and data.table packages. Strips away non-data-frame non-data-table attributes.
"diskframe": Stores disk.frame objects, which could potentially be larger than memory. Requires the fst and disk.frame packages. Coerces objects to disk.frames. Note: disk.frame objects get moved to the drake cache (a subfolder of .drake/ for most workflows). To ensure this data transfer is fast, it is best to save your disk.frame objects to the same physical storage drive as the drake cache, e.g. as.disk.frame(your_dataset, outdir = drake_tempfile()).
"keras": save Keras models as HDF5 files. Requires the keras package.
"qs": save any R object that can be properly serialized with the qs package. Requires the qs package. Uses qsave() and qread(). Uses the default settings in qs version 0.20.2.
"rds": save any R object that can be properly serialized. Requires R version >= 3.5.0 due to ALTREP. Note: the "rds" format uses gzip compression, which is slow. "qs" is a superior format.
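The "file" format described above can be sketched as follows. The path report.txt and the target names are hypothetical; the command writes the file itself and returns its path as the target's value.

```r
library(drake)

plan <- drake_plan(
  report_path = target(
    {
      path <- "report.txt" # hypothetical output file
      writeLines("hello", path)
      path # return the path so drake can watch the file
    },
    format = "file"
  ),
  # Downstream targets use the returned path like any other value.
  contents = readLines(report_path)
)

plan$format # "file" for report_path, missing for contents
```

Building the plan does not write anything to disk. The file would only be created when make(plan) runs the command.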
drake_plan() understands special keyword functions for your commands. With the exception of target(), each one is a proper function with its own help file.
target(): give the target more than just a command. Using target(), you can apply a transformation (examples: https://books.ropensci.org/drake/plans.html#large-plans), supply a trigger (https://books.ropensci.org/drake/triggers.html), or set any number of custom columns.
file_in(): declare an input file dependency.
file_out(): declare an output file to be produced when the target is built.
knitr_in(): declare a knitr file dependency such as an R Markdown (*.Rmd) or R LaTeX (*.Rnw) file.
ignore(): force drake to entirely ignore a piece of code: do not track it for changes and do not analyze it for dependencies.
no_deps(): tell drake to not track the dependencies of a piece of code. drake still tracks the code itself for changes.
id_chr(): get the name of the current target.
drake_envir(): get the environment where drake builds targets. Intended for advanced custom memory management.
drake has special syntax for generating large plans. Your code will look something like
drake_plan(y = target(f(x), transform = map(x = c(1, 2, 3))))
You can read about this interface at https://books.ropensci.org/drake/plans.html#large-plans.
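To make the expansion concrete, here is a small sketch. The function f() is hypothetical and never needs to exist, because drake_plan() only records the command.

```r
library(drake)

# f() is a hypothetical function; it is recorded, not called.
plan <- drake_plan(
  y = target(f(x), transform = map(x = c(1, 2, 3)))
)
plan$target # one target per value of x
nrow(plan)

# transform = FALSE keeps the plan unexpanded and stores the
# transformation in a "transform" column instead.
raw_plan <- drake_plan(
  y = target(f(x), transform = map(x = c(1, 2, 3))),
  transform = FALSE
)
nrow(raw_plan)
```

The unexpanded form can be handy for inspecting a plan before it grows, and it can be expanded afterwards with transform_plan().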
In static branching, you define batches of targets based on information you know in advance. Overall usage looks like
drake_plan(<x> = target(<...>, transform = <call>)),
where
<x> is the name of the target or group of targets.
<...> are optional arguments to target().
<call> is a call to one of the transformation functions.
Transformation function usage:
map(..., .data, .names, .id, .tag_in, .tag_out)
split(..., slices, margin = 1L, drop = FALSE, .names, .tag_in, .tag_out)
cross(..., .data, .names, .id, .tag_in, .tag_out)
combine(..., .by, .names, .id, .tag_in, .tag_out)
Dynamic branching usage:
map(..., .trace)
cross(..., .trace)
group(..., .by, .trace)
map() and cross() create dynamic sub-targets from the variables supplied to the dots. As with static branching, the variables supplied to map() must all have equal length.
group(f(data), .by = x) makes new dynamic sub-targets from data. Here, data can be either static or dynamic. If data is dynamic, group() aggregates existing sub-targets. If data is static, group() splits data into multiple subsets based on the groupings from .by.
Differences from static branching:
... must contain unnamed symbols with no values supplied, and they must be the names of targets.
Arguments .id, .tag_in, and .tag_out no longer apply.
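A minimal sketch of a dynamic map() is shown below. The target names values and squared are illustrative. Note that the dots contain only the unnamed symbol values, which is itself a target.

```r
library(drake)

plan <- drake_plan(
  values = c(1, 2, 3),
  # Dynamic branching uses the `dynamic` field instead of `transform`.
  squared = target(values^2, dynamic = map(values))
)

# The plan itself is not expanded: it still has one row per target,
# and the branching rule is stored in a "dynamic" column.
nrow(plan)
colnames(plan)
```

Sub-targets of squared are only created at runtime, when make(plan) has built values and knows how many elements it has.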
make, drake_config, transform_plan, map, split, cross, combine
## Not run:
isolate_example("contain side effects", {
  # For more examples, visit
  # https://books.ropensci.org/drake/plans.html.

  # Create drake plans:
  mtcars_plan <- drake_plan(
    write.csv(mtcars[, c("mpg", "cyl")], file_out("mtcars.csv")),
    value = read.csv(file_in("mtcars.csv"))
  )
  if (requireNamespace("visNetwork", quietly = TRUE)) {
    plot(mtcars_plan) # fast simplified call to vis_drake_graph()
  }
  mtcars_plan
  make(mtcars_plan) # Makes `mtcars.csv` and then `value`
  head(readd(value))

  # You can use knitr inputs too. See the top command below.
  load_mtcars_example()
  head(my_plan)
  if (requireNamespace("knitr", quietly = TRUE)) {
    plot(my_plan)
  }
  # The `knitr_in("report.Rmd")` tells `drake` to dive into the active
  # code chunks to find dependencies.
  # There, `drake` sees that `small`, `large`, and `coef_regression2_small`
  # are loaded in with calls to `loadd()` and `readd()`.
  deps_code("report.Rmd")

  # Formats are great for big data: https://github.com/ropensci/drake/pull/977
  # Below, each target is 1.6 GB in memory.
  # Run make() on this plan to see how much faster fst is!
  n <- 1e8
  plan <- drake_plan(
    data_fst = target(
      data.frame(x = runif(n), y = runif(n)),
      format = "fst"
    ),
    data_old = data.frame(x = runif(n), y = runif(n))
  )

  # Use transformations to generate large plans.
  # Read more at
  # `https://books.ropensci.org/drake/plans.html#create-large-plans-the-easy-way`. # nolint
  drake_plan(
    data = target(
      simulate(nrows),
      transform = map(nrows = c(48, 64)),
      custom_column = 123
    ),
    reg = target(
      reg_fun(data),
      transform = cross(reg_fun = c(reg1, reg2), data)
    ),
    summ = target(
      sum_fun(data, reg),
      transform = cross(sum_fun = c(coef, residuals), reg)
    ),
    winners = target(
      min(summ),
      transform = combine(summ, .by = c(data, sum_fun))
    )
  )

  # Split data among multiple targets.
  drake_plan(
    large_data = get_data(),
    slice_analysis = target(
      analyze(large_data),
      transform = split(large_data, slices = 4)
    ),
    results = target(
      rbind(slice_analysis),
      transform = combine(slice_analysis)
    )
  )

  # Set trace = TRUE to show what happened during the transformation process.
  drake_plan(
    data = target(
      simulate(nrows),
      transform = map(nrows = c(48, 64)),
      custom_column = 123
    ),
    reg = target(
      reg_fun(data),
      transform = cross(reg_fun = c(reg1, reg2), data)
    ),
    summ = target(
      sum_fun(data, reg),
      transform = cross(sum_fun = c(coef, residuals), reg)
    ),
    winners = target(
      min(summ),
      transform = combine(summ, .by = c(data, sum_fun))
    ),
    trace = TRUE
  )

  # You can create your own custom columns too.
  # See ?triggers for more on triggers.
  drake_plan(
    website_data = target(
      command = download_data("www.your_url.com"),
      trigger = "always",
      custom_column = 5
    ),
    analysis = analyze(website_data)
  )

  # Tidy evaluation can help generate super large plans.
  sms <- rlang::syms(letters) # To sub in character args, skip this.
  drake_plan(x = target(f(char), transform = map(char = !!sms)))

  # Dynamic branching
  # Get the mean mpg for each cyl in the mtcars dataset.
  plan <- drake_plan(
    raw = mtcars,
    group_index = raw$cyl,
    munged = target(raw[, c("mpg", "cyl")], dynamic = map(raw)),
    mean_mpg_by_cyl = target(
      data.frame(mpg = mean(munged$mpg), cyl = munged$cyl[1]),
      dynamic = group(munged, .by = group_index)
    )
  )
  make(plan)
  readd(mean_mpg_by_cyl)
})
## End(Not run)