transformations: Transformations in 'drake_plan()'. *[Stable]*

transformations    R Documentation

Transformations in drake_plan(). [Stable]

Description

In drake_plan(), you can define whole batches of targets with transformations such as map(), split(), cross(), and combine().

Arguments

...

Grouping variables. New grouping variables must be supplied with their names and values; existing grouping variables can be given as symbols without any values assigned. For dynamic branching, the entries in ... must be unnamed symbols with no values supplied, and they must be the names of targets.

.data

A data frame of new grouping variables with grouping variable names as column names and values as elements.

.names

Literal character vector of names for the targets. Must be the same length as the targets generated.

.id

Symbol or vector of symbols naming grouping variables to incorporate into target names. Useful for creating short target names. Set .id = FALSE to use integer indices as target name suffixes.

.tag_in

A symbol or vector of symbols. Tags assign targets to grouping variables. Use .tag_in to assign untransformed targets to grouping variables.

.tag_out

Just like .tag_in, except that .tag_out assigns transformed targets to grouping variables.

slices

Number of slices into which split() partitions the data.

margin

Which margin to take the slices in split(). Same meaning as the MARGIN argument of apply().

drop

Logical, whether to drop a dimension if its length is 1. Same meaning as mtcars[, 1L, drop = TRUE] versus mtcars[, 1L, drop = FALSE].

.by

Symbol or vector of symbols of grouping variables. combine() aggregates/groups targets by the grouping variables in .by. For dynamic branching, .by can only take one variable at a time, and that variable must be a vector. Ideally, it should take little space in memory.

.trace

Symbol or vector of symbols for the dynamic trace. The dynamic trace allows you to keep track of which values of dynamic dependencies are associated with individual sub-targets. For combine(), .trace must either be empty or the same as the variable given for .by. See get_trace() and read_trace() for examples and other details.
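The margin and drop arguments described above borrow their meaning from base R, so a short base-R sketch may help. It uses only the built-in mtcars dataset and does not call drake; the variable names are illustrative.

```r
# drop = TRUE collapses a single selected column to a plain vector,
# while drop = FALSE keeps the data frame structure.
as_vector <- mtcars[, 1L, drop = TRUE]
as_frame <- mtcars[, 1L, drop = FALSE]
is.data.frame(as_vector) # FALSE
is.data.frame(as_frame)  # TRUE

# MARGIN = 1 in apply() works over rows, MARGIN = 2 over columns.
# split() uses the same convention for its margin argument.
per_row <- apply(as.matrix(mtcars), 1L, sum)
per_col <- apply(as.matrix(mtcars), 2L, sum)
length(per_row) # 32, one value per row
length(per_col) # 11, one value per column
```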

Details

For details, see https://books.ropensci.org/drake/plans.html#large-plans.

Transformations

drake has special syntax for generating large plans. Your code will look something like drake_plan(y = target(f(x), transform = map(x = c(1, 2, 3)))). You can read about this interface at https://books.ropensci.org/drake/plans.html#large-plans.
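As a minimal sketch of that one-liner written out in full, the snippet below builds the plan and prints it. Here f() is a hypothetical placeholder function: drake_plan() only records the commands, so f() does not need to exist until the plan is actually built.

```r
library(drake)

# f() is a hypothetical placeholder; drake_plan() stores the command
# as an unevaluated expression and never calls it here.
plan <- drake_plan(
  y = target(
    f(x),
    transform = map(x = c(1, 2, 3))
  )
)

# The single target definition expands to one row per value of x.
print(plan)
nrow(plan) # 3
```

Each row of the resulting plan is an ordinary target, so downstream code can treat the expanded plan like any hand-written one.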

Static branching

In static branching, you define batches of targets based on information you know in advance. Overall usage looks like drake_plan(<x> = target(<...>, transform = <call>)), where

  • <x> is the name of the target or group of targets.

  • <...> stands for optional arguments to target().

  • <call> is a call to one of the transformation functions.

Transformation function usage:

  • map(..., .data, .names, .id, .tag_in, .tag_out)

  • split(..., slices, margin = 1L, drop = FALSE, .names, .tag_in, .tag_out)

  • cross(..., .data, .names, .id, .tag_in, .tag_out)

  • combine(..., .by, .names, .id, .tag_in, .tag_out)
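To illustrate how two of these transformations differ, the sketch below feeds the same new grouping variables to map() and to cross(). As before, f() is a hypothetical placeholder that is recorded but never called.

```r
library(drake)

# map() pairs the grouping variables element by element,
# so x and y must have the same length.
plan_map <- drake_plan(
  z = target(f(x, y), transform = map(x = c(1, 2), y = c(3, 4)))
)

# cross() generates one target for every combination of x and y.
plan_cross <- drake_plan(
  z = target(f(x, y), transform = cross(x = c(1, 2), y = c(3, 4)))
)

nrow(plan_map)   # 2 targets: one per (x, y) pair
nrow(plan_cross) # 4 targets: 2 values of x times 2 values of y
```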

Dynamic branching

  • map(..., .trace)

  • cross(..., .trace)

  • group(..., .by, .trace)

map() and cross() create dynamic sub-targets from the variables supplied to the dots. As with static branching, the variables supplied to map() must all have equal length. group(f(data), .by = x) makes new dynamic sub-targets from data. Here, data can be either static or dynamic. If data is dynamic, group() aggregates existing sub-targets. If data is static, group() splits data into multiple subsets based on the groupings from .by.

Differences from static branching:

  • ... must contain unnamed symbols with no values supplied, and they must be the names of targets.

  • Arguments .id, .tag_in, and .tag_out no longer apply.
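A minimal sketch of a dynamic map() follows. Note that dynamic transformations go in the dynamic argument of target() rather than transform; the target names x and y are illustrative.

```r
library(drake)

plan <- drake_plan(
  # An ordinary static target whose value is a vector.
  x = c(1, 2, 3),
  # map(x) takes the unnamed symbol x, the name of an existing target.
  # Sub-targets of y are only created later, when make() runs.
  y = target(x^2, dynamic = map(x))
)

# The plan still has one row per declared target; the dynamic call
# is stored in its own column instead of being expanded up front.
print(plan)
colnames(plan)
```

Unlike static branching, the plan itself does not grow here, which is why arguments that shape static target names, such as .id, have no role.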

Examples

# Static branching
models <- c("glm", "hierarchical")
plan <- drake_plan(
  data = target(
    get_data(x),
    transform = map(x = c("simulated", "survey"))
  ),
  analysis = target(
    analyze_data(data, model),
    transform = cross(data, model = !!models, .id = c(x, model))
  ),
  summary = target(
    summarize_analysis(analysis),
    transform = map(analysis, .id = c(x, model))
  ),
  results = target(
    bind_rows(summary),
    transform = combine(summary, .by = data)
  )
)
plan
if (requireNamespace("styler", quietly = TRUE)) {
  print(drake_plan_source(plan))
}
# Static splitting
plan <- drake_plan(
  analysis = target(
    analyze(data),
    transform = split(data, slices = 3L, margin = 1L, drop = FALSE)
  )
)
print(plan)
if (requireNamespace("styler", quietly = TRUE)) {
  print(drake_plan_source(plan))
}
# Static tags:
drake_plan(
  x = target(
    command,
    transform = map(y = c(1, 2), .tag_in = from, .tag_out = c(to, out))
  ),
  trace = TRUE
)
plan <- drake_plan(
  survey = target(
    survey_data(x),
    transform = map(x = c(1, 2), .tag_in = source, .tag_out = dataset)
  ),
  download = target(
    download_data(),
    transform = map(y = c(5, 6), .tag_in = source, .tag_out = dataset)
  ),
  analysis = target(
    analyze(dataset),
    transform = map(dataset)
  ),
  results = target(
    bind_rows(analysis),
    transform = combine(analysis, .by = source)
  )
)
plan
if (requireNamespace("styler", quietly = TRUE)) {
  print(drake_plan_source(plan))
}

wlandau-lilly/drake documentation built on Dec. 3, 2024, 11:09 p.m.