drake_slice: Take a strategic subset of a dataset. [Stable]
In wlandau-lilly/drake: A Pipeline Toolkit for Reproducible Computation at Scale

drake_slice

R Documentation

Take a strategic subset of a dataset.

Description

drake_slice() is similar to split(). Both functions partition data into disjoint subsets, but whereas split() returns all the subsets, drake_slice() returns just one. In other words, drake_slice(..., index = i) returns split(...)[[i]]. Other features:

1. drake_slice() works on vectors, data frames, matrices, lists, and arbitrary arrays.

2. Like parallel::splitIndices(), drake_slice() tries to make the subsets as close to equal in size as possible.

See the examples to learn why splitting is useful in drake.
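To make the relationship with split() concrete, here is a small base-R sketch. It is not drake's implementation. It assumes the slices are contiguous and sized the way parallel::splitIndices() sizes them, and the helper name slice_vector() is hypothetical.

```r
# Hypothetical helper, for illustration only: not part of drake.
# Assumes contiguous, near-equal pieces as produced by parallel::splitIndices().
slice_vector <- function(data, slices, index) {
  pieces <- parallel::splitIndices(length(data), slices)
  data[pieces[[index]]]
}

x <- letters[1:10]
pieces <- parallel::splitIndices(length(x), 3)
# Label each element with the number of the piece it belongs to.
groups <- rep(seq_along(pieces), lengths(pieces))
# split() returns every piece; the helper returns only the one requested.
all_slices <- split(x, groups)
identical(slice_vector(x, slices = 3, index = 2), all_slices[[2]]) # TRUE
```

The group labels are built from the same index list, so split() and the helper agree on where each slice begins and ends.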

Usage

drake_slice(data, slices, index, margin = 1L, drop = FALSE)

Arguments

`data`	A list, vector, data frame, matrix, or arbitrary array. Anything with a `length()` or `dim()`.
`slices`	Integer of length 1, the number of slices (pieces) to divide the whole dataset into. Remember, `drake_slice(index = i)` returns only slice number `i`.
`index`	Integer of length 1, which piece of the partition to return, from 1 to `slices`.
`margin`	Integer of length 1, margin over which to split the data. For example, for a data frame or matrix, use `margin = 1` to split over rows and `margin = 2` to split over columns. Similar to `MARGIN` in `apply()`.
`drop`	Logical, for matrices and arrays. If `TRUE`, the result is coerced to the lowest possible dimension. See ?`[` for details.
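The `margin` and `drop` arguments can be illustrated with another base-R sketch. Again, this is an assumption-laden illustration rather than drake's code: the helper slice_array() is hypothetical, and it assumes near-equal contiguous pieces from parallel::splitIndices(). It builds one subscript per dimension, restricts only the `margin` dimension, and forwards `drop` to `[`.

```r
# Hypothetical helper, for illustration only: not drake's implementation.
# Assumes near-equal contiguous pieces as produced by parallel::splitIndices().
slice_array <- function(data, slices, index, margin = 1L, drop = FALSE) {
  dims <- dim(data)
  if (is.null(dims)) {
    # Vectors and lists have no dim(), so partition along length().
    return(data[parallel::splitIndices(length(data), slices)[[index]]])
  }
  keep <- parallel::splitIndices(dims[[margin]], slices)[[index]]
  # One subscript per dimension; TRUE keeps every position on that dimension.
  subscripts <- rep(list(TRUE), length(dims))
  subscripts[[margin]] <- keep
  do.call(`[`, c(list(data), subscripts, list(drop = drop)))
}

x <- matrix(seq_len(20), nrow = 5)
slice_array(x, slices = 3, index = 1)              # first of 3 row slices
slice_array(x, slices = 3, index = 1, margin = 2)  # first of 3 column slices
slice_array(x, slices = 5, index = 1, drop = TRUE) # a single row, dropped to a vector
slice_array(mtcars, slices = 4, index = 1)         # data frames work too
```

Because data frames also have a dim(), the same subscripting path handles them, and `drop = FALSE` keeps single-row or single-column results as matrices or data frames.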

Value

A subset of data: only slice number `index` out of `slices` pieces.

Examples

# Simple usage
x <- matrix(seq_len(20), nrow = 5)
x
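# Split the 5 rows into 3 slices and return each slice in turn.
drake_slice(x, slices = 3, index = 1)
drake_slice(x, slices = 3, index = 2)
drake_slice(x, slices = 3, index = 3)
# Split over the 4 columns instead (margin = 2) and return the first slice.
drake_slice(x, slices = 3, margin = 2, index = 1)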
# In drake, you can split a large dataset over multiple targets.
## Not run: 
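isolate_example("contain side effects", {
  # One target per slice: data_split_1L, data_split_2L, ..., data_split_32L.
  # mtcars has 32 rows, so each slice holds a single row.
  plan <- drake_plan(
    large_data = mtcars,
    data_split = target(
      drake_slice(large_data, slices = 32, index = i),
      transform = map(i = !!seq_len(32))
    )
  )
  plan
  # Use an in-memory cache so the example leaves no files behind.
  cache <- storr::storr_environment()
  make(plan, cache = cache, session_info = FALSE, verbose = FALSE)
  readd(data_split_1L, cache = cache)
  readd(data_split_2L, cache = cache)
})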

## End(Not run)
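The plan in the example above gives each slice its own target, so each piece can be built, cached, and rebuilt independently. As a rough base-R analogue of the underlying split-then-merge pattern (no drake, no caching, no parallelism), the sketch below processes each slice separately and then merges the per-slice results. The choice of 4 slices, the mpg summary, and the row-count weighting are assumptions made for illustration.

```r
# Rough base-R analogue, for illustration only: no drake, no caching.
# Assumes the near-equal, contiguous partition from parallel::splitIndices().
slices <- 4
rows <- parallel::splitIndices(nrow(mtcars), slices)

# Per-slice step: one piece of work per slice, like one target per slice.
pieces <- lapply(rows, function(i) mtcars[i, , drop = FALSE])
piece_means <- vapply(pieces, function(piece) mean(piece$mpg), numeric(1))

# Merge step: weight each slice mean by its row count to recover the full mean.
overall <- sum(piece_means * lengths(rows)) / nrow(mtcars)
all.equal(overall, mean(mtcars$mpg)) # TRUE
```

Weighting by row count keeps the merged result exact even when the slices are not all the same size.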

wlandau-lilly/drake documentation built on Dec. 3, 2024, 11:09 p.m.