chunk_apply: Apply Functions Over Chunks of a List, Vector, or Matrix


View source: R/apply.R

Description

Perform the equivalents of apply, lapply, and mapply, but over chunks of the data that are processed in parallel. This is most useful when accessing the data is potentially time-consuming, as with file-based matter objects: operating on chunks reduces the number of I/O operations.
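To make the I/O argument concrete, the following base-R sketch simulates a slow, file-backed vector with a counter. It is an illustration only, not matter's implementation. The names `data_on_disk`, `read_elements`, and `n_reads` are hypothetical. Reading element by element costs one read per element, while reading by chunk costs one read per chunk.

```r
# Hypothetical stand-in for a slow, file-backed vector: every call to
# read_elements() counts as one I/O operation.
n_reads <- 0
data_on_disk <- as.numeric(1:1000)
read_elements <- function(i) {
  n_reads <<- n_reads + 1
  data_on_disk[i]
}

# Element-wise access: one read per element.
n_reads <- 0
res_elementwise <- vapply(seq_along(data_on_disk),
                          function(i) sqrt(read_elements(i)), numeric(1))
reads_elementwise <- n_reads

# Chunked access: one read per chunk, then FUN (sqrt) is applied to each
# element of the chunk in memory.
n_reads <- 0
chunks <- parallel::splitIndices(length(data_on_disk), 10)
res_chunked <- unlist(lapply(chunks, function(idx) {
  block <- read_elements(idx)
  vapply(block, sqrt, numeric(1))
}))
reads_chunked <- n_reads
```

Both approaches give the same values, but the chunked version performs 10 reads instead of 1000.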

Usage

chunk_apply(X, FUN, MARGIN, ..., simplify = FALSE,
    chunks = NA, view = c("element", "chunk"),
    attr = list(), alist = list(), pattern = NULL,
    outfile = NULL, verbose = FALSE,
    BPREDO = list(), BPPARAM = bpparam())

chunk_mapply(FUN, ..., MoreArgs = NULL, simplify = FALSE,
    chunks = NA, view = c("element", "chunk"),
    attr = list(), alist = list(), pattern = NULL,
    outfile = NULL, verbose = FALSE,
    BPREDO = list(), BPPARAM = bpparam())

Arguments

X

A list, vector, or matrix for chunk_apply(). These may be any class that implements suitable methods for [, [[, dim, and length(). Only lists are supported for chunk_mapply().

FUN

The function to be applied.

MARGIN

If the object is matrix-like, which dimension to iterate over. Must be 1 or 2, where 1 indicates rows and 2 indicates columns. The dimension names can also be used if X has dimnames set.

MoreArgs

For chunk_mapply() only: a list of other arguments to FUN.

...

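For chunk_apply(), additional arguments to be passed to FUN. For chunk_mapply(), the lists over whose elements FUN is applied, as with mapply.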

simplify

Should the result be simplified into a vector, matrix, or higher dimensional array?

chunks

The number of chunks to use. If NA (the default), this is inferred from chunksize(X) for matter objects, or from getOption("matter.default.chunksize") for non-matter classes. For I/O-bound operations, using fewer chunks will often be faster, but will use more memory.
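The trade-off between the number of chunks and their size can be seen by splitting a set of indices. This sketch uses base R's parallel::splitIndices() purely for illustration. The assumption is that matter divides indices in a broadly similar way, which may not hold exactly.

```r
# Split 1000 indices into a given number of contiguous chunks.
library(parallel)

n <- 1000
few  <- splitIndices(n, 4)    # fewer, larger chunks: fewer reads, more memory per chunk
many <- splitIndices(n, 20)   # more, smaller chunks: more reads, less memory per chunk

chunk_sizes_few  <- lengths(few)
chunk_sizes_many <- lengths(many)
```

Every index appears in exactly one chunk, so the chunk count changes only how the work is grouped, not the result.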

view

What should be passed as the argument to FUN: "element" means that each vector element, row, or column is passed (the same behavior as lapply and apply), and "chunk" means that the entire chunk is passed.

attr

A named list of attributes that will be attached to the argument passed to FUN as-is.

alist

A named list of vector-like attributes that will be attached to the argument passed to FUN, subsetted to the current elements. Typically, each attribute should be as long as X, unless pattern is specified, in which case each attribute should be as long as pattern.
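The difference between attr and alist can be illustrated with a base-R sketch of the behavior described above for view = "element". An attr entry is attached unchanged, while each alist entry is subset to the current element. The helper `with_attrs()` and the attribute names `units` and `label` are hypothetical, and matter's internal implementation may differ.

```r
# Hypothetical helper: attach fixed attributes as-is and per-element
# attributes subset to the current index i.
with_attrs <- function(value, i, fixed_attr = list(), elt_attr = list()) {
  for (nm in names(fixed_attr)) attr(value, nm) <- fixed_attr[[nm]]
  for (nm in names(elt_attr)) attr(value, nm) <- elt_attr[[nm]][i]
  value
}

x <- c(10, 20, 30)
fixed   <- list(units = "mg")               # same value for every element (like attr)
per_elt <- list(label = c("a", "b", "c"))   # one entry per element of x (like alist)

# FUN reads the attributes attached to its argument.
FUN <- function(xi) paste0(attr(xi, "label"), "=", xi, attr(xi, "units"))
out <- vapply(seq_along(x),
              function(i) FUN(with_attrs(x[i], i, fixed, per_elt)),
              character(1))
```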

pattern

A list of indices giving a pattern over which to apply FUN to X. Each element of pattern should be a vector of indices that can be used to subscript X. For time and space efficiency, no attempt is made to verify that these indices are valid.
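Because the indices in pattern are not checked, a caller may want to validate them before passing them on. The helper below, `validate_pattern()`, is hypothetical and not part of matter. It assumes positive integer indices only, so logical or negative subscripts are rejected.

```r
# Hypothetical helper: check that every element of a pattern list holds
# non-missing, whole-number indices within 1..n.
validate_pattern <- function(pattern, n) {
  ok <- vapply(pattern, function(idx) {
    is.numeric(idx) && length(idx) > 0 && !anyNA(idx) &&
      all(idx >= 1) && all(idx <= n) && all(idx == round(idx))
  }, logical(1))
  if (!all(ok))
    stop("invalid indices in pattern element(s): ",
         paste(which(!ok), collapse = ", "))
  invisible(TRUE)
}

x <- seq(0, 1, length.out = 10)
good <- list(1:2, 3:5, 6:10)
bad  <- list(1:2, c(9, 11))   # 11 is out of bounds for length(x) == 10

good_result <- validate_pattern(good, length(x))
bad_result <- tryCatch(validate_pattern(bad, length(x)),
                       error = function(e) conditionMessage(e))
```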

outfile

If non-NULL, a file path where the results should be written as they are processed. If specified, FUN must return a 'raw', 'logical', 'integer', or 'numeric' vector. The result will be returned as a matter object.

verbose

Should progress messages be printed showing which chunk is currently being processed?

BPREDO

See documentation for bplapply.

BPPARAM

An optional instance of BiocParallelParam. See documentation for bplapply.

Details

When view = "element":

For vectors and lists, the object is split into the number of chunks given by chunks. The individual elements of each chunk are then passed to FUN.

For matrices, the matrix is chunked along rows or columns, based on the number of chunks. The individual rows or columns of the chunk are then passed to FUN.

In this way, the first argument passed to FUN is the same as it would be with the base apply and lapply functions.
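The element view for a matrix can be emulated in base R to show the two levels of iteration. This is a sketch of the described behavior, not matter's code, and parallel::splitIndices() stands in for matter's own chunking. The rows are grouped into chunks, each chunk is extracted once as a sub-matrix, and FUN then sees one row at a time.

```r
# Emulate view = "element" with MARGIN = 1 over a small in-memory matrix.
set.seed(1)
m <- matrix(rnorm(200), nrow = 20, ncol = 10)

row_chunks <- parallel::splitIndices(nrow(m), 4)
per_chunk <- lapply(row_chunks, function(rows) {
  block <- m[rows, , drop = FALSE]   # one access per chunk
  apply(block, 1, mean)              # FUN receives one row at a time
})
res <- unlist(per_chunk)
```

The result matches apply(m, 1, mean), which is the sense in which FUN's argument is analogous to base apply.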

However, when view = "chunk":

In this situation, the entire chunk is passed to FUN, and FUN is responsible for knowing how to handle a sub-vector or sub-matrix of the original object. This may be useful if FUN is already a function that could be applied to the whole object such as rowSums or colSums.

When this is the case, it may be useful to provide a custom simplify function. Otherwise, the result will be returned as a list with length equal to the number of chunks, which must be post-processed to get into a desirable form.
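The list-of-chunks result and its post-processing can be sketched in base R. Here rowSums receives a whole sub-matrix per chunk, and a small function plays the role of a custom simplify by concatenating the per-chunk vectors. The commented chunk_apply() call is not run. It assumes that a simplify function receives the list of per-chunk results, which this page does not state.

```r
# Emulate view = "chunk": FUN (rowSums) gets the whole sub-matrix for each
# chunk of rows, so the raw result is a list with one element per chunk.
set.seed(2)
m <- matrix(runif(60), nrow = 12, ncol = 5)
row_chunks <- parallel::splitIndices(nrow(m), 3)
raw <- lapply(row_chunks, function(rows) rowSums(m[rows, , drop = FALSE]))

# Post-processing, in the role of a custom simplify function: concatenate
# the per-chunk vectors into one vector over all rows.
simplify_chunks <- function(ans) do.call(c, ans)
res <- simplify_chunks(raw)

# Assumed equivalent with matter (not run):
# res <- chunk_apply(m, rowSums, 1, view = "chunk", simplify = simplify_chunks)
```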

For convenience to the programmer, several attributes are made available when view = "chunk".

The pattern argument can be used to iterate over dependent elements of a vector, or dependent rows/columns of a matrix. This can be useful if the calculation for a particular row/column/element depends on the values of others.

When pattern is provided, multiple rows/columns/elements will be passed to FUN, even when view="element". Each element of the pattern list should be a vector giving the indices that should be passed to FUN.

This can be used to implement a rolling apply function.
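A rolling apply amounts to building a pattern in which element i depends on its neighbours. The sketch below builds a centred window of half-width k, truncated at the ends, and emulates the application of FUN in base R. The helper `rolling_pattern()` is hypothetical, and the commented chunk_apply() call is an assumed usage that is not run here.

```r
# Hypothetical helper: pattern for a centred rolling window of half-width k.
rolling_pattern <- function(n, k) {
  lapply(seq_len(n), function(i) max(1, i - k):min(n, i + k))
}

x <- c(1, 2, 3, 4, 5, 6)
pat <- rolling_pattern(length(x), k = 1)

# Base-R emulation: FUN receives x[pat[[i]]] for each element of the pattern.
roll_mean <- vapply(pat, function(idx) mean(x[idx]), numeric(1))

# Assumed equivalent with matter (not run):
# roll_mean <- chunk_apply(x, mean, pattern = pat, simplify = TRUE)
```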

Value

Typically, a list if simplify = FALSE. Otherwise, the results may be coerced to a vector or array. If outfile is specified, the result is returned as a matter object.

Author(s)

Kylie A. Bemis

See Also

apply, lapply, mapply

Examples

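library(matter)        # provides chunk_apply()
library(BiocParallel)  # provides register() and SerialParam()

# run serially so the example behaves the same on any platform
register(SerialParam())

set.seed(1)
x <- matrix(rnorm(1000^2), nrow=1000, ncol=1000)

# mean of each row (MARGIN = 1), processed in 20 chunks
out <- chunk_apply(x, mean, 1, chunks=20, verbose=TRUE)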

matter documentation built on Nov. 8, 2020, 6:15 p.m.