chunkApply | R Documentation |
Perform equivalents of apply, lapply, and mapply, but over parallelized chunks of data. This is most useful if accessing the data is potentially time-consuming, such as for file-based matter objects. Operating on chunks reduces the number of I/O operations.
## Operate on elements/rows/columns
chunkApply(X, MARGIN, FUN, ...,
simplify = FALSE, outpath = NULL,
verbose = NA, BPPARAM = bpparam())
chunkLapply(X, FUN, ...,
simplify = FALSE, outpath = NULL,
verbose = NA, BPPARAM = bpparam())
chunkMapply(FUN, ...,
simplify = FALSE, outpath = NULL,
verbose = NA, BPPARAM = bpparam())
## Operate on complete chunks
chunk_rowapply(X, FUN, ...,
simplify = "c", nchunks = NA, depends = NULL,
seeds = NULL, verbose = NA, BPPARAM = bpparam())
chunk_colapply(X, FUN, ...,
simplify = "c", nchunks = NA, depends = NULL,
seeds = NULL, verbose = NA, BPPARAM = bpparam())
chunk_lapply(X, FUN, ...,
simplify = "c", nchunks = NA, depends = NULL,
seeds = NULL, verbose = NA, BPPARAM = bpparam())
chunk_mapply(FUN, ..., MoreArgs = NULL,
simplify = "c", nchunks = NA, depends = NULL,
seeds = NULL, verbose = NA, BPPARAM = bpparam())
X |
A matrix for |
MARGIN |
If the object is matrix-like, which dimension to iterate over. Must be 1 or 2, where 1 indicates rows and 2 indicates columns. The dimension names can also be used if |
FUN |
The function to be applied. |
MoreArgs |
A list of other arguments to |
... |
Additional arguments to be passed to |
simplify |
Should the result be simplified into a vector, matrix, or higher dimensional array? |
nchunks |
The number of chunks to use. If |
depends |
A list with length equal to the extent of |
seeds |
A list of RNG seeds such as those returned by |
outpath |
If non-NULL, a file path where the results should be written as they are processed. If specified, |
verbose |
Should user messages be printed with the current chunk being processed? If |
BPPARAM |
An optional instance of |
For chunkApply(), chunkLapply(), and chunkMapply():
For vectors and lists, the vector is broken into some number of chunks according to nchunks. The individual elements of each chunk are then passed to FUN.
For matrices, the matrix is chunked along rows or columns, based on the number of chunks. The individual rows or columns of each chunk are then passed to FUN.
In this way, the first argument of FUN is the same as it would be when using the base apply, lapply, and mapply functions.
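The element-wise behavior described above can be sketched in base R. This is only an illustration of the idea, not the matter implementation: the helpers chunk_indices() and sketch_chunk_lapply() are hypothetical names, the chunks are processed serially, and the way indices are divided into chunks is an assumption.

```r
# Hypothetical helper: assign indices 1..n to (at most) nchunks contiguous groups.
chunk_indices <- function(n, nchunks) {
  split(seq_len(n), ceiling(seq_len(n) * nchunks / n))
}

# Hypothetical sketch: fetch the data one chunk at a time,
# but still call FUN on one element at a time, as lapply() would.
sketch_chunk_lapply <- function(X, FUN, ..., nchunks = 4) {
  groups <- chunk_indices(length(X), nchunks)
  per_chunk <- lapply(groups, function(i) {
    chunk <- X[i]            # a single data access per chunk
    lapply(chunk, FUN, ...)  # FUN only ever sees individual elements
  })
  # Flatten the list of per-chunk lists back into one list.
  unlist(per_chunk, recursive = FALSE, use.names = FALSE)
}

x <- as.list(1:10)
out <- sketch_chunk_lapply(x, function(v) v^2, nchunks = 3)

# Same result as the unchunked base function.
stopifnot(identical(out, lapply(x, function(v) v^2)))
```

For an in-memory list the chunking makes no difference. The benefit only appears when X[i] is an expensive read, such as from a file-based object.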
For chunk_rowapply(), chunk_colapply(), chunk_lapply(), and chunk_mapply():
In this situation, the entire chunk is passed to FUN, and FUN is responsible for knowing how to handle a sub-vector or sub-matrix of the original object. This may be useful if FUN is already a function that could be applied to the whole object such as rowSums or colSums.
When this is the case, it may be useful to provide a custom simplify function.
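As a rough base R illustration of passing whole chunks to FUN, the sketch below splits a matrix into row blocks and applies FUN to each sub-matrix. The function sketch_chunk_rowapply() is a hypothetical stand-in, not the package code, and the assumption that simplify receives the per-chunk results as separate arguments belongs to this sketch only.

```r
# Hypothetical sketch: FUN receives a whole sub-matrix of rows at once.
sketch_chunk_rowapply <- function(X, FUN, ..., nchunks = 4, simplify = c) {
  n <- nrow(X)
  groups <- split(seq_len(n), ceiling(seq_len(n) * nchunks / n))
  results <- lapply(groups, function(i) FUN(X[i, , drop = FALSE], ...))
  # Combine the per-chunk results with the supplied simplify function.
  do.call(simplify, unname(results))
}

set.seed(1)
m <- matrix(rnorm(200), nrow = 20, ncol = 10)

# rowSums() already works on a sub-matrix, so concatenating the
# per-chunk results recovers the row sums of the full matrix.
sums <- sketch_chunk_rowapply(m, rowSums, nchunks = 5)
stopifnot(isTRUE(all.equal(sums, rowSums(m))))

# Column sums of row blocks are partial sums, so they need a custom
# simplify step that adds the chunks together instead of concatenating.
add_chunks <- function(...) Reduce(`+`, list(...))
csums <- sketch_chunk_rowapply(m, colSums, nchunks = 5, simplify = add_chunks)
stopifnot(isTRUE(all.equal(csums, colSums(m))))
```

The second call shows why a custom simplify function can matter: the default concatenation would return 50 partial sums rather than the 10 column totals.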
For the programmer's convenience, several attributes are made available when operating on a chunk.
"chunkid": The index of the chunk currently being processed by FUN.
"index": The indices of the elements of the chunk, as elements/rows/columns in the original matrix/vector.
"depends" (optional): If depends is given, then this is a list of indices within the chunk. The length of the list is equal to the number of elements/rows/columns in the chunk. Each list element is either NULL or a vector of indices giving the elements/rows/columns of the chunk that should be processed for that index. The indices that should be processed will be non-NULL, and indices that should be ignored will be NULL.
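A minimal base R sketch of how a chunk function might read the "chunkid" and "index" attributes is given below. The chunks here are built by hand with a hypothetical make_chunks() helper to imitate what is described above. It is assumed, not confirmed by this page, that the attributes are read with attr() on the chunk itself.

```r
# Hypothetical helper: build chunks by hand and attach the attributes
# described above, imitating what a chunk function would receive.
make_chunks <- function(x, nchunks) {
  n <- length(x)
  groups <- split(seq_len(n), ceiling(seq_len(n) * nchunks / n))
  Map(function(i, id) {
    chunk <- x[i]
    attr(chunk, "chunkid") <- id  # which chunk this is
    attr(chunk, "index") <- i     # positions in the original vector
    chunk
  }, groups, seq_along(groups))
}

# A whole-chunk function that reports where each value came from.
FUN <- function(chunk) {
  data.frame(chunkid = attr(chunk, "chunkid"),
             index = attr(chunk, "index"),
             value = as.vector(chunk))
}

x <- c(5, 3, 8, 1, 9, 2)
info <- do.call(rbind, unname(lapply(make_chunks(x, 2), FUN)))

# The "index" attribute maps chunk-local values back to the original.
stopifnot(identical(x[info$index], info$value))
```

Knowing the original indices lets FUN label its output or look up per-element metadata even though it only sees part of the data.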
The depends argument can be used to iterate over dependent elements of a vector, or dependent rows/columns of a matrix. This can be useful if the calculation for a particular row/column/element depends on the values of others.
When depends is provided, multiple rows/columns/elements will be passed to FUN. Each element of the depends list should be a vector giving the indices that should be passed to FUN.
For example, this can be used to implement a rolling apply function.
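The rolling-apply idea can be illustrated in base R by building a list of the shape described for depends, with one vector of indices per element. The window radius of 1 and the use of mean are arbitrary choices for this sketch, and the list is evaluated directly here rather than passed to any chunk function.

```r
x <- c(1, 4, 9, 16, 25, 36)
n <- length(x)

# depends[[i]] holds the indices needed to compute the result for
# position i: the element itself plus one neighbor on each side,
# truncated at the ends of the vector.
depends <- lapply(seq_len(n), function(i) seq(max(1, i - 1), min(n, i + 1)))

# Rolling mean: apply the function to the dependent elements of each position.
rolling_mean <- vapply(depends, function(i) mean(x[i]), numeric(1))

stopifnot(length(rolling_mean) == n)
stopifnot(rolling_mean[1] == mean(x[1:2]))  # edge window has two elements
stopifnot(isTRUE(all.equal(rolling_mean[3], mean(x[2:4]))))
```

When the data are chunked, the neighbors of an element at a chunk boundary live in the adjacent chunk, which is why the dependencies have to be declared up front rather than discovered inside FUN.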
Typically, a list if simplify=FALSE. Otherwise, the results may be coerced to a vector or array.
Kylie A. Bemis
apply, lapply, mapply, RNGkind, RNGStreams
register(SerialParam())
set.seed(1)
x <- matrix(rnorm(1000^2), nrow=1000, ncol=1000)
out <- chunkApply(x, 1L, mean, nchunks=10)