| memApply | R Documentation |
parApply function for a shared memory context. memApply mirrors parApply in the shared memory setting: given a shared memory namespace, it takes a target matrix X and some shared variables VARS, passed either as the objects themselves or as the names under which they are already registered.
memApply(X, MARGIN, FUN,
NAMESPACE = NULL, CLUSTER=NULL, VARS=NULL, MAX.CORES=NULL)
X |
A [1:n,1:d] numerical matrix of n rows and d columns which is worked upon. Can also be a string name of an already registered variable in |
MARGIN |
Whether to apply by row (1) or column (2). |
FUN |
Function that is applied on either the rows or columns of |
NAMESPACE |
Optional, string. The namespace identifier for the shared memory session. If this is |
CLUSTER |
Optional. A cluster created with parallel::makeCluster, used for parallelization. Objects exported with clusterExport are copied to the workers as ordinary (non-shared) R objects and are then available to all executions of FUN. If |
VARS |
Optional, Either a named list of variables where the name will be the name under which the variable is registered in shared memory space or a character vector of names of variables already registered which should be provided to FUN. |
MAX.CORES |
Optional, In case CLUSTER is undefined a new cluster with |
memApply runs a worker pool on the exact same memory (for shared memory context, see registerVariables), and allows you to apply a function FUN row- or columnwise (depending on MARGIN) over the target matrix.
Since the memory is shared, only the names of the variables have to be copied to each worker process in CLUSTER (a makeCluster cluster). Arbitrarily large matrices (as long as they fit in RAM once) can therefore be shared across a parallel cluster while only a couple of bytes are copied per worker.
The numerical matrix X and the variables in VARS have to be objects of base type 'double'.
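As a small illustration of the type requirement, the sketch below checks and, if needed, converts a matrix to storage type 'double' before it would be handed to memApply. The helper name as_double_matrix is hypothetical and not part of the package; it only uses base R.

```r
# Hypothetical helper (not part of the package): make sure a matrix is
# stored as 'double' while keeping its dimensions and dimnames.
as_double_matrix <- function(X) {
  stopifnot(is.matrix(X), is.numeric(X))
  if (typeof(X) != "double") {
    storage.mode(X) <- "double"  # changes the storage type, keeps attributes
  }
  X
}

X_int <- matrix(1:6, nrow = 2)   # 1:6 is stored as 'integer'
X_dbl <- as_double_matrix(X_int)
typeof(X_int)
typeof(X_dbl)
```

The same check can be applied to every element of a VARS list before registering it.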
It is recommended not to change the values of v inside FUN. Doing so will only lead to a copy of the row or column being worked upon, so the shared memory will not be corrupted even if you write to it. The copying also only ever happens for one column or row at a time, leading to much lower memory consumption than with parallel even in this case.
Thread safety
The vector v passed to FUN is typically an ALTREP view that
directly references shared memory rather than a private copy.
This means that multiple worker processes may be reading the same memory region
simultaneously.
Read-only operations are fully safe and recommended. Examples include
statistical summaries (mean(v), cor(v, y)), vectorized arithmetic,
and model-fitting that does not modify v.
If you attempt to modify elements of v directly (for example,
v[1] <- 0), you are writing into a shared buffer.
Concurrent modification by multiple workers can lead to race conditions
or data corruption. Even if no other process is writing, in-place assignment
may still trigger an internal copy of that row or column, slightly increasing
memory usage.
For safety and clarity, always copy v locally if you need to modify it:
f <- function(v, y) {
  v <- as.vector(v)  # make a private, normal R copy
  v <- scale(v)
  cor(v, y)
}
This ensures isolation between workers and prevents unintended data sharing.
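A sketch of how such a function might be passed to memApply, assuming (as the signature f(v, y) suggests) that the entries of a named VARS list are handed to FUN as arguments of the same name. The data and the namespace string "ns_demo" are made up for illustration, and the memApply call only runs if the function is available in the session.

```r
set.seed(1)
X <- matrix(rnorm(200), nrow = 20, ncol = 10)  # made-up data, type 'double'
y <- rnorm(20)                                  # made-up shared vector

f <- function(v, y) {
  v <- as.vector(v)          # private copy before modifying
  v <- as.numeric(scale(v))  # scale() returns a one-column matrix
  cor(v, y)
}

# Sequential reference with base R, useful for checking the parallel result
ref <- apply(X, 2, f, y = y)

# Assumption: the package providing memApply is attached
if (exists("memApply")) {
  cl <- parallel::makeCluster(1)
  res <- memApply(X = X, MARGIN = 2, FUN = f,
                  VARS = list(y = y), CLUSTER = cl, NAMESPACE = "ns_demo")
  print(all.equal(unlist(res), ref))
  parallel::stopCluster(cl)
}
```

Comparing against a sequential apply on a small input is a cheap way to confirm that FUN behaves the same on shared views as on ordinary vectors.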
Finally, remember that R's internal C API is not thread-safe.
If your function FUN uses multi-threaded C++ code (e.g., via OpenMP or TBB),
those internal threads must not make calls into R (such as creating
objects, evaluating expressions, or printing).
All R interactions must occur in the main thread of each worker process.
result |
A list of the results of FUN(row, ...) of size n or FUN(col, ...) of size d, depending on |
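Because the return value is a list with one element per row or column, it usually has to be collapsed before further use. The sketch below uses a stand-in list built with lapply (not an actual memApply result) to show two common base R patterns.

```r
# Stand-in for a memApply result: one list element per column (d = 3)
X <- matrix(as.double(1:12), nrow = 4, ncol = 3)
res_scalar <- lapply(seq_len(ncol(X)), function(j) sd(X[, j]))
res_vector <- lapply(seq_len(ncol(X)), function(j) range(X[, j]))

# Scalar result per column: flatten to a numeric vector of length d
sd_vector <- unlist(res_scalar)

# Vector result per column: bind into a d x 2 matrix,
# one row per column of X
range_matrix <- do.call(rbind, res_vector)
```

unlist is the simplest choice when FUN returns a single number; do.call(rbind, ...) keeps the structure when FUN returns a fixed-length vector.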
Julian Maerte
parApply
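library(parallel)
cl = makeCluster(1)
i = 1
# 10 x 10 matrix of type 'double'
A1 = matrix(as.double(1:10^(i+1)), 10^i, 10^i)
# standard deviation of every column
res = memApply(X = A1, MARGIN = 2, FUN = function(x) {
  return(sd(x))
}, CLUSTER=cl, NAMESPACE="ns_apply")
SD_vector = unlist(res)
stopCluster(cl)  # shut down the worker when done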