memApply: Analog of the 'parApply' function for a shared memory context.



Analog of the parApply function for a shared memory context.

Description

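memApply mirrors parApply in the shared memory setting. Given a shared memory namespace, it applies a function to the rows or columns of a target matrix X, optionally providing additional shared variables VARS. These can be passed either as the variables themselves or as the names under which they are already registered.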

Usage

  memApply(X, MARGIN, FUN, NAMESPACE = NULL, CLUSTER = NULL,
           VARS = NULL, MAX.CORES = NULL)

Arguments

X

A numeric matrix with n rows and d columns (indexed [1:n, 1:d]) to be worked on. Alternatively, a character string giving the name of a variable already registered in NAMESPACE; a matrix passed directly is registered automatically.

MARGIN

Whether to apply by row (1) or column (2).

FUN

Function applied to each row or column of X. Its first argument receives the row or column vector; any further arguments must have exactly the same names as the registered variables supplied through VARS.

NAMESPACE

Optional, string. The namespace identifier for the shared memory session. If NULL, it is set at runtime to the name of FUN. For functions defined inline (anonymous functions), an explicit NAMESPACE is therefore recommended.

CLUSTER

Optional. A cluster created with parallel::makeCluster, used for parallelization. Constant objects exported to it with clusterExport are copied (not shared) into each worker and are then available to every execution of FUN. If NULL, a new cluster is initialized.

VARS

Optional. Either a named list of variables, where each name is the name under which the variable is registered in the shared memory space, or a character vector with the names of already registered variables that should be provided to FUN.

MAX.CORES

Optional. If CLUSTER is NULL, a new cluster with MAX.CORES cores is initialized. If MAX.CORES is also NULL, detectCores() - 1 cores are used.
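Before parallelizing, it can help to check FUN and its VARS on a single process. The sketch below is hypothetical: seqApplyRef is not part of memshare, it works on an ordinary in-memory matrix rather than on shared memory, and it only assumes the calling convention described above (the row or column is the first argument, and each VARS entry is passed as a named argument).

```r
# Hypothetical sequential reference (NOT part of memshare): mimics the
# documented calling convention of memApply on a single process.
seqApplyRef <- function(X, MARGIN, FUN, VARS = list()) {
  stopifnot(is.matrix(X), MARGIN %in% c(1, 2), is.list(VARS))
  idx <- if (MARGIN == 1) seq_len(nrow(X)) else seq_len(ncol(X))
  lapply(idx, function(k) {
    v <- if (MARGIN == 1) X[k, ] else X[, k]
    # First argument: the row/column; further arguments matched by name.
    do.call(FUN, c(list(v), VARS))
  })
}

X <- matrix(as.double(1:12), nrow = 3, ncol = 4)
y <- as.double(c(1, 0, 2))

# FUN's second argument is named "y", matching the name used in VARS.
res <- seqApplyRef(X, MARGIN = 2, FUN = function(v, y) sum(v * y),
                   VARS = list(y = y))
unlist(res)  # 7 16 25 34
```

With memApply itself, the same FUN would be supplied together with VARS = list(y = y), so that the name y is registered in the shared memory space.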

Details

memApply runs a worker pool on the exact same memory (for the shared memory context, see registerVariables) and lets you apply a function FUN row- or column-wise (depending on MARGIN) over the target matrix. Because the memory is shared, only the variable names have to be copied to each worker in CLUSTER (a cluster of worker processes created with makeCluster). Arbitrarily large matrices, as long as they fit in RAM once, can therefore be shared across a parallel cluster while only a few bytes are copied per worker.

The numeric matrix X and the variables in VARS must be objects of base type 'double'.
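Integer matrices such as matrix(1:12, nrow = 3) are common in R, so a conversion step may be needed before calling memApply. A minimal base-R sketch of that conversion (nothing here is memshare-specific):

```r
# Data built from 1:12 has base type "integer", not "double".
X <- matrix(1:12, nrow = 3)
typeof(X)                     # "integer"

# Coerce in place; dim and dimnames are preserved.
storage.mode(X) <- "double"
typeof(X)                     # "double"

# Plain vectors intended for VARS can be converted with as.double().
y <- as.double(c(1L, 0L, 2L))
typeof(y)                     # "double"
```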

It is recommended not to modify the vector v (the row or column passed to FUN) inside FUN. Doing so only causes that row or column to be copied while it is being worked on, so the shared memory is not corrupted even if you write to it. Because the copying only ever happens for one row or column at a time, memory consumption stays much lower than with the parallel package even in this case.

Thread safety

The vector v passed to FUN is typically an ALTREP view that directly references shared memory rather than a private copy. This means that multiple worker processes may be reading the same memory region simultaneously.

Read-only operations are fully safe and recommended. Examples include statistical summaries (mean(v), cor(v, y)), vectorized arithmetic, and model-fitting that does not modify v.

If you attempt to modify elements of v directly (for example, v[1] <- 0), you are writing into a shared buffer. Concurrent modification by multiple workers can lead to race conditions or data corruption. Even if no other process is writing, in-place assignment may still trigger an internal copy of that row or column, slightly increasing memory usage.

For safety and clarity, always copy v locally if you need to modify it:

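f <- function(v, y) {
  v <- as.vector(v)           # work on a local, plain R vector
  v <- as.vector(scale(v))    # scale() returns an n x 1 matrix; drop it back to a vector
  cor(v, y)                   # a single correlation value
}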

This ensures isolation between workers and prevents unintended data sharing.
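Before handing such a FUN to a cluster, it can be dry-run on one column of an ordinary matrix. This base-R sketch is not memshare-specific: it relies on R's normal copy-on-modify semantics, not on the shared-memory ALTREP views described above. It checks that FUN returns a plain scalar (not a 1 x 1 matrix) and leaves the caller's matrix untouched.

```r
# Same pattern as above: modify only a local vector inside FUN.
f <- function(v, y) {
  v <- as.vector(v)          # local, plain R vector
  v <- as.vector(scale(v))   # scale() returns an n x 1 matrix; drop to a vector
  cor(v, y)                  # scalar correlation with y
}

X <- matrix(as.double(1:20), nrow = 5, ncol = 4)
y <- c(2, 4, 1, 5, 3)
X_before <- X

r <- f(X[, 1], y)
r                            # 0.3
identical(X, X_before)       # TRUE: the source matrix was not changed
```

Without the as.vector() around scale(), cor() would return a 1 x 1 matrix instead of a number.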

Finally, remember that R's internal C API is not thread-safe. If your function FUN uses multi-threaded C++ code (e.g., via OpenMP or TBB), those internal threads must not make calls into R (such as creating objects, evaluating expressions, or printing). All R interactions must occur in the main thread of each worker process.

Value

result

A list containing FUN(row, ...) for every row of X (length n) or FUN(col, ...) for every column of X (length d), depending on MARGIN.
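Since the result is a list, it usually needs to be combined afterwards. The sketch below uses a made-up list standing in for a column-wise memApply result in which FUN returned a named vector c(mean = ..., sd = ...) for each of d = 3 columns; only base R is used.

```r
# Illustrative stand-in for a memApply result (values are made up).
res <- list(c(mean = 1, sd = 0.5),
            c(mean = 2, sd = 1),
            c(mean = 3, sd = 1.5))

# Scalar per row/column: unlist() gives a numeric vector of length d.
means <- unlist(lapply(res, `[[`, "mean"))
means             # 1 2 3

# Equal-length vectors: stack them into a d x 2 matrix.
stats <- do.call(rbind, res)
dim(stats)        # 3 2
colnames(stats)   # "mean" "sd"
```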

Author(s)

Julian Maerte

See Also

parApply

Examples

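  library(parallel)
  library(memshare)
  cl = makeCluster(1)
  i = 1
  A1 = matrix(as.double(1:10^(i+1)), 10^i, 10^i)  # 10 x 10 double matrix

  # Column-wise standard deviations (MARGIN = 2)
  res = memApply(X = A1, MARGIN = 2, FUN = function(x) {
    return(sd(x))
  }, CLUSTER = cl, NAMESPACE = "ns_apply")

  SD_vector = unlist(res)
  stopCluster(cl)  # shut down the worker processes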

memshare documentation built on Dec. 5, 2025, 9:07 a.m.