mclapply | R Documentation |
This wrapper for parallel::mclapply adds the following features:
reliably detect if a child process failed with a fatal error or if it was killed.
get tracebacks after non-fatal errors in child processes.
retry on fatal and non-fatal errors.
fail early after non-fatal errors in child processes.
get crash dumps from failed child processes.
capture output from child processes.
track warnings, messages and other conditions signaled in the child processes.
return results from child processes using POSIX shared memory to improve performance.
compress character vectors in results to improve performance.
reproducibly seed all function calls.
display a progress bar.
mclapply(
X,
FUN,
...,
mc.preschedule = TRUE,
mc.set.seed = NA,
mc.silent = FALSE,
mc.cores = getOption("mc.cores", 2L),
mc.cleanup = TRUE,
mc.allow.recursive = TRUE,
affinity.list = NULL,
mc.use.names = TRUE,
mc.allow.fatal = FALSE,
mc.allow.error = FALSE,
mc.retry = 0L,
mc.retry.silent = FALSE,
mc.retry.fixed.seed = FALSE,
mc.fail.early = isFALSE(mc.allow.error) && mc.retry == 0L,
mc.dump.frames = c("partial", "full", "full_global", "no"),
mc.dumpto = ifelse(interactive(), "last.dump", "file://last.dump.rds"),
mc.stdout = c("capture", "output", "ignore"),
mc.warnings = c("m_signal", "signal", "m_output", "output", "m_ignore", "ignore",
"stop"),
mc.messages = c("m_signal", "signal", "m_output", "output", "m_ignore", "ignore"),
mc.conditions = c("signal", "ignore"),
mc.system.time = FALSE,
mc.compress.chars = TRUE,
mc.compress.altreps = c("if_allocated", "yes", "no"),
mc.share.vectors = getOption("bettermc.use_shm", TRUE),
mc.share.altreps = c("no", "yes", "if_allocated"),
mc.share.copy = TRUE,
mc.shm.ipc = getOption("bettermc.use_shm", TRUE),
mc.force.fork = FALSE,
mc.progress = interactive()
)
crash_dumps # environment with crash dumps created by mclapply (cf. mc.dumpto)
X |
a vector (atomic or list) or an expressions vector. Other
objects (including classed objects) will be coerced by
|
FUN |
the function to be applied to ( |
... |
For |
mc.preschedule |
if set to |
mc.set.seed |
In both ( |
mc.silent |
if set to |
mc.cores |
The number of cores to use, i.e. at most how many child processes will be run simultaneously. The option is initialized from environment variable MC_CORES if set. Must be at least one, and parallelization requires at least two cores. |
mc.cleanup |
if set to |
mc.allow.recursive |
Unless true, calling |
affinity.list |
a vector (atomic or list) containing the CPU
affinity mask for each element of |
mc.use.names |
if |
mc.allow.fatal |
should fatal errors in child processes make
|
mc.allow.error |
should non-fatal errors in |
mc.retry |
The environment variable "BMC_RETRY" indicates the current retry. A value of "0" means first try, a value of "1" first retry, etc. |
mc.retry.silent |
should the messages indicating both fatal and
non-fatal failures during all but the last retry be suppressed
( |
mc.retry.fixed.seed |
should |
mc.fail.early |
should we try to fail fast after encountering the first
(non-fatal) error in |
mc.dump.frames |
should we |
mc.dumpto |
where to save the result including the dumped frames if
|
mc.stdout |
how should standard output from |
mc.warnings, mc.messages, mc.conditions |
how should warnings, messages
and other conditions signaled by |
mc.system.time |
should |
mc.compress.chars |
should character vectors be compressed using
|
mc.compress.altreps |
should a character vector be compressed if it is an ALTREP? The default "if_allocated" only does so if the regular representation was already created. This was chosen as the default because in this case it is the regular representation which would be serialized. |
mc.share.vectors |
should non-character |
mc.share.altreps |
should a non-character vector be returned from the child process using POSIX shared memory if it is an ALTREP? |
mc.share.copy |
should the parent process use a vector placed in shared
memory due to |
mc.shm.ipc |
should the results be returned from the child processes
using POSIX shared memory (cf. |
mc.force.fork |
should it be ensured that |
mc.progress |
should a progress bar be printed to stderr of the parent
process (package |
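The BMC_RETRY environment variable described under mc.retry can be read inside FUN, for example to switch to a more conservative code path on later attempts. The following is a minimal sketch using only base R; the helper name current_retry is hypothetical and not part of bettermc.

```r
# Hypothetical helper (not part of bettermc): report the current retry
# as an integer, treating an unset variable as the first try ("0").
current_retry <- function() {
  as.integer(Sys.getenv("BMC_RETRY", unset = "0"))
}

# Simulate what a child process would see during the first retry.
Sys.setenv(BMC_RETRY = "1")
current_retry()
Sys.unsetenv("BMC_RETRY")
```

Inside FUN, a check such as `if (current_retry() > 0L) ...` could then select the fallback behaviour.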
crash_dumps is an initially empty environment used to store the return values of mclapply (see below) including crash dumps in case of non-fatal errors and if mc.dump.frames != "no" & mc.allow.error == FALSE.
mclapply returns a list of the same length as X and named by X. In case of fatal/non-fatal errors and depending on mc.allow.fatal/mc.allow.error/mc.fail.early, some of the elements might inherit from "fatal-error"/"etry-error"/"fail-early-error" and "try-error" or be NULL.
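The return value described above can be inspected as in the following sketch. It assumes that bettermc is installed and that forking is available; otherwise the block does nothing. The function passed as FUN and the message "boom" are made up for illustration.

```r
# Sketch only: runs when bettermc is installed on a Unix-like system.
if (requireNamespace("bettermc", quietly = TRUE) &&
    .Platform$OS.type == "unix") {
  res <- bettermc::mclapply(
    1:4,
    function(i) if (i == 2L) stop("boom") else i^2,
    mc.cores = 2L,
    mc.allow.error = TRUE,  # return the error as a list element
    mc.progress = FALSE
  )
  length(res)                                     # same length as X
  vapply(res, inherits, logical(1), "try-error")  # which elements failed
}
```

With mc.allow.error = TRUE the call is expected to complete and the failing element can be detected via inherits(); with the default mc.allow.error = FALSE the call itself fails instead.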
The shared memory objects created by mclapply are named as follows (this may be subject to change): /bmc_ppid_timestamp_idx_cntr (e.g. /bmc_21479_1601366973201_16_10), with
ppid: the process id of the parent process.
timestamp: the time at which mclapply was invoked (in milliseconds since epoch; on macOS: seconds since epoch, due to its 31-character limit w.r.t. POSIX names).
idx: the index of the current element of X (1-based).
cntr: an internal counter (1-based) referring to all the objects created due to mc.share.vectors for the current value of X; a value of 0 is used for the object created due to mc.shm.ipc.
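To make the naming scheme concrete, the example name can be split into its four components with base R. This is only a sketch: parse_shm_name is a hypothetical helper, not part of bettermc, and since the naming scheme may change it should not be relied upon.

```r
# Hypothetical helper (not part of bettermc): split a shared memory object
# name of the form /bmc_ppid_timestamp_idx_cntr into its components.
parse_shm_name <- function(name) {
  parts <- strsplit(sub("^/", "", name), "_", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 5L, parts[1] == "bmc")
  list(
    ppid      = as.integer(parts[2]),
    timestamp = as.numeric(parts[3]),  # too large for a 32-bit integer
    idx       = as.integer(parts[4]),
    cntr      = as.integer(parts[5])
  )
}

info <- parse_shm_name("/bmc_21479_1601366973201_16_10")

# Interpret the timestamp as milliseconds since epoch (i.e. not macOS).
as.POSIXct(info$timestamp / 1000, origin = "1970-01-01", tz = "UTC")
```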
bettermc::mclapply does not err if copying data to shared memory fails. It will rather only print a message and return results the usual way.
POSIX shared memory has (at least) kernel persistence, i.e. it is not automatically freed due to process termination, except if the object is/was unlinked. bettermc tries hard to not leave any byte behind, but it could happen that unlinking is incomplete if the parent process is terminated while bettermc::mclapply is running.
On Linux you can generally inspect the (not-unlinked) objects currently stored in shared memory by listing the files under /dev/shm.
On Linux, POSIX shared memory is implemented using a tmpfs typically mounted under /dev/shm. If not changed by the distribution, its default size is 50% of physical RAM. It can be changed (temporarily) by remounting it with a different value for the size option, e.g. mount -o "remount,size=90%" /dev/shm.
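Objects that have not been unlinked can also be listed from within R. The sketch below assumes Linux with the tmpfs mounted at /dev/shm and relies on the bmc_ name prefix described above, which may be subject to change; on other systems the result is simply empty.

```r
# List files under /dev/shm whose names start with "bmc_".
# Returns character(0) if the directory does not exist or nothing matches.
leftovers <- list.files("/dev/shm", pattern = "^bmc_", full.names = TRUE)
leftovers

# Sizes in bytes of any objects found (an empty vector if there are none).
file.size(leftovers)
```

An empty result while no mclapply call is running suggests that nothing was left behind.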
When allocating a shared memory object of at least getOption("bettermc.hugepage_limit", 104857600) bytes (default is 100 MiB), we use madvise(..., MADV_HUGEPAGE) to request the allocation of (transparent) huge pages. For this to have any effect, the tmpfs used to implement POSIX shared memory on Linux (typically mounted under /dev/shm) must be (re)mounted with option huge=advise, i.e. mount -o remount,huge=advise /dev/shm. (The default is huge=never, but this might be distribution-specific.)
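The threshold is an ordinary R option, so it can be adjusted per session. The following sketch shows how the default of 100 MiB corresponds to 104857600 bytes and how the option could be raised temporarily; the value of 1 GiB is an arbitrary choice for illustration.

```r
# The default threshold from the text, expressed via MiB.
default_limit <- 100 * 1024^2   # 104857600 bytes

# Sketch: raise the threshold to 1 GiB for the current session ...
old <- options(bettermc.hugepage_limit = 1024^3)
getOption("bettermc.hugepage_limit", default_limit)

# ... and restore the previous setting afterwards.
options(old)
```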
On Windows, otherwise valid values for various arguments are silently replaced as follows:
mc.cores <- 1L
mc.share.vectors <- Inf
mc.shm.ipc <- FALSE
mc.force.fork <- FALSE
mc.progress <- FALSE
if (mc.stdout == "output") mc.stdout <- "ignore"
if (mc.warnings == "output") mc.warnings <- "ignore"
if (mc.messages == "output") mc.messages <- "ignore"
Note: parallel::mclapply demands mc.cores to be exactly 1 on Windows; bettermc::mclapply sets it to 1 on Windows.
Furthermore, parallel::mclapply ignores the following arguments on Windows: mc.preschedule, mc.silent, mc.cleanup, mc.allow.recursive, affinity.list. For mc.set.seed, only the values TRUE and FALSE are ignored (by parallel::mclapply); the other values are handled by bettermc::mclapply as documented above.
copy2shm, char_map, parallel::mclapply