multicore is an R package that provides functions for parallel execution of R code on machines with multiple cores or CPUs. Unlike other parallel processing methods, all jobs share the full state of R when spawned, so no data or code needs to be initialized. Spawning is also very fast, since no new R instance needs to be started.
mclapply - parallelized version of lapply
pvec - parallelization of vectorized functions
parallel and collect - functions to evaluate R expressions in parallel and collect the results
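As a rough illustration of the three entry points listed above, the following minimal sketch assumes multicore is installed on a Unix-like system. The inputs and the names squares, roots, job_a and job_b are made up for the example.

```r
library(multicore)  # assumption: package installed, Unix-like OS

# mclapply: same interface as lapply, elements are processed in child processes
squares <- mclapply(1:4, function(i) i^2)

# pvec: split a vector into chunks and apply a vectorized function to each chunk
roots <- pvec(1:10, sqrt)

# parallel + collect: start two jobs, then wait for both results
job_a <- parallel(sum(1:100))
job_b <- parallel(prod(1:5))
results <- collect(list(job_a, job_b))  # a list with one entry per job
```

mclapply and pvec block until all results are available, whereas parallel returns immediately and the results are fetched later with collect.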
The low-level functions should be used only by experienced users who understand the interaction between the master (parent) process and the child processes (jobs), as well as the system-level mechanics involved.
See the fork help page for the principles of forking parallel processes and system-level functions, and the children and sendMaster help pages for management of and communication between the parent and child processes.
multicore defines a few informal (S3) classes:
process is a list with a named entry pid containing the process ID.
childProcess is a subclass of process representing a child process of the current R process. A child process is a special process that can send messages to the parent process. The list may contain additional entries for IPC (more precisely, file descriptors); however, those are considered internal.
masterProcess is a subclass of process representing a handle that is passed to a child process by fork.
parallelJob is a subclass of childProcess representing a child process created using the parallel function. It may optionally contain a name entry: a character vector of length one giving the name of the job.
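The class hierarchy described above can be inspected on a job object. This is a small sketch under the assumption that multicore is installed on a Unix-like system; the job name "pid-job" is arbitrary and chosen only for the example.

```r
library(multicore)  # assumption: package installed, Unix-like OS

# Start a named job; the child simply reports its own process ID
job <- parallel(Sys.getpid(), name = "pid-job")

class(job)   # S3 class vector of the job object
job$pid      # process ID of the child, as seen by the parent
job$name     # the optional name entry

res <- collect(list(job))  # wait for the child and fetch its result
```

Because the child returns its own process ID, the collected value should match the pid entry stored in the job object.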
By default, functions that spawn jobs across cores use the "cores" option (see options) to determine how many cores (or CPUs) will be used, unless the number is specified directly. If this option is not set, multicore uses as many cores as are available. (Note: cores in this document refer to virtual cores. Modern CPUs can have more virtual cores than physical cores to accommodate simultaneous multithreading. For example, a machine with two quad-core Xeon W5590 processors has eight physical cores in total but 16 virtual cores. Also note that it is often beneficial to schedule more tasks than cores.)
The number of available cores is determined on startup using the (non-exported) detectCores() function. It should work on most commonly used unix systems (Mac OS X, Linux, Solaris and IRIX), but there is no standard way of determining the number of cores, so please contact me (with sessionInfo() output and the test) if you have tests for other platforms. If in doubt, use multicore:::detectCores(all.tests=TRUE) to see whether your platform is covered by one of the existing tests. If multicore cannot determine the number of cores (the above returns NA), it will default to 8 (which should be fine for most modern desktop systems).
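The following sketch shows one way to set the number of cores, either session-wide through the "cores" option or per call. It assumes multicore is installed on a Unix-like system; the value 2 and the variable names are arbitrary, and mc.cores is the per-call argument of mclapply.

```r
library(multicore)  # assumption: package installed, Unix-like OS

# Session-wide default picked up by functions that spawn jobs
options(cores = 2)
res_default <- mclapply(1:4, function(i) i + 1)

# Per-call setting, which takes precedence over the option
res_explicit <- mclapply(1:4, function(i) i + 1, mc.cores = 1)

# Check what the (non-exported) detection function reports on this platform
n <- multicore:::detectCores()
if (is.na(n)) message("number of cores could not be determined")
```

Setting the option once at the top of a script keeps individual calls free of core counts, while mc.cores is handy for a single call that should behave differently.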
multicore uses the fork system call to spawn a copy of the current process, which performs the computations in parallel. Modern operating systems use a copy-on-write approach, which makes this appealing for parallel computation: only objects modified during the computation are actually copied, and all other memory is directly shared.
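A small experiment can make the fork semantics visible: children see the parent's objects without any export step, and changes made in a child stay in that child. This sketch assumes multicore is installed on a Unix-like system; the variable x and the explicit mc.cores = 2 are choices made for the example.

```r
library(multicore)  # assumption: package installed, Unix-like OS

x <- 1  # defined in the parent before any child is forked

# Each child reads the inherited x and modifies only its own copy
res <- mclapply(1:2, function(i) {
  x <<- x + i   # assignment happens in the child's copy of the workspace
  x
}, mc.cores = 2)

x  # unchanged in the parent: the children's assignments were not propagated
```

Only values returned by the function travel back to the parent, so any state a child needs to report must be part of its return value.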
However, the copy shares everything, including any user interface elements. This can cause havoc since, for example, one window now suddenly belongs to two processes. Therefore multicore should preferably be used in console R, and code executed in parallel must never use GUIs or on-screen devices.
An (experimental) way to avoid some such problems in some GUI environments (those using pipes or sockets) is to use multicore:::closeAll() in each child process immediately after it is spawned.
Simon Urbanek
parallel, mclapply, fork, sendMaster, children and signals