Parses a large list into subsets and submits a separate batch R job that calls lapply on the subset. plapply has some features that may not be readily available in other parallelization functions like mclapply and parLapply:
- The .Rout files produced by each R instance are easily accessible for convenient debugging of errors or warnings. The .Rout files can also serve as an explicit record of the work that was performed by the workers.
- Three options are available for the ordering of the processing of the list elements: the original list order, randomized, or collated (first-in-first-out).
- In each R instance, pre-processing or post-processing steps can be performed before and after the call to lapply. These pre-processing and post-processing steps can depend on the instance of R, such that each instance can be treated differently, if desired.
These features give greater control over the computing process, which can be especially useful for large jobs.
plapply(X, FUN, ..., njobs = parallel::detectCores() - 1, packages = NULL,
        header.file = NULL, needed.objects = NULL,
        needed.objects.env = parent.frame(), workDir = "plapply",
        clobber = TRUE, max.hours = 24, check.interval.sec = 1,
        collate = FALSE, random.seed = NULL, rout = NULL, clean.up = TRUE,
        verbose = FALSE)
X | A list or vector, each element of which will be the input to FUN
FUN | A function whose first argument is an element of X
... | Additional named arguments to FUN
njobs | The number of jobs (subsets). Defaults to one less than the number of cores on the machine.
packages | Character vector giving the names of packages that will be loaded in each new instance of R, using library
header.file | Text string indicating a file that will be initially sourced prior to calling lapply
needed.objects | Character vector giving the names of objects which reside in the environment specified by needed.objects.env
needed.objects.env | Environment where
workDir | Character string giving the name of the working directory that will be used for the files needed to launch the separate instances of R.
clobber | Logical indicating whether the directory designated by workDir
max.hours | The maximum number of hours to wait for the
check.interval.sec | The number of seconds to wait between checking to see whether all
collate |
random.seed | An integer setting the random seed, which will result in randomizing the elements of the list assigned to each job. This is useful when the computing time for each element varies significantly because it helps to even out the run times of the parallel jobs. If
rout | A character string giving the name of the file to where all of the
clean.up |
verbose |
plapply applies FUN to each element of the list X by parsing the list into njobs lists of equal (or almost equal) size and then applying FUN to each sublist using lapply. A separate batch instance of R is launched for each sublist, thus utilizing another core of the machine. After the jobs complete, the njobs output lists are reassembled. The global environments for each batch instance of R are created by writing/reading data to/from disk.
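The split-and-reassemble bookkeeping described above can be sketched in base R. This is only an illustration under stated assumptions: split_indices is a hypothetical helper, not a function from the package, and the sublists are processed sequentially here rather than in separate batch instances of R.

```r
# Hypothetical helper (not part of the package): split the indices 1..n into
# njobs contiguous groups whose sizes differ by at most one
split_indices <- function(n, njobs) {
  sizes <- rep(n %/% njobs, njobs) + (seq_len(njobs) <= n %% njobs)
  split(seq_len(n), rep(seq_len(njobs), times = sizes))
}

X <- as.list(1:10)
FUN <- function(x) x^2

groups <- split_indices(length(X), njobs = 3)

# Each sublist would be handled by its own R instance; here lapply runs
# over the groups sequentially to show only the bookkeeping
pieces <- lapply(groups, function(idx) lapply(X[idx], FUN))

# Reassemble the njobs output lists into a single list
out <- unlist(pieces, recursive = FALSE, use.names = FALSE)

# The reassembled list matches a plain lapply call
identical(out, lapply(X, FUN))
```

Contiguous groups keep the reassembly step trivial, since concatenating the pieces in job order already reproduces the original ordering.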
If collate = TRUE or random.seed is set to an integer value, the output list returned by plapply is reordered to reflect the original ordering of the input list, X.
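As a rough illustration of how a randomized or collated processing order can be undone afterwards, the base R sketch below processes a list in a permuted order and then restores the input order. It assumes that "collated" means dealing elements to jobs round-robin, which this page does not spell out, and restore is a hypothetical helper rather than part of the package.

```r
X <- list(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7)
FUN <- function(x) x * 10
njobs <- 3
n <- length(X)

# Randomized assignment: shuffle the indices reproducibly
set.seed(42)
perm <- sample(n)

# Collated assignment (ASSUMED here to mean round-robin dealing):
# job 1 gets elements 1, 4, 7; job 2 gets 2, 5; job 3 gets 3, 6
collated <- split(seq_len(n), rep_len(seq_len(njobs), n))
coll_order <- unlist(collated, use.names = FALSE)

# Hypothetical helper: process in the given order, then restore input order
restore <- function(processing_order) {
  res <- lapply(X[processing_order], FUN)
  # order() applied to a permutation yields its inverse
  res[order(processing_order)]
}

identical(restore(perm), lapply(X, FUN))
identical(restore(coll_order), lapply(X, FUN))
```

Indexing the results by order() of the processing order is enough to recover the original positions, whatever permutation was used.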
An object called process.id (consisting of an integer indicating the process number) is available in the global environment of each instance of R.
Each instance of R runs a script that performs the following steps:
1. Any other packages indicated in the packages argument are loaded via calls to library().
2. The process.id global variable is assigned to the global environment of the R instance (having been passed in via a command line argument).
3. The header file (if there is one) is sourced.
4. The expression pre.process.expression is evaluated if an object of that name is present in the global environment. The object pre.process.expression may be passed in via the header file or via needed.objects.
5. lapply is called on the sublist, which is named X.i.
6. The expression post.process.expression is evaluated if an object of that name is present in the global environment. The object post.process.expression may be passed in via the header file or via needed.objects.
7. The output returned by lapply is assigned to the object X.i.out, and is saved to a temporary file where it will be collected after all jobs have completed.
8. Warnings are printed.
If njobs = 1, none of the previous steps are executed; only this call is made: lapply(X, FUN, ...)
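The per-instance steps listed above can be imitated in a single R session to see how process.id and the optional pre- and post-processing expressions interact. The sketch below is a simplification under stated assumptions: run_worker is a hypothetical stand-in that uses a fresh environment instead of a separate batch R process, and it omits package loading, the header file, saving to a temporary file, and the printing of warnings.

```r
# Hypothetical stand-in for one worker's script (not the package's own code)
run_worker <- function(process.id, X.i, FUN, setup = list()) {
  env <- new.env()

  # Objects that would arrive via the header file or needed.objects
  for (nm in names(setup)) assign(nm, setup[[nm]], envir = env)
  assign("process.id", process.id, envir = env)
  assign("X.i", X.i, envir = env)

  # Evaluate pre.process.expression only if an object of that name exists
  if (exists("pre.process.expression", envir = env, inherits = FALSE)) {
    eval(get("pre.process.expression", envir = env), envir = env)
  }

  out <- lapply(X.i, FUN)

  # Same rule for post.process.expression
  if (exists("post.process.expression", envir = env, inherits = FALSE)) {
    eval(get("post.process.expression", envir = env), envir = env)
  }

  assign("X.i.out", out, envir = env)
  env
}

# Expressions may refer to process.id, so each instance can behave differently
setup <- list(
  pre.process.expression = expression(started <- paste("worker", process.id)),
  post.process.expression = expression(
    finished <- paste("worker", process.id, "done")
  )
)

w <- run_worker(2, list(1, 4, 9), sqrt, setup)
w$started
w$finished
w$X.i.out
```

Checking for the expressions with exists() keeps them optional, mirroring the "if an object of that name is present" wording in the steps.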
A list equivalent to that returned by lapply(X, FUN, ...).
Landon Sego
parLapplyW, dfplapply, parLapply, mclapply
# Create a simple list
a <- list(a = rnorm(10), b = rnorm(20), c = rnorm(15), d = rnorm(13),
          e = rnorm(15), f = rnorm(22))

# Some objects that will be needed by f1:
b1 <- rexp(20)
b2 <- rpois(10, 20)

# The function
f1 <- function(x) mean(x) + max(b1) - min(b2)

# Call plapply
res1 <- plapply(a, f1, njobs = 2, needed.objects = c("b1", "b2"),
                check.interval.sec = 0.5, max.hours = 1/120,
                workDir = "example1", rout = "example1.Rout",
                clean.up = FALSE)

print(res1)

# Look at the collated 'Rout' file
more("example1.Rout")

# Look at the contents of the working directory
dir("example1")

# Remove working directory and Rout file
unlink("example1", recursive = TRUE, force = TRUE)
unlink("example1.Rout")

# Verify the result with lapply
res2 <- lapply(a, f1)

# Compare results
identical(res1, res2)