Parallel implementation of plyr::ddply that suppresses a spurious warning when plyr::ddply is called in parallel.
All of the arguments except njobs are passed directly to arguments of the same name in plyr::ddply.
.data | data frame to be processed
.variables | character vector of variables in
.fun | function to apply to each piece
... | other arguments passed on to '.fun'
njobs | the number of parallel jobs to launch, defaulting to one less than the number of available cores on the machine
.progress | name of the progress bar to use, see
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging
.drop | should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default)
.paropts | a list of additional options passed into the
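The default for njobs described above (one less than the number of available cores) can be sketched as follows. This is only an illustration, not the package's source: the helper name default_njobs is hypothetical, and the guard for an unknown core count and the floor of 1 are assumptions.

```r
# Hypothetical helper (not part of Smisc): compute a default job count of
# one less than the number of detected cores, never going below 1.
default_njobs <- function() {
  cores <- parallel::detectCores()
  # detectCores() can return NA when the core count cannot be determined
  if (is.na(cores)) {
    return(1L)
  }
  max(1L, as.integer(cores) - 1L)
}

njobs <- default_njobs()
```

Passing njobs explicitly, as the examples on this page do, avoids depending on the detected core count.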
An innocuous warning is thrown when plyr::ddply is called in parallel: https://github.com/hadley/plyr/issues/203. This function catches and hides that warning, which looks like this:
Warning messages:
1: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'
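One common way to hide a single known warning while letting all others through is a calling handler that muffles only matching messages. The sketch below illustrates that general technique with base R; it is not taken from the pddply source, and the names muffle_matching and noisy are hypothetical.

```r
# Hypothetical helper: evaluate `expr`, muffling only those warnings whose
# message contains `pattern`; every other warning propagates as usual.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for a call that emits the spurious warning and then returns a value
noisy <- function() {
  warning("... may be used in an incorrect context: '.fun(piece, ...)'")
  "result"
}

# The matching warning is hidden and the value is still returned
out <- muffle_matching(noisy(), "may be used in an incorrect context")
```

Because the handler is a calling handler rather than an exiting one, evaluation resumes after the muffled warning, so the wrapped call still returns its result.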
If njobs = 1, a call to plyr::ddply is made without parallelization, and anything supplied to .paropts is ignored. See the documentation for plyr::ddply for additional details.
The data frame returned by plyr::ddply.
data(baseball, package = "plyr")
# Summarize the number of entries for each year in the baseball dataset with 2 jobs
o1 <- pddply(baseball, ~ year, nrow, njobs = 2)
head(o1)
# Verify it's the same as the non-parallel version of plyr::ddply()
o2 <- plyr::ddply(baseball, ~ year, nrow)
identical(o1, o2)
# Another possibility
o3 <- pddply(baseball, "lg", c("nrow", "ncol"), njobs = 2)
o3
o4 <- plyr::ddply(baseball, "lg", c("nrow", "ncol"))
identical(o3, o4)
# A nonsense example where we need to pass objects and packages into the cluster
number1 <- 7
f <- function(x, number2 = 10) {
  paste(x$id[1], padZero(number1, num = 2), number2, sep = "-")
}
# In parallel
o5 <- pddply(baseball[1:100,], "year", f, number2 = 13, njobs = 2,
             .paropts = list(.packages = "Smisc", .export = "number1"))
o5
# Non parallel
o6 <- plyr::ddply(baseball[1:100,], "year", f, number2 = 13)
identical(o5, o6)