Ideally, a function that returns a data.frame should be supplied. This gives the user the advantage of specifying the names of the columns in the resulting data.frame. If the function does not return a data.frame, then column names will be automatically generated.
X | List of objects to apply over.
FUN. | Function to apply; allows for compact anonymous functions (see ?purrr::as_function for details).
fill | (defaults to TRUE) Use plyr::rbind.fill to fill in missing columns when row-binding the results.
.id | Controls how the output is identified based on the input object; see Details.
output | Output type. Defaults to 'data.frame', but can also be set to 'list' to suppress row-binding of the list.
pb | Logical; use a progress bar?
parallel | Logical; use parallel processing?
cache | (defaults to FALSE) Cache the results locally in a folder called "cache" using the memoise package.
error.na | (defaults to TRUE) Use purrr::possibly to replace errors with NA instead of interrupting the process.
num.cores | The number of cores used for parallel processing. Can be specified as an integer; otherwise the number of available cores is guessed with detectCores(). If parallel is FALSE, this value is set to 1.
... | Additional arguments to the function.
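The error.na behaviour described above can be illustrated with a minimal base-R sketch. It uses tryCatch as a stand-in for purrr::possibly, and the names safely_na and strict_mean are hypothetical helpers for illustration only, not part of the package.

```r
# Hypothetical wrapper (not the package's code): return NA instead of
# raising an error, similar in spirit to purrr::possibly(f, otherwise = NA).
safely_na <- function(f) {
  function(...) tryCatch(f(...), error = function(e) NA)
}

# A function that fails on non-numeric input
strict_mean <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  mean(x)
}

inputs <- list(a = 1:3, b = "not numeric", c = 4:6)

# The failing element yields NA; the other elements are still processed
res <- lapply(inputs, safely_na(strict_mean))
```

With a wrapper like this, one bad input does not interrupt a long-running apply; the NA marks which element failed.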
Use .id to control how each output is matched to the input that generated it. Set it to NULL to suppress naming. By default, output lists will be named and an output data.frame will have an added column named id. The name of this inserted column can be changed by specifying a character string. Alternatively, a vector of character strings can be used to manually identify the output (in a column called id if the output is a data.frame). Names will be autogenerated even if the input object has incomplete names or no names at all. Note that this also works with functions that return a data.frame with more than one row.
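To make the identification rules concrete, here is a rough base-R sketch of how an id column could be attached when row-binding results. The helper bind_with_id is hypothetical and only mimics the behaviour described above; it is not the package's implementation.

```r
# Hypothetical helper (illustration only): apply FUN over X, then row-bind
# the resulting data.frames with an identifier column named by .id.
bind_with_id <- function(X, FUN, .id = "id") {
  # Autogenerate names when the input has none or only some
  nms <- names(X)
  if (is.null(nms)) nms <- rep("", length(X))
  unnamed <- is.na(nms) | nms == ""
  nms[unnamed] <- as.character(seq_along(X))[unnamed]

  res <- lapply(X, FUN)
  pieces <- lapply(seq_along(res), function(i) {
    r <- res[[i]]
    if (is.null(.id)) return(r)  # .id = NULL suppresses the id column
    # Repeat the identifier once per row so multi-row results stay labelled
    id_col <- data.frame(rep(nms[i], nrow(r)), stringsAsFactors = FALSE)
    cbind(stats::setNames(id_col, .id), r)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

X <- list(A = c(1, 2, 3), B = c(4, 5, 10))
stats_fun <- function(x) data.frame(stat = c(mean(x), median(x)))

# Each input contributes two rows, both labelled with the input's name
res <- bind_with_id(X, stats_fun, .id = "group")
```

Because the identifier is repeated per row, a function returning several rows per input still produces output that can be traced back to its source.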
Parallel processing is carried out by pbapply::mclapply. Use the parallel option to switch parallel processing on or off. Only specify the number of cores when really needed, as the function will detect the maximum number of available cores. This makes it easy to rerun the script with a higher number of available cores without having to change the code.
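The core-detection logic described above might look roughly like the following sketch. It calls parallel::detectCores() and parallel::mclapply() directly; how the package wires these together internally is an assumption here, as is the fallback to a single core on Windows, where forking is unavailable.

```r
library(parallel)

# Assumption: mirror the documented rule that num.cores becomes 1
# when parallel is FALSE, and is otherwise detected automatically.
use_parallel <- TRUE
num.cores <- if (use_parallel) detectCores() else 1L

# detectCores() may return NA, and mclapply cannot fork on Windows,
# so fall back to a single core in both cases.
if (is.na(num.cores) || .Platform$OS.type == "windows") num.cores <- 1L

X <- as.data.frame(matrix(runif(100), ncol = 10))

# Apply over the columns of X, using as many cores as were detected
res <- mclapply(X, mean, mc.cores = num.cores)
```

Since the core count is detected at run time, the same script can be moved to a larger machine without edits.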
A progress bar can be shown in the terminal during an interactive R session, or in an .Rout file when R scripts are submitted for non-interactive completion with R CMD BATCH. Although RStudio supports the progress bar for single-process workers, it has a problem showing the progress bar when parallel processing is used (see the discussion at http://stackoverflow.com/questions/27314011/mcfork-in-rstudio). In this specific case (RStudio + parallel processing), text updates will be printed to the file '.progress'. Use a shell and 'tail -f .progress' to see the updates.
## Not run:
X <- as.data.frame(matrix(runif(100), ncol = 10))

# function returning a single value; results are cached locally
fun. <- function(x) {
  Sys.sleep(0.5)
  mean(x)
}
cb_apply(X, fun., cache = TRUE)

# function returning a data.frame, which sets the output column names
fun. <- function(x) {
  Sys.sleep(0.5)
  data.frame('mean' = mean(x), 'median' = median(x))
}
cb_apply(X, fun.)

# when setting names of input object, function will attempt to assign them to
# the output in a new column
names(X) <- LETTERS[1:10]
cb_apply(X, fun., output = 'list')
cb_apply(X, fun.)

# name the id column something else
cb_apply(X, fun., .id = 'group')

# specify a new identifier manually
cb_apply(X, fun., .id = LETTERS[11:20])

# set .id to NULL to suppress the addition of the id column
cb_apply(X, fun., .id = NULL)

# naming still works even if the function returns a data.frame with two rows
fun. <- function(x) {
  Sys.sleep(0.5)
  data.frame('stat' = c(mean(x), median(x)))
}
cb_apply(X, fun.)
## End(Not run)