troop | R Documentation
Group by, apply, and process a data.table in parallel
troop(data, by, apply_func,
      preprocess_func = function() { }, postprocess_func = function() { },
      num_chunks = detectCores(logical = TRUE),
      preprocess_args = list(), postprocess_args = list(),
      packages = c(), export = c(), combine = "c", files_to_source = c())
data | Input data; must be a data.table.
by | Character vector of column names to group by.
apply_func | Function to run in parallel.
preprocess_func | Function run before apply_func; useful for opening file or database handles.
postprocess_func | Function run after apply_func; useful for closing file or database handles.
num_chunks | Number of chunks to divide the data into. Defaults to the number of logical cores available.
preprocess_args | List of arguments passed to preprocess_func.
postprocess_args | List of arguments passed to postprocess_func.
packages | Character vector of package names to load on each core. NOTE: every package used by apply_func should be included.
export | Character vector of variable names to export to each core. NOTE: every variable accessed inside apply_func should be exported.
combine | How the results from each core are combined. Accepts "c", "+", or "rbind". Defaults to "c", which concatenates the results with c().
files_to_source | Character vector of file names to source on each core. The user must have permission to read each file.
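The documentation does not say how rows are assigned to the num_chunks chunks. A natural assumption is that every group defined by `by` stays whole inside a single chunk, so apply_func always sees complete groups. The sketch below follows that assumption; `chunk_by_groups` is a hypothetical helper, not part of troop, and it uses a base R data.frame instead of a data.table only to stay dependency-free.

```r
# Hypothetical helper (not troop's code): split `data` into at most
# `num_chunks` pieces so that all rows of one `by` group land in the same piece.
chunk_by_groups <- function(data, by, num_chunks) {
  # One group label per row, built from the grouping columns
  keys <- interaction(data[by], drop = TRUE)
  groups <- levels(keys)
  # Deal whole groups out to chunks in round-robin order
  n <- min(num_chunks, length(groups))
  group_chunk <- setNames(rep_len(seq_len(n), length(groups)), groups)
  # Map each row to the chunk of its group, then split the rows
  row_chunk <- group_chunk[as.character(keys)]
  split(data, row_chunk)
}

df <- data.frame(a = c("x", "x", "y", "y", "z"), b = c(1, 1, 2, 2, 3), v = 1:5)
chunks <- chunk_by_groups(df, c("a", "b"), 2)
```

Round-robin assignment balances the number of groups per chunk, not the number of rows, so chunks can be uneven when group sizes differ; a size-aware assignment would be an alternative design.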
Returns the results of apply_func from each core, combined using the method given by combine.
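As an illustration of what the three documented combine options do, the hypothetical helper `combine_results` below folds a list of per-chunk results in each of the accepted ways. It is a sketch under that assumption, not troop's internal code.

```r
# Hypothetical helper (not troop's code): fold a list of per-chunk results
# using one of the documented combine methods.
combine_results <- function(results, combine = "c") {
  switch(combine,
    "c"     = do.call(c, results),      # concatenate into one vector
    "+"     = Reduce(`+`, results),     # add the results together
    "rbind" = do.call(rbind, results),  # stack data frames / matrices by row
    stop("unsupported combine method: ", combine)
  )
}

per_chunk <- list(3L, 5L, 2L)
by_concat <- combine_results(per_chunk, "c")
by_sum <- combine_results(per_chunk, "+")
stacked <- combine_results(list(data.frame(n = 1), data.frame(n = 2)), "rbind")
```

For an apply_func that returns nrow(data_chunk), as foo does in the Examples, "c" would give one row count per chunk and "+" the total row count, assuming apply_func is called once per chunk.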
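library(data.table)

dt <- fread('sample.csv')  # fread() already returns a data.table
v <- 10                    # global variable; must be exported if foo uses it

foo <- function(data_chunk) {
  # some complex operations
  nrow(data_chunk)
}

# Basic usage
troop(dt, by = c('column1', 'column2'), apply_func = foo)

# Source helper files on each core
troop(dt, by = c('column1', 'column2'), apply_func = foo,
      files_to_source = c('somefile.R', 'anotherfile.R'))

# Custom chunk count, packages loaded on each core, exported variables
troop(dt, by = c('column1', 'column2'), apply_func = foo,
      num_chunks = 10, packages = c('RODBC', 'xgboost'),
      export = c('v'), combine = 'c')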
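The packages and export arguments exist because each core runs in a separate R process that does not share the caller's workspace. The sketch below shows that mechanism with the base parallel package: packages are attached on every worker, named variables are copied over with clusterExport(), and the per-chunk results are folded with the combine function. `run_on_cluster` is a hypothetical helper written for this illustration; troop's own implementation may differ.

```r
library(parallel)

# Hypothetical helper (not troop's code): run apply_func on each chunk in a
# PSOCK cluster after loading `packages` and exporting `export` to every worker.
run_on_cluster <- function(chunks, apply_func, packages = character(),
                           export = character(), combine = "c") {
  caller_env <- parent.frame()  # where the exported variable names live
  n_workers <- min(length(chunks), detectCores(), na.rm = TRUE)
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)  # release the workers even on error

  # Attach each requested package on every worker
  clusterCall(cl, function(pkgs) {
    for (p in pkgs) library(p, character.only = TRUE)
    invisible(NULL)
  }, packages)

  # Copy the named variables from the caller into each worker's global env
  if (length(export) > 0) clusterExport(cl, export, envir = caller_env)

  # Apply the function to every chunk, then fold the results together
  results <- parLapply(cl, chunks, apply_func)
  Reduce(match.fun(combine), results)
}

df <- data.frame(g = rep(c("a", "b", "c"), times = c(2, 3, 4)), x = 1:9)
offset <- 100
f <- function(d) sum(d$x) + offset  # uses the global `offset`
res <- run_on_cluster(split(df, df$g), f, export = "offset", combine = "c")
```

Because f refers to the global variable offset, dropping export = "offset" would typically make the workers fail with an "object 'offset' not found" error, which is the situation the NOTE under export describes.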