troop: group by - apply - multiprocess data.table

troop R Documentation

group by - apply - multiprocess data.table

Description

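Groups a data.table by the given columns, runs a function on the grouped data in parallel across multiple processes, and combines the results.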

Usage

troop(data, by, apply_func,
  preprocess_func = function() { },
  postprocess_func = function() { },
  num_chunks = detectCores(logical = TRUE),
  preprocess_args = list(), postprocess_args = list(),
  packages = c(), export = c(), combine = "c",
  files_to_source = c())

Arguments

data

input data of type data.table

by

character vector giving columns to group by

apply_func

function to be run in parallel

preprocess_func

function run before apply_func. useful for opening file or database handles

postprocess_func

function run after apply_func. useful for closing file or database handles

num_chunks

number of chunks to divide the data into. defaults to number of logical cores available

preprocess_args

a list of args to be passed to preprocess_func

postprocess_args

a list of args to be passed to postprocess_func

packages

character vector of package names to be loaded on each core. NOTE: every package used by apply_func must be included

export

character vector of variable names to be exported to each core. NOTE: every variable accessed inside apply_func must be listed here

combine

how the results should be combined. accepts: "c", "+", "rbind". defaults to "c", which concatenates the results

files_to_source

character vector of file names to be sourced on each core. the user must have permission to read each file
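
The group by, apply, parallelise pattern that these arguments describe can be sketched in base R. This is an illustration only and not troop's implementation: the helper name grouped_parallel_apply, the round-robin assignment of groups to chunks, and the use of a plain data.frame with the parallel package are all assumptions made for the example.

```r
library(parallel)

# Hypothetical helper, not part of troop: split rows into groups, spread the
# groups over chunks, and process each chunk on its own worker process.
grouped_parallel_apply <- function(data, by, apply_func, num_chunks = 2) {
  # One data.frame per combination of the grouping columns
  groups <- split(data, data[by], drop = TRUE)

  # Assign groups to chunks round-robin (an assumption for this sketch)
  chunk_id <- (seq_along(groups) - 1) %% num_chunks
  chunks <- split(groups, chunk_id)

  cl <- makeCluster(min(num_chunks, length(chunks)))
  on.exit(stopCluster(cl))

  # Each worker applies group_fun to every group in its chunk
  per_chunk <- parLapply(
    cl, chunks,
    function(chunk, group_fun) lapply(chunk, group_fun),
    group_fun = apply_func
  )

  # Flatten to one named list with an entry per group
  do.call(c, unname(per_chunk))
}

df <- data.frame(g = c("a", "a", "b", "c", "c", "c"), x = 1:6)
res <- grouped_parallel_apply(df, by = "g", apply_func = nrow, num_chunks = 2)
```

Here res is a list keyed by group name. The order of its entries follows the chunk assignment rather than the original row order, so look results up by name.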

Value

the results of apply_func from each core, combined using the combine argument above
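
To make the effect of the three combine options concrete, the sketch below reduces a list of stand-in results pairwise in base R. It assumes combine acts as a pairwise reduction over the per-core results, as the .combine argument of foreach does. The values in per_worker and per_worker_rows are made up for illustration.

```r
# Stand-ins for the values apply_func might return on three workers
per_worker <- list(2L, 3L, 1L)

# "c": concatenate everything into one vector
combined_c <- Reduce(c, per_worker)

# "+": add the results together into a single total
combined_sum <- Reduce(`+`, per_worker)

# "rbind": stack results that are rows or tables
per_worker_rows <- list(
  data.frame(n = 2L),
  data.frame(n = 3L),
  data.frame(n = 1L)
)
combined_rows <- Reduce(rbind, per_worker_rows)
```

Under this assumption, "c" and "+" suit scalar or vector results, while "rbind" suits an apply_func that returns a data.frame or data.table per group.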

See Also

http://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf
http://r.adu.org.za/web/packages/foreach/vignettes/foreach.pdf
https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf
http://michaeljkoontz.weebly.com/uploads/1/9/9/4/19940979/parallel.pdf
https://cran.r-project.org/web/packages/iterators/vignettes/writing.pdf

Examples

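library(data.table)
library(troop)

# fread() already returns a data.table, so no extra data.table() call is needed
dt <- fread('sample.csv')
# v is made available on each core via export = c('v') in the last call below
v <- 10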
foo <- function(data_chunk){
  # some complex operations
  nrow(data_chunk)
}
troop(dt, by = c('column1','column2'), apply_func = foo)
troop(dt, by = c('column1','column2'), apply_func = foo, files_to_source = c('somefile.R','anotherfile.R'))
troop(dt, by = c('column1','column2'), apply_func = foo, num_chunks = 10, packages = c('RODBC','xgboost'), export = c('v'), combine = 'c')

tejaslodaya/troop documentation built on March 6, 2023, 11:44 p.m.