aggregate-data.table-method: Functional S4 API for aggregation over a 'data.table' object.

aggregate,data.table-method    R Documentation

Functional S4 API for aggregation over a data.table object.

Description

Compute a group-by operation over a data.table in a functional, pipe-compatible format.

Usage

## S4 method for signature 'data.table'
aggregate(
  x,
  by,
  ...,
  subset = TRUE,
  nthread = 1,
  progress = TRUE,
  BPPARAM = NULL,
  enlist = TRUE,
  moreArgs = list()
)

Arguments

x

data.table to compute aggregation over.

by

character One or more valid column names in x to group by.

...

call One or more aggregations to compute for each group in x, as defined by by. If you name an aggregation call, that name becomes the column name of its value in the resulting data.table; otherwise a default name is parsed from the function name and its first argument, which is assumed to be the name of the column being aggregated over.

subset

call An R call to evaluate before performing an aggregation. This allows you to aggregate over a subset of rows in an assay but have the result assigned to the parent object. Default is TRUE, which includes all rows. Passed through as the i argument in [.data.table.

nthread

numeric(1) Number of threads to use for split-apply-combine parallelization. Uses BiocParallel::bplapply if nthread > 1 or if you pass in BPPARAM. Does not modify data.table threads, so be sure to use setDTthreads for reasonable nested parallelism. See details for performance considerations.

progress

logical(1) Display a progress bar for parallelized computations? Only works if ⁠bpprogressbar<-⁠ is defined for the current BiocParallel back-end.

BPPARAM

BiocParallelParam object. Use to customize the parallelization back-end of bplapply. Note that nthread overrides any settings from BPPARAM as long as bpworkers<- is defined for that class.

enlist

logical(1) Default is TRUE. Set to FALSE to evaluate the first call in ... within data.table groups. See details for more information.

moreArgs

list() A named list where each item is an argument to one of the calls in ... that is not a column in the table being aggregated. Use to further parameterize your calls. Please note that these are not added to your aggregate calls unless you specify the names in the call.
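As a rough illustration of how subset relates to by and the aggregation calls, the following plain data.table sketch filters rows through i before grouping. The table dt and its columns drug and viability are made up for this example, and the code is not taken from the method's internals.

```r
library(data.table)

# Toy data; the column names are hypothetical
dt <- data.table(
    drug = c("A", "A", "B", "B"),
    viability = c(0.9, 0.2, 0.8, 0.6)
)

# A captured call, standing in for what would be supplied as `subset`
subset_call <- quote(viability > 0.5)

# Rows are filtered through i first, then each group is aggregated in j
res <- dt[eval(subset_call), list(mean_viability = mean(viability)), by = drug]
```

Only rows that pass the filter contribute to each group's aggregate.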

Details

This S4 method overrides the default aggregate method for a data.frame; as such, you need to call aggregate.data.frame directly to get the original S3 method for a data.table.

Use of Non-Standard Evaluation

Arguments in ... are substituted and wrapped in a list, which is passed through to the j argument of ⁠[.data.table⁠ internally. The function currently tries to build informative column names for unnamed arguments in ... by appending the name of each function call with the name of its first argument, which is assumed to be the column name being aggregated over. If an argument to ... is named, that will be the column name of its value in the resulting data.table.
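The mechanism described above can be sketched with plain data.table. The helper sketch_aggregate below is hypothetical and is not the CoreGx implementation; in particular, the "_" separator used for default names is an assumption of this sketch, as are the toy table and its column names.

```r
library(data.table)

# Hypothetical helper for illustration only; NOT the CoreGx implementation.
sketch_aggregate <- function(x, by, ...) {
    # Capture the calls passed in ... without evaluating them
    agg_calls <- as.list(substitute(list(...)))[-1L]

    # Build default names for unnamed calls from the function name and its
    # first argument. The "_" separator is an assumption of this sketch.
    nms <- names(agg_calls)
    if (is.null(nms)) nms <- rep("", length(agg_calls))
    for (k in which(nms == "")) {
        cl <- agg_calls[[k]]
        nms[k] <- paste(deparse(cl[[1L]]), deparse(cl[[2L]]), sep = "_")
    }
    names(agg_calls) <- nms

    # Wrap the captured calls in a single call to list() ...
    j_call <- as.call(c(as.name("list"), agg_calls))
    # ... and evaluate it once per group through the j argument
    x[, eval(j_call), by = by]
}

# Toy data; the column names are hypothetical
dt <- data.table(
    drug = c("A", "A", "B", "B"),
    viability = c(0.5, 0.7, 0.2, 0.4)
)

# One unnamed call (name is inferred) and one named call (name is kept)
res <- sketch_aggregate(dt, by = "drug", mean(viability), n = length(viability))
```

Because the first argument is assumed to be a column name, a call whose first argument is something else would get a misleading default name, which is a good reason to name such calls explicitly.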

Enlisting

The primary use case for enlist=FALSE is to allow computation of dependent aggregations, where the output from a previous aggregation is required in a subsequent one. For this case, wrap your call in { and assign intermediate results to variables, returning the final results as a list where each list item will become a column in the final table with the corresponding name. Name inference is disabled for this case, since it is assumed you will name the returned list items appropriately. A major advantage over multiple calls to aggregate is that the overhead of parallelization is paid only once, even for complex multi-step computations like fitting a model, capturing its parameters, and making predictions using it. It also allows capturing arbitrarily complex calls which can be recomputed later using the update,TreatmentResponseExperiment-method. A potential disadvantage is increased RAM usage per thread due to storing intermediate values in variables, as well as any associated memory allocation overhead.
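The shape of such a dependent aggregation can be illustrated with plain data.table, where a braced expression in j assigns intermediate results and returns a named list. This is a sketch of the pattern only, not a call to the S4 method; the table dt, its columns, and the choice of lm are assumptions made for the example.

```r
library(data.table)

# Toy dose-response data; the column names are hypothetical
dt <- data.table(
    drug = rep(c("A", "B"), each = 4),
    dose = rep(c(1, 2, 3, 4), times = 2),
    viability = c(0.9, 0.7, 0.5, 0.3, 0.8, 0.7, 0.6, 0.5)
)

res <- dt[, {
    # Intermediate result: a model fit within the current group
    fit <- lm(viability ~ dose)
    # Dependent steps reuse the fit instead of recomputing it
    slope <- coef(fit)[["dose"]]
    pred <- unname(predict(fit, newdata = data.frame(dose = 5)))
    # Each named list item becomes a column in the result
    list(slope = slope, predicted_at_5 = pred)
}, by = drug]

# With this method, the same braced block would presumably be passed as the
# first call in ... together with enlist = FALSE.
```

The model is fit once per group and both outputs are derived from it, which is the saving the paragraph above describes relative to making separate aggregate calls.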

Value

data.table of aggregated results with an aggregations attribute capturing metadata about the last aggregation performed on the table.


bhklab/CoreGx documentation built on March 14, 2024, 3:04 a.m.