mcsplitapply: Parallel split-matrix or dataframe apply loop
In traversc/trqwe: Performance oriented statistical metrics and utility functions.


Splits a matrix or data.frame into subsets based on a factor and applies a function to each subset. A typical use case is summing exon-level count data into gene-level count data. This is similar to the dplyr idiom df %>% group_by(f) %>% do(...), but with several advantages: 1) mcsplitapply works on matrices, which is typically much faster than working on data frames; 2) it is inherently parallelized; 3) it can return results other than data frames; and 4) you can specify how the per-subset results are combined (the default is rbind).
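
To make the split, apply, and combine steps concrete, here is a minimal base R sketch of the same idea on a toy matrix. It is a serial illustration only and is not the trqwe implementation; the toy data and the variable names idx and pieces are made up for this example.

```r
# Toy exon-level count matrix: 5 exons (rows) x 2 samples (columns).
mat <- matrix(1:10, nrow = 5,
              dimnames = list(paste0("exon", 1:5), c("s1", "s2")))

# One factor value per row, assigning each exon to a gene.
f <- factor(c("geneA", "geneA", "geneB", "geneB", "geneB"))

# Split: group the row indices by factor level.
idx <- split(seq_len(nrow(mat)), f)

# Apply: run colSums on each sub-matrix.
# drop = FALSE keeps single-row groups as matrices.
pieces <- lapply(idx, function(i) colSums(mat[i, , drop = FALSE]))

# Combine: stack the per-group results, mirroring the rbind default.
gene_counts <- do.call(rbind, pieces)
print(gene_counts)
```

For the specific case of summing rows within groups, base R's rowsum(mat, f) gives the same totals; a general split-apply helper is useful when func is something other than a sum.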

mcsplitapply(mat, f, func, mc.cores = 4, .combine = rbind, ...)

`mat`	The matrix or data.frame to split.
`f`	A factor of length equal to nrow(mat). The levels of this factor will split the matrix into subsets.
`func`	The function to apply to each subset.
`mc.cores`	The number of cores to use. Default is 4.
`.combine`	The function to combine the results with. Default is rbind. Use NA to return a list.

A list or a combined object depending on the .combine parameter.
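
The effect of the .combine argument can be illustrated with a small stand-in function. The helper split_apply_sketch below is hypothetical and not part of trqwe; it assumes that a function passed as .combine is applied to the list of per-subset results via do.call, that NA skips the combining step, and that ... is forwarded to func. None of these details are confirmed by this page. It uses parallel::mclapply with mc.cores = 1 so that it also runs on platforms without forking.

```r
library(parallel)

# Hypothetical helper that imitates the documented interface.
split_apply_sketch <- function(mat, f, func, mc.cores = 1,
                               .combine = rbind, ...) {
  # Group row indices by factor level.
  idx <- split(seq_len(nrow(mat)), f)
  # Apply func to each subset; extra arguments go to func (assumption).
  res <- mclapply(idx, function(i) func(mat[i, , drop = FALSE], ...),
                  mc.cores = mc.cores)
  # Assumption: a function combines the results, anything else
  # (such as NA) returns the raw list.
  if (is.function(.combine)) do.call(.combine, res) else res
}

mat <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("s1", "s2")))
f <- factor(c("g1", "g1", "g2"))

as_rows <- split_apply_sketch(mat, f, colSums)                    # one row per group
as_cols <- split_apply_sketch(mat, f, colSums, .combine = cbind)  # one column per group
as_list <- split_apply_sketch(mat, f, colSums, .combine = NA)     # named list
```

With rbind each factor level becomes a row of the result, with cbind each level becomes a column, and with NA the per-level results stay in a list named by level.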

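library(pasilla)
library(DEXSeq)
library(trqwe)
# Load the example dataset, which provides the object `dxd`
data(pasillaDEXSeqDataSet)
# Exon-level count matrix
exon_counts <- counts(dxd)
# Gene ID for each row of the count matrix, used as the splitting factor
f <- rowData(dxd)$groupID
# Sum the exon-level counts within each gene to get gene-level counts
gene_counts <- mcsplitapply(exon_counts, f, colSums)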