mcsplitapply: Parallel split-matrix or dataframe apply loop


View source: R/my_functions.r

Description

Splits a matrix or data.frame into subsets based on a factor and applies a function to each subset. A typical use case is summing exon-level count data to gene-level count data. This is similar to the dplyr idiom df %>% group_by(f) %>% do(...), but has several advantages: 1) mcsplitapply can be used on matrices (and is therefore much faster), 2) it is inherently parallelized, 3) it can return results other than data frames, and 4) you can specify how the results are combined (the default is rbind).
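The split-apply-combine pattern described above can be sketched in base R on a toy matrix. This is only an illustration of the idea, not the trqwe implementation: it runs serially with lapply (parallel::mclapply could be substituted on platforms that support forking), and the matrix, factor levels, and variable names are made up for the example.

```r
# Toy exon-by-sample count matrix: 5 exons (rows), 2 samples (columns).
exon_counts <- matrix(1:10, nrow = 5,
                      dimnames = list(paste0("exon", 1:5), c("s1", "s2")))

# Hypothetical gene membership of each exon; one entry per row.
f <- factor(c("geneA", "geneA", "geneB", "geneB", "geneB"))

# Split: group the row indices by factor level.
idx <- split(seq_len(nrow(exon_counts)), f)

# Apply: run colSums on each sub-matrix (drop = FALSE keeps single rows as matrices).
per_gene <- lapply(idx, function(i) colSums(exon_counts[i, , drop = FALSE]))

# Combine: stack the per-gene results into one matrix, one row per gene.
gene_counts <- do.call(rbind, per_gene)
```

Here gene_counts has one row per level of f and one column per sample, which is the shape the exon-to-gene use case calls for.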

Usage

mcsplitapply(mat, f, func, mc.cores = 4, .combine = rbind, ...)

Arguments

mat

The matrix or data.frame to split.

f

A factor of length equal to nrow(mat). Rows of mat are grouped into subsets according to the levels of this factor.

func

The function to apply to each subset.

mc.cores

The number of cores to use.

.combine

The function to combine the results with. Default is rbind. Use NA to return a list.

Value

A list or a combined object depending on the .combine parameter.
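The effect of the combining step on the return value can be illustrated with a small base R sketch. The helper combine_results below is hypothetical and not part of trqwe; it assumes only what is documented above, namely that a function such as rbind combines the per-subset results and that a non-function value such as NA leaves them as a list. The input values are made up.

```r
# Hypothetical helper (not part of trqwe) mimicking the documented .combine contract.
combine_results <- function(results, .combine = rbind) {
  # A non-function value such as NA means "return the list as is".
  if (!is.function(.combine)) return(results)
  do.call(.combine, results)
}

# Made-up per-subset results, one named vector per factor level.
results <- list(geneA = c(s1 = 3, s2 = 13), geneB = c(s1 = 12, s2 = 27))

as_matrix <- combine_results(results)         # rbind: one row per subset
as_list   <- combine_results(results, NA)     # NA: the named list, unchanged
as_cols   <- combine_results(results, cbind)  # cbind: one column per subset
```

Choosing rbind suits per-subset results that are vectors of equal length; returning a list is the safer choice when subsets produce results of differing shapes.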

Examples

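library(pasilla)
library(DEXSeq)
library(trqwe)
# Load the example dataset; the code below expects it to provide `dxd`
data(pasillaDEXSeqDataSet)
# Exon-level counts, one row per exon
exon_counts <- counts(dxd)
# Gene ID of each exon, used as the splitting factor
f <- rowData(dxd)$groupID
# Sum exon counts within each gene
gene_counts <- mcsplitapply(exon_counts, f, colSums)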

traversc/trqwe documentation built on Dec. 4, 2020, 4:21 a.m.