# colStats: Row and Column Summary Statistics

In matter: A framework for rapid prototyping with file-based data structures

## Description

These functions calculate summary statistics over the rows or columns of a matrix, optionally for each level of a grouping variable, and with implicit row/column centering and scaling if desired.

## Usage

```
## S4 method for signature 'ANY'
colStats(x, stat, groups,
    na.rm = FALSE, tform = identity,
    col.center = NULL, col.scale = NULL,
    row.center = NULL, row.scale = NULL,
    drop = TRUE, BPPARAM = bpparam(), ...)

## S4 method for signature 'ANY'
rowStats(x, stat, groups,
    na.rm = FALSE, tform = identity,
    col.center = NULL, col.scale = NULL,
    row.center = NULL, row.scale = NULL,
    drop = TRUE, BPPARAM = bpparam(), ...)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | A matrix on which to calculate summary statistics. |
| `stat` | The names of the summary statistics to compute over the rows or columns of the matrix. Allowable values include: "min", "max", "prod", "sum", "mean", "var", "sd", "any", "all", and "nnzero". |
| `groups` | A factor or vector giving the grouping. If not provided, no grouping will be used. |
| `na.rm` | If `TRUE`, remove `NA` values before summarizing. |
| `tform` | A dimensionality-preserving transformation to be applied to the matrix (e.g., `log()` or `sqrt()`). |
| `col.center` | A vector of column centers to subtract from each row. (Or a matrix with a column for each level of `groups`.) |
| `col.scale` | A vector of column scaling factors by which to divide each row. (Or a matrix with a column for each level of `groups`.) |
| `row.center` | A vector of row centers to subtract from each column. (Or a matrix with a column for each level of `groups`.) |
| `row.scale` | A vector of row scaling factors by which to divide each column. (Or a matrix with a column for each level of `groups`.) |
| `drop` | If only a single summary statistic is calculated, return the results as a vector (or matrix) rather than a list. |
| `BPPARAM` | An optional instance of `BiocParallelParam`. See documentation for `bplapply`. |
| `...` | Additional arguments. |
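To make the centering and scaling arguments concrete, the following is a minimal base-R sketch of the arithmetic they describe: a vector of column centers is subtracted from every row, and every row is then divided by a vector of column scaling factors. It uses only `sweep()` and `scale()` from base R and does not call `colStats()`; the variable names (`col_center`, `col_scale`) and the choice of centers and scales are illustrative assumptions, not part of the package.

```r
# Base-R sketch of column centering and scaling (not matter's implementation).
set.seed(1)
x <- matrix(runif(20), nrow = 5, ncol = 4)

col_center <- colMeans(x)        # hypothetical centers, one per column
col_scale  <- apply(x, 2, sd)    # hypothetical scaling factors, one per column

# Subtract each column's center from every row, then divide by its scale.
xc <- sweep(x, 2, col_center, "-")
xs <- sweep(xc, 2, col_scale, "/")

# With these particular centers and scales, the result has column means of
# (numerically) zero and column standard deviations of one.
col_means <- colMeans(xs)
col_sds   <- apply(xs, 2, sd)
```

With column means as centers and column standard deviations as scales, this reproduces what base R's `scale(x)` returns; the row versions are analogous with `sweep(x, 1, ...)`.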

## Details

The summary statistics are calculated over chunks of the matrix using `colstreamStats` and `rowstreamStats`. For `matter` objects, the iteration is performed over the major dimension for IO efficiency.
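The idea of computing a statistic over chunks can be sketched in base R as follows. This is only an illustration of the general technique (accumulating running totals one block of rows at a time), not the actual `colstreamStats` implementation; `chunk_size` and the other variable names are hypothetical.

```r
# Sketch: column means accumulated over row chunks (base R only).
set.seed(1)
x <- matrix(runif(1000), nrow = 100, ncol = 10)

chunk_size <- 25                       # hypothetical chunk size
starts <- seq(1, nrow(x), by = chunk_size)

col_sum <- numeric(ncol(x))            # running sum per column
n_seen  <- 0                           # number of rows processed so far

for (s in starts) {
  rows  <- s:min(s + chunk_size - 1, nrow(x))
  chunk <- x[rows, , drop = FALSE]     # only this block needs to be in memory
  col_sum <- col_sum + colSums(chunk)
  n_seen  <- n_seen + nrow(chunk)
}

chunked_mean <- col_sum / n_seen       # agrees with colMeans(x)
```

Because only one chunk is touched at a time, the same pattern applies when the matrix lives on disk and is too large to load at once, which is the situation file-based matrices are meant for.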

## Value

A list with one element for each `stat` requested, where each element is either a vector (if no grouping variable is provided) or a matrix where each column corresponds to a different level of `groups`.

If `drop=TRUE`, and only a single statistic is requested, then the result will be unlisted and returned as a vector or matrix.
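The shape of the return value can be mimicked in base R. The sketch below assumes that, for column statistics, `groups` labels the rows and each column is summarized separately within every level; that reading is an assumption, and the helper `grouped_stat()` is hypothetical rather than part of the package.

```r
# Base-R emulation of the documented return shape (not matter's implementation).
set.seed(1)
x <- matrix(runif(100 * 4), nrow = 100, ncol = 4)
groups <- factor(rep(letters[1:5], each = 20))

# Assumption: `groups` labels the rows; each column is summarized per level.
grouped_stat <- function(x, groups, f) {
  sapply(levels(groups), function(g) {
    apply(x[groups == g, , drop = FALSE], 2, f)
  })
}

# Several statistics: a list with one matrix per statistic,
# each with one column per level of `groups`.
res <- list(mean = grouped_stat(x, groups, mean),
            sd   = grouped_stat(x, groups, sd))

# Emulating drop = TRUE with a single statistic: return the element itself.
single  <- list(mean = grouped_stat(x, groups, mean))
dropped <- if (length(single) == 1L) single[[1L]] else single
```

Here `res$mean` has one row per column of `x` and one column per level of `groups`, and `dropped` is that matrix on its own rather than a one-element list.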

## Author(s)

Kylie A. Bemis

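## See Also

`colSums`

## Examples

```
library(matter)  # also makes the BiocParallel functions below available

# use serial (non-parallel) evaluation
register(SerialParam())
set.seed(1)

x <- matrix(runif(100^2), nrow=100, ncol=100)

groups <- as.factor(rep(letters[1:5], each=20))

# mean for each level of the grouping factor
colStats(x, "mean", groups=groups)
```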