# summary-stats: Summary Statistics for "matter" Objects

In matter: A framework for rapid prototyping with file-based data structures

## Description

These functions calculate summary statistics for `matter` objects by processing the data in chunks rather than loading it all into memory at once. For matrices, they support both row-wise and column-wise summaries.

## Usage

```r
## S4 method for signature 'matter'
range(x, na.rm)

## S4 method for signature 'matter'
min(x, na.rm)

## S4 method for signature 'matter'
max(x, na.rm)

## S4 method for signature 'matter'
prod(x, na.rm)

## S4 method for signature 'matter'
mean(x, na.rm)

## S4 method for signature 'matter'
sum(x, na.rm)

## S4 method for signature 'matter'
sd(x, na.rm)

## S4 method for signature 'matter'
var(x, na.rm)

## S4 method for signature 'matter'
any(x, na.rm)

## S4 method for signature 'matter'
all(x, na.rm)

## S4 method for signature 'matter_mat'
colMeans(x, na.rm)

## S4 method for signature 'matter_mat'
colSums(x, na.rm)

## S4 method for signature 'matter_mat'
colSds(x, na.rm)

## S4 method for signature 'matter_mat'
colVars(x, na.rm)

## S4 method for signature 'matter_mat'
rowMeans(x, na.rm)

## S4 method for signature 'matter_mat'
rowSums(x, na.rm)

## S4 method for signature 'matter_mat'
rowSds(x, na.rm)

## S4 method for signature 'matter_mat'
rowVars(x, na.rm)
```

## Arguments

`x`: A `matter` object.

`na.rm`: If `TRUE`, remove `NA` values before summarizing.

## Details

These summary statistics methods operate on chunks of data (equal to the `chunksize` of `x`) which are loaded into memory and then freed before reading the next chunk.
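The chunk-at-a-time pattern described above can be illustrated with a minimal base R sketch. The helper `chunked_sum` and its `chunksize` argument are hypothetical names used only for illustration; this is not the package's internal code, and an in-memory vector stands in for file-backed data.

```r
# Hypothetical helper (not part of matter): sum a vector one chunk at a time,
# so that only `chunksize` elements are worked on at once.
chunked_sum <- function(x, chunksize = 4L, na.rm = FALSE) {
  if (length(x) == 0L) return(0)
  total <- 0
  starts <- seq(1L, length(x), by = chunksize)
  for (s in starts) {
    e <- min(s + chunksize - 1L, length(x))
    chunk <- x[s:e]                        # stands in for reading one chunk from disk
    total <- total + sum(chunk, na.rm = na.rm)
    # `chunk` is overwritten on the next iteration, so earlier chunks can be freed
  }
  total
}

x <- as.numeric(1:100)
chunked_sum(x, chunksize = 7L) == sum(x)
```

Only the running total persists across iterations, which is why memory use scales with the chunk size rather than with the length of the data.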

For row and column summaries on matrices, the iteration scheme is dependent on the layout of the data. Column-major matrices will always be iterated over by column, and row-major matrices will always be iterated over by row. Row statistics on column-major matrices and column statistics on row-major matrices are calculated iteratively.
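As a rough illustration of computing row statistics while iterating by column, the base R sketch below accumulates row sums one column at a time. The function name `row_sums_by_column` is hypothetical, and an ordinary in-memory matrix is assumed in place of a `matter_mat` object.

```r
# Hypothetical helper (not part of matter): compute row sums while visiting
# the matrix strictly one column at a time, as a column-major layout favors.
row_sums_by_column <- function(m) {
  acc <- numeric(nrow(m))          # one running total per row
  for (j in seq_len(ncol(m))) {
    acc <- acc + m[, j]            # each column updates every row's total
  }
  acc
}

m <- matrix(1:100, nrow = 10, ncol = 10)
all.equal(row_sums_by_column(m), rowSums(m))
```

The accumulator has one entry per row, so the cross-layout case needs extra working memory proportional to the length of the result, but it never needs more than one column of data at a time.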

The efficiency of these methods is entirely dependent on the `chunksize` of `x`. Larger chunks will yield faster calculations, but greater memory usage. The row and column summary methods may be more or less efficient than the equivalent call to `apply`, depending on the chunk size.

Variance and standard deviation are calculated using a running sum of squares formula which can be calculated iteratively and is accurate for large floating-point datasets (see reference).
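For reference, a minimal base R sketch of a Welford-style running update is given below. It processes one value at a time and is only an illustration under that assumption; the function name `welford_var` is hypothetical, and the package's actual chunk-wise implementation may differ.

```r
# Hypothetical helper (not part of matter): single-pass sample variance using
# a Welford-style running mean and running sum of squared deviations.
welford_var <- function(x) {
  n  <- 0    # number of values seen so far
  mu <- 0    # running mean
  m2 <- 0    # running sum of squared deviations from the mean
  for (value in x) {
    n     <- n + 1
    delta <- value - mu
    mu    <- mu + delta / n
    m2    <- m2 + delta * (value - mu)   # uses the mean before and after the update
  }
  m2 / (n - 1)                           # sample variance (n - 1 denominator)
}

x <- as.numeric(1:100)
all.equal(welford_var(x), var(x))
sqrt(welford_var(x))                     # corresponding standard deviation
```

Because the update never forms the difference between two large sums, it avoids the cancellation error of the textbook formula based on the sum of squares minus the squared sum.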

## Value

For `mean`, `sd`, and `var`, a single number. For the column summaries, a vector of length equal to the number of columns of the matrix. For the row summaries, a vector of length equal to the number of rows of the matrix.

## Author(s)

Kylie A. Bemis

## References

B. P. Welford, “Note on a Method for Calculating Corrected Sums of Squares and Products,” Technometrics, vol. 4, no. 3, pp. 419-420, Aug. 1962.

## See Also

`stream_stat`

## Examples

```r
x <- matter(1:100, nrow=10, ncol=10)

sum(x)
mean(x)
var(x)
sd(x)

colSums(x)
colMeans(x)
colVars(x)
colSds(x)

rowSums(x)
rowMeans(x)
rowVars(x)
rowSds(x)
```