README.md
In DelayedMatrixStats: Functions that Apply to Rows and Columns of 'DelayedMatrix' Objects

DelayedMatrixStats

DelayedMatrixStats is a port of the matrixStats API to work with DelayedMatrix objects from the DelayedArray package.

For a DelayedMatrix, x, the simplest way to apply a function, f(), from matrixStats ismatrixStats::f(as.matrix(x)). However, this “realizes” x in memory as a base::matrix, which typically defeats the entire purpose of using a DelayedMatrix for storing the data.

The DelayedArray package already implements a clever strategy called “block-processing” for certain common “matrix stats” operations (e.g. colSums(), rowSums()). This is a good start, but not all of the matrixStats API is currently supported. Furthermore, certain operations can be optimized with additional information about x. I’ll refer to these “seed-aware” implementations.

You can install DelayedMatrixStats from Bioconductor with:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("DelayedMatrixStats")

This example compares two ways of computing column sums of a DelayedMatrix object:

DelayedMatrix::colSums(): The ‘block-processing strategy’, implemented in the DelayedArray package. The block-processing strategy works for any DelayedMatrix object, regardless of the type of seed.
DelayedMatrixStats::colSums2(): The ‘seed-aware’ strategy, implemented in the DelayedMatrixStats package. The seed-aware implementation is optimized for both speed and memory but only for DelayedMatrix objects with certain types of seed.

#> Warning: package 'sparseMatrixStats' was built under R version 4.0.3

library(DelayedMatrixStats)
library(sparseMatrixStats)
library(microbenchmark)
library(profmem)

set.seed(666)

# Fast column sums of DelayedMatrix with matrix seed
dense_matrix <- DelayedArray(matrix(runif(20000 * 600), nrow = 20000,
                                    ncol = 600))
class(seed(dense_matrix))
#> [1] "matrix" "array"
dense_matrix
#> <20000 x 600> matrix of class DelayedMatrix and type "double":
#>                [,1]       [,2]       [,3] ...     [,599]     [,600]
#>     [1,]  0.7743685  0.6601787  0.4098798   . 0.89118118 0.05776471
#>     [2,]  0.1972242  0.8436035  0.9198450   . 0.31799523 0.63099417
#>     [3,]  0.9780138  0.2017589  0.4696158   . 0.31783791 0.02830454
#>     [4,]  0.2013274  0.8797239  0.6474768   . 0.55217184 0.09678816
#>     [5,]  0.3612444  0.8158778  0.5928599   . 0.08530977 0.39224147
#>      ...          .          .          .   .          .          .
#> [19996,] 0.19490291 0.07763570 0.56391725   . 0.09703424 0.62659353
#> [19997,] 0.61182993 0.01910121 0.04046034   . 0.59708388 0.88389731
#> [19998,] 0.12932744 0.21155070 0.19344085   . 0.51682032 0.13378223
#> [19999,] 0.18985573 0.41716539 0.35110782   . 0.62939661 0.94601427
#> [20000,] 0.87889047 0.25308041 0.54666920   . 0.81630322 0.73272217
microbenchmark(DelayedArray::colSums(dense_matrix),
               DelayedMatrixStats::colSums2(dense_matrix),
               times = 10)
#> Unit: milliseconds
#>                                        expr      min       lq      mean
#>         DelayedArray::colSums(dense_matrix) 58.10028 69.00577 143.70237
#>  DelayedMatrixStats::colSums2(dense_matrix) 13.60972 15.11668  19.23493
#>    median        uq       max neval cld
#>  81.08308 100.81742 414.00737    10   b
#>  16.10312  25.42459  30.91184    10  a
profmem::total(profmem::profmem(DelayedArray::colSums(dense_matrix)))
#> [1] 96105416
profmem::total(profmem::profmem(DelayedMatrixStats::colSums2(dense_matrix)))
#> [1] 166120

# Fast, low-memory column sums of DelayedMatrix with sparse matrix seed
sparse_matrix <- seed(dense_matrix)
zero_idx <- sample(length(sparse_matrix), 0.6 * length(sparse_matrix))
sparse_matrix[zero_idx] <- 0
sparse_matrix <- DelayedArray(Matrix::Matrix(sparse_matrix, sparse = TRUE))
class(seed(sparse_matrix))
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"
sparse_matrix
#> <20000 x 600> sparse matrix of class DelayedMatrix and type "double":
#>               [,1]      [,2]      [,3] ...     [,599]     [,600]
#>     [1,] 0.7743685 0.0000000 0.0000000   . 0.89118118 0.00000000
#>     [2,] 0.1972242 0.0000000 0.9198450   . 0.00000000 0.00000000
#>     [3,] 0.9780138 0.0000000 0.4696158   . 0.31783791 0.00000000
#>     [4,] 0.0000000 0.8797239 0.6474768   . 0.55217184 0.00000000
#>     [5,] 0.3612444 0.0000000 0.0000000   . 0.08530977 0.39224147
#>      ...         .         .         .   .          .          .
#> [19996,] 0.1949029 0.0776357 0.0000000   . 0.09703424 0.00000000
#> [19997,] 0.0000000 0.0000000 0.0000000   . 0.00000000 0.88389731
#> [19998,] 0.0000000 0.2115507 0.1934408   . 0.00000000 0.00000000
#> [19999,] 0.1898557 0.0000000 0.3511078   . 0.62939661 0.94601427
#> [20000,] 0.8788905 0.2530804 0.0000000   . 0.00000000 0.73272217
microbenchmark(DelayedArray::colSums(sparse_matrix),
               DelayedMatrixStats::colSums2(sparse_matrix),
               times = 10)
#> Unit: milliseconds
#>                                         expr       min        lq      mean
#>         DelayedArray::colSums(sparse_matrix) 250.41540 297.64721 504.14460
#>  DelayedMatrixStats::colSums2(sparse_matrix)  15.45005  15.52633  17.30676
#>     median        uq       max neval cld
#>  489.64900 639.84463 945.35155    10   b
#>   16.13531  16.41159  26.49614    10  a
profmem::total(profmem::profmem(DelayedArray::colSums(sparse_matrix)))
#> [1] 249647176
profmem::total(profmem::profmem(DelayedMatrixStats::colSums2(sparse_matrix)))
#> [1] 7400

# Fast column sums of DelayedMatrix with Rle-based seed
rle_matrix <- RleArray(Rle(sample(2L, 200000 * 6 / 10, replace = TRUE), 100),
                       dim = c(2000000, 6))
class(seed(rle_matrix))
#> [1] "SolidRleArraySeed"
#> attr(,"package")
#> [1] "DelayedArray"
rle_matrix
#> <2000000 x 6> matrix of class RleMatrix and type "integer":
#>            [,1] [,2] [,3] [,4] [,5] [,6]
#>       [1,]    2    2    1    1    1    2
#>       [2,]    2    2    1    1    1    2
#>       [3,]    2    2    1    1    1    2
#>       [4,]    2    2    1    1    1    2
#>       [5,]    2    2    1    1    1    2
#>        ...    .    .    .    .    .    .
#> [1999996,]    1    2    2    1    1    1
#> [1999997,]    1    2    2    1    1    1
#> [1999998,]    1    2    2    1    1    1
#> [1999999,]    1    2    2    1    1    1
#> [2000000,]    1    2    2    1    1    1
microbenchmark(DelayedArray::colSums(rle_matrix),
               DelayedMatrixStats::colSums2(rle_matrix),
               times = 10)
#> Unit: milliseconds
#>                                      expr        min         lq      mean
#>         DelayedArray::colSums(rle_matrix) 636.779156 659.794020 700.62470
#>  DelayedMatrixStats::colSums2(rle_matrix)   4.482569   5.028546  14.81272
#>      median         uq       max neval cld
#>  677.413206 699.000029 842.93055    10   b
#>    5.689509   6.471868  84.09537    10  a
profmem::total(profmem::profmem(DelayedArray::colSums(rle_matrix)))
#> [1] 168002536
profmem::total(profmem::profmem(DelayedMatrixStats::colSums2(rle_matrix)))
#> [1] 1640

An extensive set of benchmarks is under development at http://peterhickey.org/BenchmarkingDelayedMatrixStats/.

✔ = Implemented in DelayedMatrixStats
☑️ = Implemented in DelayedArray or sparseMatrixStats
❌: = Not yet implemented

| Method | :--------------------- | colAlls() | colAnyMissings() | colAnyNAs() | colAnys() | colAvgsPerRowSet() | colCollapse() | colCounts() | colCummaxs() | colCummins() | colCumprods() | colCumsums() | colDiffs() | colIQRDiffs() | colIQRs() | colLogSumExps() | colMadDiffs() | colMads() | colMaxs() | colMeans2() | colMedians() | colMins() | colOrderStats() | colProds() | colQuantiles() | colRanges() | colRanks() | colSdDiffs() | colSds() | colsum() | colSums2() | colTabulates() | colVarDiffs() | colVars() | colWeightedMads() | colWeightedMeans() | colWeightedMedians() | colWeightedSds() | colWeightedVars() | rowAlls() | rowAnyMissings() | rowAnyNAs() | rowAnys() | rowAvgsPerColSet() | rowCollapse() | rowCounts() | rowCummaxs() | rowCummins() | rowCumprods() | rowCumsums() | rowDiffs() | rowIQRDiffs() | rowIQRs() | rowLogSumExps() | rowMadDiffs() | rowMads() | rowMaxs() | rowMeans2() | rowMedians() | rowMins() | rowOrderStats() | rowProds() | rowQuantiles() | rowRanges() | rowRanks() | rowSdDiffs() | rowSds() | rowsum() | rowSums2() | rowTabulates() | rowVarDiffs() | rowVars() | rowWeightedMads() | rowWeightedMeans() | rowWeightedMedians() | rowWeightedSds() | rowWeightedVars() | Block processing | base::matrix optimized | Matrix::dgCMatrix optimized | Matrix::lgCMatrix optimized | DelayedArray::RleArray (SolidRleArraySeed) optimized | DelayedArray::RleArray (ChunkedRleArraySeed) optimized | HDF5Array::HDF5Matrix optimized | base::data.frame optimized | S4Vectors::DataFrame optimized | | :--------------- | :----------------------- | :---------------------------- | :---------------------------- | :------------------------------------------------------- | :--------------------------------------------------------- | :-------------------------------- | :--------------------------- | :------------------------------- | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ☑️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ☑️ | ❌ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ | | ✔ | ✔ | ✔ | ✔ | ❌ | ❌ | ❌ | ❌ | ❌ |

Any scripts or data that you put into this service are public.

DelayedMatrixStats documentation built on Feb. 5, 2021, 2:04 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

DelayedMatrixStats
Functions that Apply to Rows and Columns of 'DelayedMatrix' Objects

README.md
In DelayedMatrixStats: Functions that Apply to Rows and Columns of 'DelayedMatrix' Objects

DelayedMatrixStats

Installation

Example

Benchmarking

API coverage

Try the DelayedMatrixStats package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

DelayedMatrixStats Functions that Apply to Rows and Columns of 'DelayedMatrix' Objects

README.md In DelayedMatrixStats: Functions that Apply to Rows and Columns of 'DelayedMatrix' Objects

DelayedMatrixStats

Installation

Example

Benchmarking

API coverage

Try the DelayedMatrixStats package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

DelayedMatrixStats
Functions that Apply to Rows and Columns of 'DelayedMatrix' Objects

README.md
In DelayedMatrixStats: Functions that Apply to Rows and Columns of 'DelayedMatrix' Objects