An introduction to the LazyMatrix package

Sometimes it doesn't make sense to store your matrices explicitly. LazyMatrix allows you to store them in clever ways that save memory and allow for fast matrix-vector multiplication. In turn, this speeds up algorithms that only require matrix-vector products, such as Facebook's randomized PCA.
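
Before diving into the package, here is a minimal hand-rolled sketch of the core idea in plain R. This is not the MatrixLazyEval API; it just shows that a row-centered matrix can be represented by a closure that knows how to multiply vectors, instead of by a dense array.

# NOT the package API: represent A minus its row means implicitly,
# as a function that maps x to (A - b %*% t(ones)) %*% x.
lazy_matvec = function( A, b ){
  # never forms the dense rank-one correction b %*% t(rep(1, ncol(A)))
  function( x ) as.vector( A %*% x ) - b * sum(x)
}
A = matrix( rnorm(6), nrow = 2 )
b = rowMeans( A )
f = lazy_matvec( A, b )
x = rnorm( 3 )
max(abs( f(x) - as.vector( ( A - outer( b, rep(1, 3) ) ) %*% x ) ))  # ~ machine precision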

In the following examples, you start with a large, sparse matrix $M$, and you want to perform simple operations without causing it to become dense. This vignette will compare the performance of LazyMatrix against a dense-matrix approach using only the Matrix package.

Setup

For the demo data, you'll need the R package MAST. The chunk below creates a large, sparse matrix by randomly sparsifying an expression matrix and replicating its columns.

library(MAST)
library(Matrix)
library(MatrixLazyEval)
# Use microbenchmark for timings if it's installed; otherwise just run the expression.
if(require("microbenchmark")){ 
  mb = microbenchmark::microbenchmark 
} else {
  mb = function( expr, ... ) try( expr )
}
data(CAex)
#M = CAex  # a small alternative demo matrix from the Matrix package
M = t(MAST::maits$expressionmat) 
# Randomly zero out about 99% of the entries, then store the result sparsely.
M = M * matrix(rbinom(prod(dim(M)), size = 1, prob = 0.01), nrow(M), ncol(M))
M = Matrix::Matrix(M, sparse = TRUE )
# Replicate the columns tenfold to make the example bigger.
M = cbind(M, M, M, M, M, M, M, M, M, M)
dim(M)
class(M)
cat("NNZ: ", sum( M != 0 ), "\n")
cat("Proportion NZ: ", sum( M != 0 ) / prod( dim(M) ), "\n" )
cat("Dimensions: ", dim( M ), "\n" )

Teaching Example: Centering Rows

Suppose you wish to shift the rows of $M$ to have mean 0. One way to accomplish this is to subtract a matrix $RE$, where $R\in \mathbb{R}^{m\times 1}$ contains the row means in a column and $E\in \mathbb{R}^{1\times n}$ contains just 1's. Using the LEFT and RIGHT keywords inside the evaluation rule, you can enforce that evaluation is done efficiently, without explicitly forming the dense and painfully redundant matrix $RE$. In other words, for any conformable row vector $x$ and column vector $y$, always compute $xMy - (xR)(Ey)$ rather than $x(M - RE)y$.

Here's how the code looks.

# Dense approach: explicitly subtract the rank-one matrix R %*% ONES.
R    = matrix( nrow = nrow(M), ncol = 1, data = Matrix::rowMeans( M ) )
ONES = matrix( ncol = ncol(M), nrow = 1, data = 1 )
M_shifted = M - R %*% ONES
# Lazy approach: store the components plus a rule for evaluating products.
M_shifted_lazy = NewLazyMatrix( components = list("M" = M, "R" = R, "ONES" = ONES ), 
                                dim = dim(M),
                                eval_rule  = "(LEFT %*% M %*% RIGHT) - (LEFT %*% R) %*% (ONES %*% RIGHT)" )
object.size(M_shifted)
object.size(M_shifted_lazy)
x = rnorm( ncol( M ) )
mb({M_shifted      %*% x})
mb({M_shifted_lazy %*% x})
cat( "Results are identical to within: ", max(abs((M_shifted      %*% x)  - (M_shifted_lazy %*% x))), "\n" )

Teaching Example: Regressing Out Covariates

Suppose you wish to replace $M$ with the residuals from a regression on a few covariates $X$, which means you need $Z \equiv M - X(X^TX)^{-1}X^TM$. If $X$ has relatively few columns, it's much faster to evaluate products $wZy$ as $wMy - (wX(X^TX)^{-1})(X^TMy)$ than to explicitly form $Z = M - (X(X^TX)^{-1}X^T)M$ and then multiply.
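
To see why the grouping matters, count dimensions. Roughly, with $M$ of size $m \times n$, $X$ of size $m \times k$, and $k$ small:

$$
\underbrace{X(X^TX)^{-1}X^T}_{m \times m,\ \mathrm{dense}}\, M
\qquad \mathrm{versus} \qquad
wMy - \underbrace{\left(wX(X^TX)^{-1}\right)}_{1 \times k}\underbrace{\left(X^T(My)\right)}_{k \times 1}.
$$

The first grouping materializes an $m \times m$ projector; the second only ever forms vectors of length $m$, $n$, or $k$.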

Here's how you could do this from scratch. (But, check out the next example too, because you don't have to do it from scratch every time.)

# Dense approach: materializes a large dense residual matrix.
X = as.matrix( data.frame( intercept = 1, slope = 1:nrow(M) ) )
XtXinv = solve(t(X) %*% X) # this is just 2 by 2
M_regressed = M - X %*% XtXinv %*% t(X) %*% M
# Lazy approach: the rule groups the products so that M stays sparse.
M_regressed_lazy = NewLazyMatrix( components = list("M" = M, "X" = X, "XtXinv" = XtXinv ), 
                                  dim = dim(M),
                                  eval_rule  = "( LEFT %*% M %*% RIGHT ) - ( LEFT %*% X %*% XtXinv) %*% ( t(X) %*% M %*% RIGHT ) " )

object.size(M_regressed)
object.size(M_regressed_lazy)
x = rnorm( ncol( M ) )
mb({M_regressed      %*% x})
mb({M_regressed_lazy %*% x})
cat( "Results are identical to within: ", max(abs((M_regressed      %*% x)  - (M_regressed_lazy %*% x))), "\n" )

N.B. Statistics nerds may notice that this example generalizes the previous one: subtracting the row-wise mean is equivalent to regressing each row on a constant. This is envisioned to be an especially common use case for MatrixLazyEval.
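
Here is a quick numerical check of that claim, reusing R and ONES from the first example (just an illustration, not package code). Regressing each row on a constant puts the projector on the right of $M$, and its fitted values match the row-mean term $R(Ex)$:

X1 = matrix( 1, nrow = ncol(M), ncol = 1 )  # constant "covariate", one value per column of M
x  = rnorm( ncol(M) )
fit_regression = ( M %*% X1 ) %*% solve( t(X1) %*% X1 ) %*% ( t(X1) %*% x )
fit_rowmeans   = R %*% ( ONES %*% x )
max(abs( fit_regression - fit_rowmeans ))  # should be ~ machine precision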

Real Usage Example

Regressing out covariates is common enough to be implemented in the function RegressOutLazily.

# Recompute the dense version for comparison.
X = as.matrix( data.frame( intercept = 1, slope = 1:nrow(M) ) )
XtXinv = solve(t(X) %*% X) # this is just 2 by 2
M_regressed = M - X %*% XtXinv %*% t(X) %*% M
# The one-liner that replaces the NewLazyMatrix call above.
M_regressed_lazy = RegressOutLazily( M, X )
x = rnorm( ncol( M ) )
mb({M_regressed      %*% x})
mb({M_regressed_lazy %*% x})
cat( "Results are identical to within: ", max(abs((M_regressed      %*% x)  - (M_regressed_lazy %*% x))), "\n" )

You can even compute a randomized SVD on the resulting LazyMatrix. Because the algorithm is randomized, the result is not exactly identical to irlba's, but it's close.

mb({irlba::irlba(M_regressed, 5)}, times = 10)
mb({RandomSVDLazyMatrix(M_regressed_lazy, 5)}, times = 10)
svd_regular = irlba::irlba(M_regressed, 5)
svd_lazy    = RandomSVDLazyMatrix(M_regressed_lazy, 5)
# Compare the leading left singular vectors.
plot( svd_regular$u[, 1], svd_lazy$u[, 1] )
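
One caveat when reading that plot: singular vectors are only determined up to sign, and randomized methods can flip them, so a tight line of slope -1 is just as good as slope +1. A sign-agnostic check:

# absolute correlation near 1 means the leading vectors agree up to sign
abs(cor( svd_regular$u[, 1], svd_lazy$u[, 1] ))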

