LazyMatrix
packageSometimes it doesn't make sense to store your matrices explicitly. LazyMatrix
allows you to store them in clever ways that save memory and allow for fast matrix-vector multiplication. In turn, this facilitates speedups for algorithms that only require matrix-vector products -- for example, Facebook's randomized PCA.
In the following examples, you start with a large, sparse matrix $M$, and you want to perform simple operations without causing it to become dense. This vignette will compare the performance of LazyMatrix
against a dense-matrix approach using only the Matrix
package.
For the demo data, you'll need the R package MAST
. This chunk then creates a large, sparse matrix.
library(MAST) library(Matrix) library(MatrixLazyEval) if(require("microbenchmark")){ mb = microbenchmark::microbenchmark } else { mb = function( expr, ... ) try( expr ) } data(CAex) #M = CAex M = t(MAST::maits$expressionmat) M = M * matrix(rbinom(prod(dim(M)), p = 0.01, 1), nrow(M), ncol(M)) M = Matrix::Matrix(M, sparse = T ) M = cbind(M, M, M, M, M, M, M, M, M, M) dim(M) class(M) cat("NNZ: ", sum( M > 0 ), "\n") cat("Proportion NZ: ", sum( M > 0 ) / prod( dim(M) ), "\n" ) cat("Dimensions: ", dim( M ), "\n" )
Suppose you wish to shift the rows of $M$ to have mean 0. One way to accomplish this is to subtract a matrix $RE$ where $R\in \mathbb{R^{m\times 1}}$ contains the row-means in a column and $E\in \mathbb{R^{1\times n}}$ contains just 1's. Using the LEFT
and RIGHT
keywords inside the evaluation rule, you can enforce that evaluation is done efficiently, without explicitly forming the dense and painfully redundant matrix $RE$. In other words, always do $xMy - (xR)(Ey)$ rather than $x(M - RE)y$.
Here's how the code looks.
R = matrix( nrow = nrow(M), ncol = 1, data = Matrix::rowMeans( M ) ) ONES = matrix( ncol = ncol(M), nrow = 1, data = 1 ) M_shifted = M - R %*% ONES M_shifted_lazy = NewLazyMatrix( components = list("M" = M, "R" = R, "ONES" = ONES ), dim = dim(M), eval_rule = "(LEFT %*% M %*% RIGHT) - (LEFT %*% R) %*% (ONES %*% RIGHT)" ) object.size(M_shifted) object.size(M_shifted_lazy) x = rnorm( ncol( M ) ) mb({M_shifted %*% x}) mb({M_shifted_lazy %*% x}) cat( "Results are identical to within: ", max(abs((M_shifted %*% x) - (M_shifted_lazy %*% x))) )
Suppose you wish to replace $M$ with residuals from a regression with a few covariates $X$, which means you need to compute $Z \equiv M - X(X^TX)^{-1}X^TM$. If $X$ has relatively few columns, it's much faster to do $wMy - (wX(X^TX)^{-1})(X^TMy)$ than $M - (X(X^TX)^{-1}X^T) M$.
Here's how you could do this from scratch. (But, check out the next example too, because you don't have to do it from scratch every time.)
X = as.matrix( data.frame( intercept = 1, slope = 1:nrow(M) ) ) XtXinv = solve(t(X) %*% X) # this is just 2 by 2 M_regressed = M - X %*% XtXinv %*% t(X) %*% M M_regressed_lazy = NewLazyMatrix( components = list("M" = M, "X" = X, "XtXinv" = XtXinv ), dim = dim(M), eval_rule = "( LEFT %*% M %*% RIGHT ) - ( LEFT %*% X %*% XtXinv) %*% ( t(X) %*% M %*% RIGHT ) " ) object.size(M_regressed) object.size(M_regressed_lazy) x = rnorm( ncol( M ) ) mb({M_regressed %*% x}) mb({M_regressed_lazy %*% x}) cat( "Results are identical to within: ", max(abs((M_regressed %*% x) - (M_regressed_lazy %*% x))) )
N.B. Statistics nerds may notice that this example generalizes the previous example -- subtracting the row-wise mean is equivalent to regressing out a constant. This is envisioned to be an especially common use case for MatrixLazyEval ...
Regressing out covariates is common enough to be implemented in the function RegressOutLazily
.
X = as.matrix( data.frame( intercept = 1, slope = 1:nrow(M) ) ) XtXinv = solve(t(X) %*% X) # this is just 2 by 2 M_regressed = M - X %*% XtXinv %*% t(X) %*% M M_regressed_lazy = RegressOutLazily( M, X ) x = rnorm( ncol( M ) ) mb({M_regressed %*% x}) mb({M_regressed_lazy %*% x}) cat( "Results are identical to within: ", max(abs((M_regressed %*% x) - (M_regressed_lazy %*% x))) )
You can even compute a randomized SVD on the resulting LazyMatrix. It's not exactly the same, but it's close.
mb({irlba::irlba(M_regressed, 5)}, times = 10) mb({RandomSVDLazyMatrix(M_regressed_lazy, 5)}, times = 10) svd_regular = irlba::irlba(M_regressed, 5) svd_lazy = RandomSVDLazyMatrix(M_regressed_lazy, 5) plot( svd_regular$u[, 1], svd_lazy$u[, 1] )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.