require(knitr) opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
r Biocpkg("BiocSingular") implements several useful
DelayedMatrix backends for dealing with principal components analysis (PCA).
This vignette aims to provide an overview of these classes and how they can be used in other packages to improve efficiency prior to or after PCA.
As previously discussed, we can defer the column centering and scaling of a matrix.
DeferredMatrix class provides a matrix representation that is equivalent to the output of
scale(), but without actually performing the centering/scaling operations.
This is useful when dealing with sparse matrix representations, as naive centering would result in loss of sparsity and increased memory use.
library(Matrix) a <- rsparsematrix(10000, 1000, density=0.01) object.size(a) centering <- rnorm(ncol(a)) scaling <- runif(ncol(a)) y <- DeferredMatrix(a, centering, scaling) y object.size(y) # 'a' plus the size of 'centering' and 'scaling'.
At first glance, this seems to be nothing more than a variant on a
However, the real advantage of the
DeferredMatrix comes when performing matrix multiplication.
Multiplication is applied to the underlying sparse matrix, and the effects of centering and scaling are applied on the result.
This allows us to achieve much greater speed than
r Biocpkg("DelayedArray")'s block-processing mechanism,
which realizes blocks of the matrix into dense arrays prior to multiplication.
v <- matrix(runif(ncol(a)*2), ncol=2) system.time(X <- y %*% v) X
The cost of this speed is that the resulting matrix product is less numerically stable. Recovering the effect of centering involves the subtraction of two (potentially large) inner products, which increases the risk of catastrophic cancellation. Nonetheless, some reduction in accuracy may be worth the major speed-up when dense realization is avoided.
Note that it is also possible to nest
DeferredMatrix objects within each other.
This allows users to center and scale on both dimensions, as shown below.
centering2 <- rnorm(nrow(a)) scaling2 <- runif(nrow(a)) y2 <- DeferredMatrix(t(a), centering2, scaling2) # centering on rows of 'a'. y3 <- DeferredMatrix(t(y2), centering, scaling) # centering on columns. y3
Other operations will cause the
DeferredMatrix to collapse gracefully into
DelayedMatrix, leading to block processing for further calculations.
Once a PCA is performed, it is occasionally desirable to obtain a low-rank approximation of the input matrix by taking the cross-product of the rotation vectors and PC scores.
Naively doing so results in the formation of a dense matrix of the same dimensions as the input.
This may be prohibitively memory-consuming for a large data set.
Instead, we can construct a
LowRankMatrix class that mimics the output of the cross-product without actually computing it.
a <- rsparsematrix(10000, 1000, density=0.01) out <- runPCA(a, rank=5, BSPARAM=IrlbaParam(deferred=TRUE)) # deferring for speed. recon <- LowRankMatrix(out$rotation, out$x) recon
This is useful for convenient extraction of row- or column vectors without needing to manually perform a cross-product.
LowRankMatrix is thus directly interoperable with downstream procedures (e.g., for visualization) that expect a matrix of the same dimensionality as the input.
Again, most operations will cause the
LowRankMatrix to collapse gracefully into
DelayedMatrix for further processing.
This has now been moved to its own
r Biocpkg("ResidualMatrix") package.
Check it out, it's pretty cool.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.