Eigenvalue decomposition is a commonly used technique in numerous statistical problems. For example, principal component analysis (PCA) basically conducts eigenvalue decomposition on the sample covariance of a data matrix: the eigenvalues are the component variances, and eigenvectors are the variable loadings.
In R, the standard way to compute eigenvalues is the eigen()
function.
However, when the matrix becomes large, eigen()
can be very time-consuming:
the complexity to calculate all eigenvalues of a $n \times n$ matrix is
$O(n^3)$.
While in real applications, we usually only need to compute a few
eigenvalues or eigenvectors, for example to visualize high dimensional
data using PCA, we may only use the first two or three components to draw
a scatterplot. Unfortunately in eigen()
, there is no option to limit the
number of eigenvalues to be computed. This means that we always need to do the
full eigen decomposition, which can cause a huge waste in computation.
The same thing happens in Singular Value Decomposition (SVD). It is often the case that only a Partial SVD or Truncated SVD is needed, and moreover the matrix is usually stored in sparse format. Base R does not provide functions suitable for these special needs.
And this is why the RSpectra package was developed. RSpectra provides an R interface to the Spectra library, which is used to solve large scale eigenvalue problems. The core part of Spectra and RSpectra was written in efficient C++ code, and they can handle large scale matrices in either dense or sparse formats.
The RSpectra package provides functions eigs()
and eigs_sym()
to
calculate eigenvalues of general and symmetric matrices respectively.
If the matrix is known to be symmetric,
eigs_sym()
is preferred since it guarantees that the eigenvalues are real.
To obtain eigenvalues of a square matrix A
, simply call the eigs()
or
eigs_sym()
function, tell it how many eigenvalues are requested (argument k
),
and which ones are going to be selected (argument which
).
By default, which = "LM"
means to pick the eigenvalues
with the largest magnitude (modulus for complex numbers and absolute value
for real numbers).
Below we first generate some random matrices and display some of the eigenvalues and eigenvectors:
set.seed(123) n = 100 # matrix size k = 5 # number of eigenvalues to calculate # Some random data M = matrix(rnorm(n^2), n) # Make it symmetric A = crossprod(M) # Show its largest 5 eigenvalues and associated eigenvectors head(eigen(A)$values, 5) head(eigen(A)$vectors[, 1:5])
Now we use RSpectra to directly obtain the largest 5 eigenvalues:
library(RSpectra) res = eigs_sym(A, k, which = "LM") # "LM" is the default res$values head(res$vectors)
If only eigenvalues are requested, the retvec
option can be used:
eigs_sym(A, k, opts = list(retvec = FALSE))
For really large data, the matrix is usually stored in sparse format.
RSpectra supports the dgCMatrix
and dgRMatrix
matrix types defined
in the Matrix package.
library(Matrix) Msp = as(M, "dgCMatrix") Asp = as(A, "dgRMatrix") eigs(Msp, k, which = "LR", opts = list(retvec = FALSE))$values # largest real part eigs_sym(Asp, k, opts = list(retvec = FALSE))$values
An even more general way to specify the matrix A
is to define a function that
calculates A %*% x
for any vector x
. This representation is called the
Function Interface, which can also be regarded as a sparse format, since
users do not need to store the matrix A
, but only the operator x -> Ax
.
For example, to represent a diagonal matrix $diag(1, 2, \ldots, 10)$ and
calculate its eigenvalues, we can define the function f
and call the
eigs_sym()
function:
# Implicitly define the matrix by a function that calculates A %*% x # Below represents a diagonal matrix whose elements are stored in the `args` parameter f = function(x, args) { # diag(args) %*% x == x * args return(x * args) } eigs_sym(f, k = 3, n = 10, args = 1:10)
n
gives the dimension of the matrix, and args
contains user data that will
be passed to f
.
Sometimes you may need to calculate the smallest (in magnitude) k
eigenvalues
of a matrix. A direct way to do so is to use eigs(..., which = "SM")
.
However, the algorithm of RSpectra is good at finding large
eigenvalues rather than small ones, so which = "SM"
tends to require much
more iterations or even may not converge.
The recommended way to retrieve the smallest eigenvalues is to utilize the spectral transformation: If $A$ has eigenvalues $\lambda_i$, then $(A-\sigma I)^{-1}$ has eigenvalues $1/(\lambda_i-\sigma)$. Therefore, we can set $\sigma = 0$ and calculate the largest eigenvalues of $A^{-1}$, whose reciprocals are exactly the smallest eigenvalues of $A$.
eigs()
implements the spectral transformation via the parameter sigma
, so
the following code returns the smallest eigenvalues of A
. Note that
eigs()
always returns eigenvalues in the original scale (i.e., $\lambda_i$
instead of $1/(\lambda_i-\sigma)$).
eigs_sym(A, k, which = "LM", sigma = 0)$values # recommended way eigs_sym(A, k, which = "SM")$values # not recommended
More generally, the option which = "LM", sigma = s
selects eigenvalues
that are closest to s
.
Truncated SVD (or Partial SVD) is frequently used in text mining and image compression, which computes the leading singular values and singular vectors of a rectangular matrix.
RSpectra has the svds()
function to compute Truncated SVD:
set.seed(123) m = 100 n = 20 k = 5 A = matrix(rnorm(m * n), m) str(svds(A, k, nu = k, nv = k))
Similar to eigs()
, svds()
supports sparse matrices:
Asp = as(A, "dgCMatrix") svds(Asp, k, nu = 0, nv = 0)
svds()
also has the function interface: users provide functions A
and Atrans
that calcualtes $Ax$ and $A'x$ respectively, and svds()
will
compute the Truncated SVD.
Af = function(x, args) as.numeric(args %*% x) Atransf = function(x, args) as.numeric(crossprod(args, x)) str(svds(Af, k, Atrans = Atransf, dim = c(m, n), args = Asp))
RSpectra is written in efficient C++ code.
This page gives some
benchmarking results of svds()
with other similar functions available in R
and extension packages.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.