rpca: Randomized principal component analysis (rpca).
In erichson/rSVD: Randomized Singular Value Decomposition



Fast computation of principal component analysis (PCA) using the randomized singular value decomposition.

rpca(
  A,
  k = NULL,
  center = TRUE,
  scale = TRUE,
  retx = TRUE,
  p = 10,
  q = 2,
  rand = TRUE
)

`A`	array_like; a numeric (m, n) input matrix (or data frame) to be analyzed. If the data contain NAs, na.omit is applied.
`k`	integer; number of dominant principal components to be computed. It is required that k is smaller than or equal to min(m,n), but it is recommended that k << min(m,n).
`center`	bool, optional; logical value which indicates whether the variables should be shifted to be zero centered (TRUE by default).
`scale`	bool, optional; logical value which indicates whether the variables should be scaled to have unit variance (TRUE by default).
`retx`	bool, optional; logical value indicating whether the rotated variables / scores should be returned (TRUE by default).
`p`	integer, optional; oversampling parameter for rsvd (default p=10), see `rsvd`.
`q`	integer, optional; number of additional power iterations for rsvd (default q=2), see `rsvd`.
`rand`	bool, optional; if TRUE (the default), the rsvd routine is used, otherwise svd is used.

Principal component analysis is an important linear dimension reduction technique.

Randomized PCA is computed via the randomized SVD algorithm (rsvd). The computational gain is substantial if the desired number of principal components is relatively small, i.e. k << min(m,n).
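To make the role of k, p and q concrete, the following is a minimal base-R sketch of a generic randomized SVD in the spirit of reference [2]. It is an illustration only and not the code used inside rsvd; the test matrix and all variable names (Omega, Y, Q, B, A_k, rel_err) are assumptions made for this example.

```r
# Illustrative sketch of a randomized SVD; NOT the rsvd package's implementation.
set.seed(1)
m <- 200; n <- 50    # matrix dimensions
k <- 5               # target rank
p <- 10              # oversampling
q <- 2               # power iterations

# Hypothetical test matrix with exact rank k
A <- matrix(rnorm(m * k), m, k) %*% matrix(rnorm(k * n), k, n)

l <- k + p                             # oversampled sketch size
Omega <- matrix(rnorm(n * l), n, l)    # Gaussian random test matrix
Y <- A %*% Omega                       # sample the column space of A

# Power iterations, re-orthonormalizing at each step for numerical stability
for (i in seq_len(q)) {
  Y <- qr.Q(qr(Y))
  Z <- qr.Q(qr(crossprod(A, Y)))       # crossprod(A, Y) is t(A) %*% Y
  Y <- A %*% Z
}

Q <- qr.Q(qr(Y))                       # orthonormal basis, m x l
B <- crossprod(Q, A)                   # small l x n matrix
s <- svd(B)                            # deterministic SVD of the small matrix

# Map back to the original space and keep the k dominant components
U <- (Q %*% s$u)[, 1:k]
d <- s$d[1:k]
V <- s$v[, 1:k]

A_k <- U %*% (d * t(V))                # d * t(V) scales row i of t(V) by d[i]
rel_err <- norm(A - A_k, "F") / norm(A, "F")
print(rel_err)
```

The expensive SVD is applied only to the small (k + p) x n matrix B, which is where the saving comes from when k << min(m,n). Because the test matrix here has exact rank k, the relative error should be at the level of floating-point round-off.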

The print and summary methods can be used to present the results in a readable format. A scree plot can be produced with ggscreeplot. The individuals factor map can be produced with ggindplot, and a correlation plot with ggcorplot.

The predict function can be used to compute the scores of new observations. The data will automatically be centered (and scaled if requested). This is not fully supported for complex input matrices.
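The arithmetic behind projecting new observations can be sketched in base R. The example below uses prcomp as a stand-in for rpca, on the assumption that both store center, scale and rotation in the same way; the train/new split and the variable names are made up for illustration.

```r
# Sketch: scoring new observations with stored centering, scaling and rotation.
# prcomp is used as a base-R stand-in; this is an assumption, not rpca's code.
data(iris)
X <- as.matrix(log(iris[, 1:4]))
train <- X[1:100, ]      # hypothetical training rows
new   <- X[101:150, ]    # hypothetical new observations

fit <- prcomp(train, center = TRUE, scale. = TRUE)

# Standardize the new rows with the TRAINING statistics, then rotate
new_std <- scale(new, center = fit$center, scale = fit$scale)
scores_manual <- new_std %*% fit$rotation

# The predict method should give the same scores
scores_predict <- predict(fit, newdata = new)
print(all.equal(unname(scores_manual), unname(scores_predict)))
```

The key point is that the new data are centered and scaled with the statistics stored in the fitted object, not with their own means and standard deviations.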

rpca returns a list with class rpca containing the following components:

rotation: array_like;
the rotation (eigenvectors); (n, k) dimensional array.
eigvals: array_like;
eigenvalues; k dimensional vector.
sdev: array_like;
standard deviations of the principal components; k dimensional vector.
x: array_like;
the scores / rotated data; (m, k) dimensional array.
center, scale: array_like;
the centering and scaling used.

The principal components are not unique and only defined up to sign (a constant of modulus one in the complex case) and so may differ between different PCA implementations.
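When comparing rotations from two implementations, one option is to align the signs column by column before measuring the difference. The sketch below compares prcomp with an eigendecomposition of the correlation matrix in base R; the alignment rule and the variable names are assumptions chosen for illustration, and they cover the real case only.

```r
# Sketch: removing the sign ambiguity before comparing two sets of loadings.
data(iris)
X <- log(iris[, 1:4])

V1 <- prcomp(X, center = TRUE, scale. = TRUE)$rotation  # loadings via SVD
V2 <- eigen(cor(X))$vectors                             # loadings via eigen

# Flip each column of V2 so that its inner product with V1 is positive
signs <- sign(colSums(V1 * V2))
V2_aligned <- sweep(V2, 2, signs, `*`)

max_diff <- max(abs(unname(V1) - V2_aligned))
print(max_diff)
```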

Similar to prcomp the variances are computed with the usual divisor N - 1.
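The relation between singular values, sdev and eigvals under the N - 1 divisor can be checked in base R. This is a small sketch using svd and prcomp rather than rpca itself; the variable names are chosen for this example.

```r
# Sketch: standard deviations and eigenvalues with the N - 1 divisor.
data(iris)
X <- scale(as.matrix(log(iris[, 1:4])), center = TRUE, scale = TRUE)
N <- nrow(X)

s <- svd(X)
sdev    <- s$d / sqrt(N - 1)   # standard deviations of the principal components
eigvals <- sdev^2              # variances, i.e. eigenvalues of the correlation matrix

ref <- prcomp(log(iris[, 1:4]), center = TRUE, scale. = TRUE)
print(all.equal(sdev, ref$sdev))
print(all.equal(eigvals, eigen(cor(log(iris[, 1:4])))$values))
```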

N. Benjamin Erichson, erichson@berkeley.edu

[1] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. http://doi.org/10.18637/jss.v089.i11.
[2] N. Halko, P. Martinsson, and J. Tropp. "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions" (2009). (available at arXiv http://arxiv.org/abs/0909.4061).

ggscreeplot, ggindplot, ggcorplot, plot.rpca, predict, rsvd

library('rsvd')
#
# Load Edgar Anderson's Iris Data
#
data('iris')

#
# log transform
#
log.iris <- log( iris[ , 1:4] )
iris.species <- iris[ , 5]

#
# Perform rPCA and compute only the first two PCs
#
iris.rpca <- rpca(log.iris, k=2)
summary(iris.rpca) # Summary
print(iris.rpca) # Prints the rotations

#
# Use rPCA to compute all PCs, similar to prcomp
#
iris.rpca <- rpca(log.iris)
summary(iris.rpca) # Summary
print(iris.rpca) # Prints the rotations
plot(iris.rpca) # Produce screeplot, variable and individuals factor maps.