pca_svd | R Documentation |
Algorithms implementing a centered PCA of a matrix X.
Except for pca_nipalsna, a priori weights can be assigned to the observations (rows of X) with argument weights. This modifies the importance given to each of the n observations in the calculation of the scores and loadings. By default, argument weights is set to NULL, corresponding to the usual weights 1/n.
Denoting by D the n x n diagonal matrix of weights for the observations (rows of X), the functions consist of:
- pca_svd ==> SVD decomposition of D^(1/2) * X, using function svd.
- pca_eigen ==> eigen decomposition of X' * D * X, using function eigen.
- pca_eigenk ==> eigen decomposition of D^(1/2) * X * X' * D^(1/2), using function eigen. This is the "kernel cross-product trick" version of the PCA algorithm (Wu et al. 1997). For wide matrices (n << p) and n not too large, this algorithm can be much faster than the others.
- pca_nipals ==> eigen decomposition of X' * D * X using NIPALS.
- pca_nipalsna ==> eigen decomposition of X' * D * X using NIPALS allowing missing data in X.
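The equivalence between the first three decompositions can be checked with a minimal base R sketch. This is not the package's code: the object names (Xw, P_svd, P_eig, P_k, ...) are illustrative, and it assumes uniform weights 1/n and centering by the weighted column means.

```r
set.seed(1)
n <- 6 ; p <- 4 ; ncomp <- 3
X <- matrix(rnorm(n * p, mean = 10), nrow = n)

# Assumption: uniform weights 1/n and weighted column means for centering
w <- rep(1 / n, n)
xmeans <- colSums(w * X)
Xc <- sweep(X, 2, xmeans)            # centered X
Xw <- sqrt(w) * Xc                   # D^(1/2) * X (row i times sqrt(w[i]))

# (1) SVD of D^(1/2) * X
sv <- svd(Xw, nu = 0, nv = ncomp)
P_svd <- sv$v
eig_svd <- sv$d[1:ncomp]^2

# (2) Eigen decomposition of X' * D * X (p x p matrix)
e <- eigen(crossprod(Xw), symmetric = TRUE)
P_eig <- e$vectors[, 1:ncomp]
eig_eig <- e$values[1:ncomp]

# (3) Kernel version: eigen decomposition of D^(1/2) * X * X' * D^(1/2) (n x n matrix)
k <- eigen(tcrossprod(Xw), symmetric = TRUE)
U <- k$vectors[, 1:ncomp]
eig_k <- k$values[1:ncomp]
# Loadings recovered from the eigenvectors of the n x n matrix
P_k <- crossprod(Xw, U) %*% diag(1 / sqrt(eig_k))

# Scores, here computed from the SVD loadings
T_svd <- Xc %*% P_svd

# Same eigenvalues; loadings agree up to the sign of each column
cbind(eig_svd, eig_eig, eig_k)
abs(colSums(P_svd * P_eig))
abs(colSums(P_svd * P_k))
```

In the kernel version the matrix being decomposed is n x n instead of p x p, which is why it pays off when n << p.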
In all the functions, matrix X is centered before the analysis, but X is not column-wise scaled (there is no argument scale available). If scaling is needed, the user has to scale X before using the functions.
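One possible way to scale X beforehand is sketched below with base R. It assumes scaling by the unweighted column standard deviations; the name Xs is illustrative, and another scaling may be more appropriate when non-uniform weights are used.

```r
set.seed(1)
X <- matrix(rnorm(6 * 4, mean = 10), nrow = 6)

# center = FALSE: the PCA functions center X themselves
# Assumption: unweighted standard deviation of each column as scaling factor
Xs <- scale(X, center = FALSE, scale = apply(X, 2, sd))

apply(Xs, 2, sd)   # every column now has unit standard deviation

# Xs can then be passed in place of X, e.g. pca_svd(Xs, ncomp = 3)
```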
Function pca_nipalsna accepts missing data (NAs) in X, unlike the other functions. The part of pca_nipalsna that specifically handles missing data in the NIPALS algorithm is based on the efficient code of K. Wright in the R package nipals (https://cran.r-project.org/web/packages/nipals/index.html).
Gram-Schmidt orthogonalization in the NIPALS algorithm
The NIPALS PCA is known to generate a loss of orthogonality of the scores (due to the accumulation of rounding errors over the successive iterations), particularly for large matrices or with high degrees of column collinearity.
With missing data, orthogonality of the loadings is not satisfied either.
An effective approach for restoring orthogonality (of both scores and loadings) is the iterative classical Gram-Schmidt orthogonalization (Lingen 2000, Andrecut 2009, and vignette of R package nipals), referred to as the iterative CGS. It consists of adding a CGS orthogonalization step in each iteration of the scores and loadings calculations.
For the case with missing data, the iterative CGS does not ensure that the orthogonalized scores are centered.
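The idea of the iterative CGS can be illustrated with a minimal base R sketch of NIPALS for complete data. It is not the package's implementation: the function name nipals_cgs and its internals are hypothetical, uniform weights are assumed, and missing data are not handled.

```r
# Hypothetical helper: NIPALS PCA with an optional CGS step in each iteration
nipals_cgs <- function(X, ncomp, gs = TRUE,
                       tol = .Machine$double.eps^0.5, maxit = 200) {
  X <- scale(X, center = TRUE, scale = FALSE)   # column centering, no scaling
  n <- nrow(X) ; p <- ncol(X)
  Tmat <- matrix(0, n, ncomp)                   # scores
  P <- matrix(0, p, ncomp)                      # loadings
  for (a in seq_len(ncomp)) {
    ta <- X[, which.max(colSums(X^2))]          # starting score vector
    for (iter in seq_len(maxit)) {
      # Loadings: regression of the columns of X on the current scores
      pa <- crossprod(X, ta) / sum(ta^2)
      if (gs && a > 1) {
        # CGS step: remove the part of pa lying in the previous loadings
        Pprev <- P[, 1:(a - 1), drop = FALSE]
        pa <- pa - Pprev %*% crossprod(Pprev, pa)
      }
      pa <- pa / sqrt(sum(pa^2))
      tnew <- X %*% pa
      if (gs && a > 1) {
        # CGS step: remove the part of tnew lying in the previous scores
        Tprev <- Tmat[, 1:(a - 1), drop = FALSE]
        tnew <- tnew - Tprev %*% (crossprod(Tprev, tnew) / colSums(Tprev^2))
      }
      conv <- sqrt(sum((tnew - ta)^2)) < tol
      ta <- tnew
      if (conv) break
    }
    Tmat[, a] <- ta
    P[, a] <- pa
    X <- X - tcrossprod(ta, pa)                 # deflation
  }
  list(T = Tmat, P = P)
}

set.seed(1)
X <- matrix(rnorm(6 * 4, mean = 10), nrow = 6)
res <- nipals_cgs(X, ncomp = 3)
round(crossprod(res$P), 10)   # loadings: close to the identity matrix
round(crossprod(res$T), 10)   # scores: off-diagonal terms close to zero
```

On a small, well-conditioned matrix like this one, the loss of orthogonality without the CGS step (gs = FALSE) is usually negligible; the correction matters mostly for large or highly collinear matrices.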
pca_svd(X, ncomp, weights = NULL)

pca_eigen(X, ncomp, weights = NULL)

pca_eigenk(X, ncomp, weights = NULL)

pca_nipals(X, ncomp, weights = NULL,
    gs = TRUE,
    tol = .Machine$double.eps^0.5, maxit = 200)

pca_nipalsna(X, ncomp,
    gs = TRUE,
    tol = .Machine$double.eps^0.5, maxit = 200)
X |
A matrix or dataframe ( |
ncomp |
The maximal number of PCA scores (= components) to be calculated. |
weights |
A vector of length |
Specific arguments for the NIPALS
gs |
For |
tol |
Tolerance for testing convergence of the NIPALS iterations for each principal component. |
maxit |
Maximum number of NIPALS iterations for each principal component. |
A list of outputs, such as:
T |
The score matrix ( |
P |
The loadings matrix ( |
R |
The projection matrix (= |
sv |
The singular values (vector of length |
eig |
The eigenvalues ( |
xmeans |
The centering vector of |
niter |
Numbers of iterations of the NIPALS. |
conv |
Logical indicating if the NIPALS converged before reaching the maximal number of iterations. |
Andrecut, M., 2009. Parallel GPU Implementation of Iterative PCA Algorithms. Journal of Computational Biology 16, 1593-1599. https://doi.org/10.1089/cmb.2008.0221
Gabriel, R. K., 2002. Le biplot - Outil d'exploration de données multidimensionnelles. Journal de la Société Française de la Statistique, 143, 5-55.
Lingen, F.J., 2000. Efficient Gram-Schmidt orthonormalisation on parallel computers. Communications in Numerical Methods in Engineering 16, 57-66. https://doi.org/10.1002/(SICI)1099-0887(200001)16:1<57::AID-CNM320>3.0.CO;2-I
Tenenhaus, M., 1998. La régression PLS: théorie et pratique. Editions Technip, Paris, France.
Wright, K., 2018. Package nipals: Principal Components Analysis using NIPALS with Gram-Schmidt Orthogonalization. https://cran.r-project.org/
Wu, W., Massart, D.L., de Jong, S., 1997. The kernel PCA algorithms for wide data. Part I: Theory and algorithms. Chemometrics and Intelligent Laboratory Systems 36, 165-172. https://doi.org/10.1016/S0169-7439(97)00010-5
n <- 6
p <- 4
set.seed(1)
X <- matrix(rnorm(n * p, mean = 10), nrow = n)
set.seed(NULL)
## Positions of the values replaced by NA in zX
s <- c(3, 4, 7, 10, 11, 15, 21:24)
zX <- replace(X, s, NA)
X
zX

pca_svd(X, ncomp = 3)
pca_eigen(X, ncomp = 3)
pca_nipals(X, ncomp = 3)
pca_nipalsna(X, ncomp = 3)