# robspca: Robust Sparse Principal Component Analysis. In erichson/spca: Sparse Principal Component Analysis (SPCA)

## Description

Implementation of robust SPCA, using variable projection as an optimization strategy.

## Usage

```r
robspca(X, k = NULL, alpha = 1e-04, beta = 1e-04, gamma = 100,
        center = TRUE, scale = FALSE, max_iter = 1000, tol = 1e-05,
        verbose = TRUE)
```

## Arguments

* `X` : array_like; a real (n, p) input matrix (or data frame) to be decomposed.
* `k` : integer; specifies the target rank, i.e., the number of components to be computed.
* `alpha` : float; sparsity controlling parameter. Higher values lead to sparser components.
* `beta` : float; amount of ridge shrinkage to apply in order to improve conditioning.
* `gamma` : float; sparsity controlling parameter for the error matrix S. Smaller values lead to a larger amount of noise removal.
* `center` : bool; logical value which indicates whether the variables should be shifted to be zero centered (TRUE by default).
* `scale` : bool; logical value which indicates whether the variables should be scaled to have unit variance (FALSE by default).
* `max_iter` : integer; maximum number of iterations to perform before exiting.
* `tol` : float; stopping tolerance for the convergence criterion.
* `verbose` : bool; logical value which indicates whether progress is printed.

## Details

Sparse principal component analysis (SPCA) is a modern variant of PCA. Specifically, SPCA attempts to find sparse weight vectors (loadings), i.e., weight vectors with only a few 'active' (nonzero) values. This approach improves the interpretability of the model, because the principal components are formed as linear combinations of only a few of the original variables. Further, SPCA helps avoid overfitting in high-dimensional settings where the number of variables p is greater than the number of observations n.

Such a parsimonious model is obtained by introducing prior information like sparsity promoting regularizers. More concretely, given an (n, p) data matrix X, robust SPCA attempts to minimize the following objective function:

f(A, B) = \frac{1}{2} \| X - X B A^\top - S \|^2_F + \psi(B) + \gamma \|S\|_1

where B is the sparse weight matrix (loadings) and A is an orthonormal matrix. ψ denotes a sparsity inducing regularizer such as the LASSO (l1 norm) or the elastic net (a combination of the l1 and l2 norm). The matrix S captures grossly corrupted outliers in the data.
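For intuition, the objective above can be evaluated directly in base R for candidate A, B, and S. This is a hedged sketch, not the package's internal code: the regularizer ψ(B) is written out here as an elastic net, α‖B‖₁ + (β/2)‖B‖²_F, which is an assumption matching the `alpha` and `beta` arguments.

```r
# Sketch: evaluate the robust SPCA objective for given A, B, S.
# Assumption: psi(B) = alpha * ||B||_1 + (beta / 2) * ||B||_F^2 (elastic net),
# mirroring the alpha/beta arguments of robspca(). Illustrative only.
robspca_objective <- function(X, A, B, S,
                              alpha = 1e-4, beta = 1e-4, gamma = 100) {
  R <- X - X %*% B %*% t(A) - S   # residual after low-rank part and outliers
  0.5 * sum(R^2) +                # (1/2) * ||R||_F^2
    alpha * sum(abs(B)) +         # l1 (sparsity) penalty on the loadings
    0.5 * beta * sum(B^2) +       # ridge penalty for conditioning
    gamma * sum(abs(S))           # l1 penalty on the outlier matrix S
}

# Tiny rank-1 example
set.seed(1)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)
B <- matrix(c(1, 0, 0, 0), nrow = 4, ncol = 1)  # sparse loading vector
A <- matrix(c(1, 0, 0, 0), nrow = 4, ncol = 1)  # orthonormal (unit column)
S <- matrix(0, nrow = 5, ncol = 4)              # no outliers flagged
robspca_objective(X, A, B, S)
```

With S fixed at zero the gamma term vanishes, so only the reconstruction error and the penalties on B contribute.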

The principal components Z are formed as

Z = X B

and the data can be approximately rotated back as

Xtilde = Z t(A)
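As a minimal base-R sketch of this algebra, the scores and back-rotation can be reproduced with ordinary SVD loadings standing in for the sparse B that robspca returns; this is illustrative algebra under that assumption, not the package's method.

```r
# Sketch: Z = X B (scores) and Xtilde = Z t(A) (back-rotation).
# B and A are taken from a plain SVD of X as stand-ins for the sparse
# loadings and orthonormal matrix that robspca() would return.
set.seed(42)
n <- 50; p <- 6; k <- 2
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)

sv <- svd(X, nu = k, nv = k)
B <- sv$v                 # (p, k) loadings (dense here; sparse in robspca)
A <- sv$v                 # orthonormal rotation; equals B for plain PCA

Z <- X %*% B              # (n, k) principal component scores
Xtilde <- Z %*% t(A)      # (n, p) rank-k approximation of X

dim(Z)       # n x k
dim(Xtilde)  # n x p
```

The rank-k approximation Xtilde never has a larger Frobenius norm of residual than X itself, since Z t(A) is the best rank-k fit under these SVD loadings.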

The print and summary methods can be used to present the results in a nice format.

## Value

robspca returns a list containing the following components:

* `loadings` : array_like; sparse loadings (weight) matrix; (p, k) dimensional array.
* `transform` : array_like; the approximated inverse transform; (p, k) dimensional array.
* `scores` : array_like; the principal component scores; (n, k) dimensional array.
* `sparse` : array_like; sparse matrix capturing outliers in the data; (n, p) dimensional array.
* `eigenvalues` : array_like; the approximated eigenvalues; (k) dimensional array.
* `center`, `scale` : array_like; the centering and scaling used.

## Author(s)

N. Benjamin Erichson, Peng Zheng, and Sasha Aravkin

## References

* [1] N. B. Erichson, P. Zheng, K. Manohar, S. Brunton, J. N. Kutz, A. Y. Aravkin. "Sparse Principal Component Analysis via Variable Projection." Submitted to IEEE Journal of Selected Topics on Signal Processing (2018). (Available at arXiv: https://arxiv.org/abs/1804.00341.)

## See Also

rspca, spca
## Examples

```r
# Create artificial data
m <- 10000
V1 <- rnorm(m, 0, 290)
V2 <- rnorm(m, 0, 300)
V3 <- -0.1*V1 + 0.1*V2 + rnorm(m, 0, 100)

X <- cbind(V1, V1, V1, V1, V2, V2, V2, V2, V3, V3)
X <- X + matrix(rnorm(length(X), 0, 1), ncol = ncol(X), nrow = nrow(X))

# Compute robust SPCA
out <- robspca(X, k = 3, alpha = 1e-3, beta = 1e-5, gamma = 5,
               center = TRUE, scale = FALSE, verbose = 0)

print(out)
summary(out)
```