Description Usage Arguments Details Value Note Author(s) References See Also Examples
Randomized accelerated implementation of SPCA, using variable projection as an optimization strategy.
1 2 3 |
X |
array_like; |
k |
integer; |
alpha |
float; |
beta |
float; |
center |
bool; |
scale |
bool; |
max_iter |
integer; |
tol |
float; |
o |
integer; |
q |
integer; |
verbose |
bool; |
Sparse principal component analysis is a modern variant of PCA. Specifically, SPCA attempts to find sparse weight vectors (loadings), i.e., a weight vector with only a few 'active' (nonzero) values. This approach leads to an improved interpretability of the model, because the principal components are formed as a linear combination of only a few of the original variables. Further, SPCA avoids overfitting in a high-dimensional data setting where the number of variables p is greater than the number of observations n.
Such a parsimonious model is obtained by introducing prior information like sparsity promoting regularizers. More concreatly, given an (n,p) data matrix X, SPCA attemps to minimize the following objective function:
f(A,B) = \frac{1}{2} \| X - X B A^\top \|^2_F + ψ(B)
where B is the sparse weight (loadings) matrix and A is an orthonormal matrix. ψ denotes a sparsity inducing regularizer such as the LASSO (l1 norm) or the elastic net (a combination of the l1 and l2 norm). The principal components Z are formed as
Z = X B
and the data can be approximately rotated back as
Xtilde = Z t(A)
The print and summary method can be used to present the results in a nice format.
spca
returns a list containing the following three components:
loadings |
array_like; |
transform |
array_like; |
scores |
array_like; |
eigenvalues |
array_like; |
center, scale |
array_like; |
This implementation uses randomized methods for linear algebra to speedup the computations. o is an oversampling parameter to improve the approximation. A value of at least 10 is recommended, and o=20 is set by default.
The parameter q specifies the number of power (subspace) iterations to reduce the approximation error. The power scheme is recommended, if the singular values decay slowly. In practice, 2 or 3 iterations achieve good results, however, computing power iterations increases the computational costs. The power scheme is set to q=2 by default.
If k > (min(n,p)/4), a the deterministic spca
algorithm might be faster.
N. Benjamin Erichson, Peng Zheng, and Sasha Aravkin
[1] N. B. Erichson, P. Zheng, K. Manohar, S. Brunton, J. N. Kutz, A. Y. Aravkin. "Sparse Principal Component Analysis via Variable Projection." Submitted to IEEE Journal of Selected Topics on Signal Processing (2018). (available at 'arXiv https://arxiv.org/abs/1804.00341).
[1] N. B. Erichson, S. Voronin, S. Brunton, J. N. Kutz. "Randomized matrix decompositions using R." Submitted to Journal of Statistical Software (2016). (available at 'arXiv http://arxiv.org/abs/1608.02148).
1 2 3 4 5 6 7 8 9 10 11 12 13 | # Create artifical data
m <- 10000
V1 <- rnorm(m, 0, 290)
V2 <- rnorm(m, 0, 300)
V3 <- -0.1*V1 + 0.1*V2 + rnorm(m,0,100)
X <- cbind(V1,V1,V1,V1, V2,V2,V2,V2, V3,V3)
X <- X + matrix(rnorm(length(X),0,1), ncol = ncol(X), nrow = nrow(X))
# Compute SPCA
out <- rspca(X, k=3, alpha=1e-3, beta=1e-3, center = TRUE, scale = FALSE, verbose=0)
print(out)
summary(out)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.