sgpca: Sparse Generalized Principal Component Analysis
In sGPCA: Sparse Generalized Principal Component Analysis


View source: R/sgpca.R

Computes the rank K sparse, sparse non-negative, two-way sparse, and two-way sparse non-negative GPCA solutions.

sgpca(X, Q, R, K = 1, lamu = 0, lamvs = 0, posu = FALSE, posv = FALSE, threshold = 1e-07, maxit = 1000, full.path = FALSE)

`X`	The `n x p` data matrix. `X` must be of class `matrix` with all numeric values.
`Q`	The row generalizing operator, an `n x n` matrix. `Q` can be of class `matrix` or class `dgCMatrix`, must be positive semi-definite, and must have operator norm one.
`R`	The column generalizing operator, a `p x p` matrix. `R` can be of class `matrix` or class `dgCMatrix`, must be positive semi-definite, and must have operator norm one.
`K`	The number of GPCA components to compute. The default value is one.
`lamu`	The regularization parameter that determines the sparsity level for the row factor, `U`. The default value is 0. If the data is oriented with rows as samples, non-zero `lamu` corresponds to two-way sparse methods.
`lamvs`	A scalar or vector of regularization parameters that determine the sparsity level for the column factor, `V`. The default is 0, with non-zero values corresponding to sparse or two-way sparse methods. If `lamvs` is a vector, then the BIC method is used to select the optimal sparsity level. Alternatively, if `full.path` is specified, then the solution at each value of `lamvs` is returned.
`posu`	Flag indicating whether the row factor, `U`, should be constrained to be non-negative. The default value is FALSE.
`posv`	Flag indicating whether the column factor, `V`, should be constrained to be non-negative. The default value is FALSE.
`threshold`	Sets the threshold for convergence. The default value is `1e-07`.
`maxit`	Sets the maximum number of iterations. The default value is `1000`.
`full.path`	Flag indicating whether the entire solution path, or the solution at each value of `lamvs`, should be returned. The default value is FALSE.
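The requirements on `Q` and `R` (positive semi-definite, operator norm one) can be checked and enforced before calling `sgpca`. The following is a minimal base-R sketch; `normalize_operator` is a hypothetical helper written for illustration and is not part of the sGPCA package.

```r
# Hypothetical helper (not part of sGPCA): rescale a symmetric positive
# semi-definite matrix so that its largest eigenvalue (operator norm) is one.
normalize_operator <- function(M, tol = 1e-8) {
  stopifnot(is.matrix(M), is.numeric(M), nrow(M) == ncol(M), isSymmetric(M))
  ev <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
  if (max(ev) <= 0) stop("matrix has no positive eigenvalue")
  if (min(ev) < -tol * max(ev)) stop("matrix is not positive semi-definite")
  M / max(ev)
}

# Example: an exponential-decay kernel on 5 equally spaced time points
tt <- 1:5
K <- exp(-abs(outer(tt, tt, "-")) / 3)
Q <- normalize_operator(K)
max(eigen(Q, symmetric = TRUE, only.values = TRUE)$values)  # largest eigenvalue of Q
```

Dividing by the largest eigenvalue mirrors the scaling used for `Q` and `R` in the Examples section of this page.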

The sgpca function has the flexibility to fit combinations of sparsity and/or non-negativity for both the row and column generalized PCs. Regularization is used to encourage sparsity in the GPCA factors by placing an L1 penalty on the GPC loadings, V, and/or the sample GPCs, U. Non-negativity constraints on V and/or U yield sparse non-negative and two-way non-negative GPCA. Generalizing operators as described for gpca can be used with this function and have the same properties.
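For orientation, the rank-one problem described above can be written roughly as follows. This is a sketch based on the formulation in the cited references, not a statement of the function's exact internals: the symbols \lambda_u and \lambda_v are assumed to correspond to `lamu` and `lamvs`, and the exact scaling of the penalties inside `sgpca` is not confirmed by this page.

```latex
\max_{u,\,v}\; u^{T} Q X R v \;-\; \lambda_{v} \lVert v \rVert_{1} \;-\; \lambda_{u} \lVert u \rVert_{1}
\quad \text{subject to} \quad u^{T} Q u \le 1, \quad v^{T} R v \le 1
```

Under the same assumptions, setting `posu = TRUE` or `posv = TRUE` would add the constraint u \ge 0 or v \ge 0, respectively.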

When lamvs=0, lamu=0, posu=FALSE, and posv=FALSE, the GPCA solution also given by gpca is returned. The magnitude of the regularization parameters, lamvs and lamu, determines the level of sparsity of the factors U and V, with higher regularization parameter values yielding sparser factors. If more than one regularization value lamvs is given, then sgpca finds the optimal regularization parameter lamvs by minimizing the BIC derived from the generalized least squares update for each factor.

If full.path = TRUE, then the full path of solutions (U, D, and V) is returned for each value of lamvs given. This option is best used with 50 or 100 values of lamvs so that the regularization paths are approximated well. Numerically, the path begins with the GPCA solution, lamvs=0, and uses warm starts at each step as lamvs increases.

Proximal gradient descent is used to compute each rank-one solution. Multiple components are calculated in a greedy manner via deflation. Each rank-one solution is solved by iteratively fitting generalized least squares problems with penalties or non-negativity constraints. These regression problems are solved by the Iterative Soft-Thresholding Algorithm (ISTA) or projected gradient descent.
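The ISTA and projected-gradient steps mentioned above can be illustrated with a small base-R sketch for a single penalized generalized least squares problem, minimize 0.5 * (y - v)' R (y - v) + lam * sum(|v|). The names `soft_threshold` and `ista_gls` are hypothetical and written for illustration only; the solver inside `sgpca` may differ in its step size, stopping rule, and scaling.

```r
# Soft-thresholding operator: shrinks each entry of z toward zero by t.
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Illustrative ISTA (hypothetical name, not the sGPCA implementation) for
#   minimize 0.5 * t(y - v) %*% R %*% (y - v) + lam * sum(abs(v)),
# optionally with the constraint v >= 0.
ista_gls <- function(y, R, lam, nonneg = FALSE, threshold = 1e-7, maxit = 1000) {
  # Lipschitz constant of the gradient: the largest eigenvalue of R
  L <- max(eigen(R, symmetric = TRUE, only.values = TRUE)$values)
  v <- rep(0, length(y))
  for (it in seq_len(maxit)) {
    grad <- as.vector(R %*% (v - y))               # gradient of the smooth part
    v_new <- soft_threshold(v - grad / L, lam / L) # proximal (shrinkage) step
    if (nonneg) v_new <- pmax(v_new, 0)            # projection onto v >= 0
    converged <- sum((v_new - v)^2) < threshold
    v <- v_new
    if (converged) break
  }
  v
}

y <- c(3, -0.5, 1.2, -2)
ista_gls(y, diag(4), lam = 1)                 # identity operator: plain soft-thresholding of y
ista_gls(y, diag(4), lam = 1, nonneg = TRUE)  # negative entries are clipped to zero
```

When `R` has operator norm one, as this page requires, the step size 1/L in the sketch is simply one.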

`U`	The left sparse GPCA factors, an `n x K` matrix. If `full.path` is specified with `r` values of `lamvs`, then `U` is an `n x K x r` array.
`V`	The right sparse GPCA factors, a `p x K` matrix. If `full.path` is specified with `r` values of `lamvs`, then `V` is a `p x K x r` array.
`D`	A vector of the K sparse GPCA values. If `full.path` is specified with `r` values of `lamvs`, then `D` is a `K x r` matrix.
`cumulative.prop.var`	The cumulative proportion of variance explained by the components.
`bics`	The BIC values computed for each value of `lamvs` and each of the `K` components.
`optlams`	Optimal regularization parameter as chosen by the BIC method for each of the `K` components.

Frederick Campbell

Genevera I. Allen, Logan Grosenick, and Jonathan Taylor, "A generalized least squares matrix decomposition", arXiv:1102.3074, 2011.

Genevera I. Allen and Mirjana Maletic-Savatic, "Sparse Non-negative Generalized PCA with Applications to Metabolomics", Bioinformatics, 27:21, 3029-3035, 2011.

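library(sGPCA) #attaches Matrix and fields, which provide ozone2 and Exp.cov
data(ozone2)
#Keep only the stations (columns) with no missing observations
ind = which(apply(is.na(ozone2$y),2,sum)==0)
X = ozone2$y[,ind]
n = nrow(X)
p = ncol(X)
#Generalizing Operators - Spatio-Temporal Smoothers
#Each operator is divided by its largest eigenvalue so that it has operator norm one
R = Exp.cov(ozone2$lon.lat[ind,],theta=5)
er = eigen(R,only.values=TRUE)
R = R/max(er$values)
Q = Exp.cov(c(1:n),c(1:n),theta=3)
eq = eigen(Q,only.values=TRUE)
Q = Q/max(eq$values)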

#Sparse GPCA
fit = sgpca(X,Q,R,K=1,lamu=0,lamvs=c(.5,1))
fit$cumulative.prop.var #cumulative proportion of variance explained
fit$optlams #optimal regularization param chosen by BIC
fit$bics #BIC values for each lambda

#Sparse Non-negative GPCA
fit = sgpca(X,Q,R,K=1,lamu=0,lamvs=1,posv=TRUE)

#Two-way Sparse GPCA
fit = sgpca(X,Q,R,K=1,lamu=1,lamvs=1)

#Two-way Sparse Non-negative GPCA
fit = sgpca(X,Q,R,K=1,lamu=1,lamvs=1,posu=TRUE,posv=TRUE)

#Return full regularization paths for inputted lambda values
fit = sgpca(X,Q,R,K=1,lamu=0,lamvs=c(.1,.5,1),full.path=TRUE)

Printed output of `fit$optlams` and `fit$bics` from the Sparse GPCA example, in that order:
[1] 0.5
[1] 1.727363