# gpca: Generalized Principal Component Analysis

In package sGPCA: Sparse Generalized Principal Component Analysis

## Description

Computes the rank `K` Generalized PCA (GPCA) solution.

## Usage

```r
gpca(X, Q, R, K, deflation=FALSE)
```

## Arguments

| Argument | Description |
|---|---|
| `X` | The `n x p` data matrix. `X` must be of class `matrix` with all numeric values. |
| `Q` | The row generalizing operator, an `n x n` matrix. `Q` can be of class `matrix` or class `dgCMatrix`, but the function is optimized for sparse matrices of class `dgCMatrix`. `Q` must also be positive semi-definite and be scaled to have operator norm one. |
| `R` | The column generalizing operator, a `p x p` matrix. `R` can be of class `matrix` or class `dgCMatrix`, but the function is optimized for sparse matrices of class `dgCMatrix`. `R` must also be positive semi-definite and be scaled to have operator norm one. |
| `K` | The number of GPCA components to compute. The default value is one. |
| `deflation` | Algorithm used to calculate the solution. Default is `deflation=FALSE` and most users should not deviate from this option. See details. |

## Details

The Generalized PCA solution maximizes the sample variance of the data in an inner-product space induced by the row and column generalizing operators, `Q` and `R`, and also finds the best low-rank approximation to the data as measured by a generalization of the Frobenius norm. Note that the resulting GPCA factors `U` and `V` are orthogonal with respect to the row and column generalizing operators: `U^T Q U = I` and `V^T R V = I`. Generalized PCA can be interpreted as finding major modes of variation that are independent of the generalizing operators. Thus, if `Q` and `R` encode noise structures (see `laplacian`) or noise covariances (see `Exp.cov`), then GPCA finds patterns separate from the structure of the noise.
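The orthogonality constraints above can be checked numerically. The following is a minimal base-R sketch, not the package's implementation: `gpca_sketch` is a hypothetical helper that assumes dense, strictly positive definite `Q` and `R`, and the exponential-decay matrices are stand-ins for the smoothers mentioned in this section.

```r
# Hypothetical helper, NOT the sGPCA implementation:
# dense GPCA for strictly positive definite Q and R.
gpca_sketch <- function(X, Q, R, K) {
  # Symmetric square root of R and its inverse, from the eigen-decomposition of R
  eR <- eigen(R, symmetric = TRUE)
  R_half     <- eR$vectors %*% diag(sqrt(eR$values)) %*% t(eR$vectors)
  R_half_inv <- eR$vectors %*% diag(1 / sqrt(eR$values)) %*% t(eR$vectors)

  # p x p quadratic form whose eigenvectors give the transformed right factors
  M  <- R_half %*% t(X) %*% Q %*% X %*% R_half
  eM <- eigen((M + t(M)) / 2, symmetric = TRUE)

  idx <- seq_len(K)
  d <- sqrt(eM$values[idx])
  V <- R_half_inv %*% eM$vectors[, idx, drop = FALSE]  # satisfies V^T R V = I
  U <- X %*% R %*% V %*% diag(1 / d, nrow = K)         # satisfies U^T Q U = I
  list(U = U, V = V, D = d)
}

set.seed(1)
n <- 6; p <- 4
X <- matrix(rnorm(n * p), n, p)

# Exponential-decay smoothers on 1-D grids, scaled to operator norm one
Q <- exp(-abs(outer(1:n, 1:n, "-")) / 3)
Q <- Q / max(eigen(Q, symmetric = TRUE, only.values = TRUE)$values)
R <- exp(-abs(outer(1:p, 1:p, "-")) / 2)
R <- R / max(eigen(R, symmetric = TRUE, only.values = TRUE)$values)

fit <- gpca_sketch(X, Q, R, K = 2)

# Both products should be numerically close to the 2 x 2 identity
round(t(fit$U) %*% Q %*% fit$U, 8)
round(t(fit$V) %*% R %*% fit$V, 8)
```

Note that the ordinary cross-products `t(fit$U) %*% fit$U` and `t(fit$V) %*% fit$V` are generally not the identity; the factors are orthonormal only in the `Q` and `R` inner products.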

The generalizing operators, `Q` and `R`, must be positive semi-definite and have operator norm one. Note that if these are the identity matrix, then GPCA is equivalent to PCA and `gpca` returns the SVD of `X`. Smoothers, such as covariances (see `Exp.cov`, `Exp.simple.cov`, `Rad.cov`, `stationary.cov`, `cubic.cov`, `stationary.taper.cov`, `wendland.cov`), and inverse smoothers (see `laplacian`) can be used as generalizing operators for data in which variables are associated with a specific location (e.g. image data and spatio-temporal data).
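As a small illustration of the two requirements above, the base-R sketch below scales a smoother to operator norm one and checks the identity-operator case against `svd`. The exponential-decay matrix `S` is a hand-built stand-in (an assumption, not the output of `Exp.cov`), and the eigen-decomposition of `crossprod(X)` is used here only to mimic the quadratic form that arises when both operators are the identity.

```r
set.seed(42)
n <- 8

# Hand-built exponential covariance on a 1-D grid (stand-in for a smoother)
S <- exp(-abs(outer(1:n, 1:n, "-")) / 3)

# For a symmetric positive semi-definite matrix, the operator (spectral) norm
# equals the largest eigenvalue, so dividing by it gives operator norm one.
lambda_max <- max(eigen(S, symmetric = TRUE, only.values = TRUE)$values)
Q <- S / lambda_max
norm(Q, type = "2")   # spectral norm, numerically close to 1

# Identity operators: the quadratic form reduces to X^T X, so the
# GPCA values coincide with the singular values of X.
X <- matrix(rnorm(n * 3), n, 3)
d_eig <- sqrt(eigen(crossprod(X), symmetric = TRUE)$values)
d_svd <- svd(X)$d
all.equal(d_eig, d_svd)
```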

This function offers two algorithms for computing the solution. The `deflation = FALSE` option computes the eigen-decomposition of a quadratic form of dimension `min(n,p)` to find `U` or `V` and finds the other factor by regression. The `deflation = TRUE` option finds each factor using the generalized power algorithm and performs deflation to compute multiple factors. The `deflation = FALSE` option is generally faster, especially when one dimension is much smaller than the other. The `deflation = TRUE` option is faster only if both dimensions are large (`n, p > 5,000`) and `K` is small.
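To make the deflation route concrete, here is a rough base-R sketch of a generalized power iteration followed by deflation, in the spirit of the algorithm described in the reference cited on this page. It is an illustration under assumptions, not the package's code: `gen_norm`, `gpca_power_sketch`, and `sym_sqrt` are hypothetical helpers, the starting vector and stopping rule are arbitrary choices, and `Q` and `R` are assumed dense and positive definite. The result is compared with the singular values of `Q^{1/2} X R^{1/2}`, which should match the leading GPCA values.

```r
# Generalized norm sqrt(x' A x); helper name is hypothetical
gen_norm <- function(x, A) sqrt(drop(crossprod(x, A %*% x)))

# Hypothetical sketch of the generalized power method with deflation
gpca_power_sketch <- function(X, Q, R, K, iters = 2000, tol = 1e-12) {
  U <- matrix(0, nrow(X), K); V <- matrix(0, ncol(X), K); d <- numeric(K)
  Xhat <- X
  for (k in seq_len(K)) {
    v <- matrix(1, ncol(X), 1)          # arbitrary starting vector
    v <- v / gen_norm(v, R)
    for (it in seq_len(iters)) {
      u <- Xhat %*% R %*% v             # update left factor, normalize in Q-norm
      u <- u / gen_norm(u, Q)
      v_new <- crossprod(Xhat, Q %*% u) # update right factor, normalize in R-norm
      v_new <- v_new / gen_norm(v_new, R)
      converged <- max(abs(v_new - v)) < tol
      v <- v_new
      if (converged) break
    }
    d[k] <- drop(t(u) %*% Q %*% Xhat %*% R %*% v)
    U[, k] <- u; V[, k] <- v
    Xhat <- Xhat - d[k] * (u %*% t(v))  # deflation: remove the fitted component
  }
  list(U = U, V = V, D = d)
}

# Symmetric square root of a positive semi-definite matrix (hypothetical helper)
sym_sqrt <- function(A) {
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(sqrt(pmax(e$values, 0))) %*% t(e$vectors)
}

set.seed(7)
n <- 6; p <- 4
X <- matrix(rnorm(n * p), n, p)
Q <- exp(-abs(outer(1:n, 1:n, "-")) / 3)
Q <- Q / max(eigen(Q, symmetric = TRUE, only.values = TRUE)$values)
R <- exp(-abs(outer(1:p, 1:p, "-")) / 2)
R <- R / max(eigen(R, symmetric = TRUE, only.values = TRUE)$values)

fit_pow <- gpca_power_sketch(X, Q, R, K = 2)
d_ref   <- svd(sym_sqrt(Q) %*% X %*% sym_sqrt(R))$d[1:2]
cbind(power = fit_pow$D, reference = d_ref)
```

The power route touches `X` only through matrix-vector products, which is consistent with the remark above that it pays off only when both dimensions are large and `K` is small.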

## Value

| Value | Description |
|---|---|
| `U` | The left GPCA factors, an `n x K` matrix. |
| `V` | The right GPCA factors, a `p x K` matrix. |
| `D` | A vector of the `K` GPCA values. |
| `cumm.prop.var` | Cumulative proportion of variance explained by the first `K` components. |
| `prop.var` | Proportion of variance explained by each component. |
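The sketch below shows one way the variance proportions could be computed by hand. It assumes the definition in which the variance explained by component `k` is `d_k^2 / tr(Q X R X^T)`; whether `gpca` computes `prop.var` in exactly this way is an assumption, and `sym_sqrt` is a hypothetical helper rather than a package function.

```r
# Symmetric square root of a positive semi-definite matrix (hypothetical helper)
sym_sqrt <- function(A) {
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(sqrt(pmax(e$values, 0))) %*% t(e$vectors)
}

set.seed(3)
n <- 6; p <- 4
X <- matrix(rnorm(n * p), n, p)
Q <- exp(-abs(outer(1:n, 1:n, "-")) / 3)
Q <- Q / max(eigen(Q, symmetric = TRUE, only.values = TRUE)$values)
R <- exp(-abs(outer(1:p, 1:p, "-")) / 2)
R <- R / max(eigen(R, symmetric = TRUE, only.values = TRUE)$values)

# GPCA values as singular values of the transformed data Q^{1/2} X R^{1/2}
d <- svd(sym_sqrt(Q) %*% X %*% sym_sqrt(R))$d

# Total variance in the Q,R-norm: tr(Q X R X^T)
total_var <- sum(diag(Q %*% X %*% R %*% t(X)))

prop_var      <- d^2 / total_var   # assumed analogue of prop.var
cumm_prop_var <- cumsum(prop_var)  # assumed analogue of cumm.prop.var
cbind(prop_var, cumm_prop_var)
```

With all `min(n, p)` components included, the proportions sum to one, since the squared GPCA values add up to the total variance in the `Q,R`-norm.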

## Author(s)

Frederick Campbell

## References

Genevera I. Allen, Logan Grosenick, and Jonathan Taylor, "A generalized least squares matrix decomposition", arXiv:1102.3074, 2011.

## See Also

`laplacian`, `Exp.cov`, `sgpca`

## Examples

```r
library(sGPCA)
library(fields)  # provides the ozone2 data set and Exp.cov

data(ozone2)
# Keep only stations (columns) with no missing observations
ind = which(apply(is.na(ozone2$y),2,sum)==0)
X = ozone2$y[,ind]
n = nrow(X)
p = ncol(X)

#Generalizing Operators - Spatio-Temporal Smoothers
# Spatial smoother over station locations, scaled to operator norm one
R = Exp.cov(ozone2$lon.lat[ind,],theta=5)
er = eigen(R,only.values=TRUE)
R = R/max(er$values)
# Temporal smoother over the time index, scaled to operator norm one
Q = Exp.cov(c(1:n),c(1:n),theta=3)
eq = eigen(Q,only.values=TRUE)
Q = Q/max(eq$values)

#SVD
fitsvd = gpca(X,diag(n),diag(p),1)

#GPCA
fitgpca = gpca(X,Q,R,1)
fitgpca$prop.var #proportion of variance explained
```