Generalized Principal Component Analysis

Description

Computes the rank K Generalized PCA (GPCA) solution.

Usage

gpca(X, Q, R, K, deflation=FALSE)

Arguments

X

The n x p data matrix. X must be of class matrix with all numeric values.

Q

The row generalizing operator, an n x n matrix. Q can be of class matrix or class dgCMatrix, but the function is optimized for sparse matrices of class dgCMatrix. Q must also be positive semi-definite and scaled to have operator norm one.

R

The column generalizing operator, a p x p matrix. R can be of class matrix or class dgCMatrix, but the function is optimized for sparse matrices of class dgCMatrix. R must also be positive semi-definite and scaled to have operator norm one.

K

The number of GPCA components to compute. The default value is one.

deflation

Algorithm used to calculate the solution. The default is deflation=FALSE, and most users should not deviate from this option. See Details.

Details

The Generalized PCA solution maximizes the sample variance of the data in an inner-product space induced by the row and column generalizing operators, Q and R, and also finds the best low-rank approximation to the data as measured by a generalization of the Frobenius norm. Note that the resulting GPCA factors U and V are orthogonal with respect to the row and column generalizing operators: U^T Q U = I and V^T R V = I. Generalized PCA can be interpreted as finding major modes of variation that are independent from the generalizing operators. Thus, if Q and R encode noise structures (see laplacian) or noise covariances (see Exp.cov), then GPCA finds patterns separate from the structure of the noise.
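The factorization can be sketched directly in base R. Following the construction in the reference, the GPCA factors come from the SVD of Q^(1/2) X R^(1/2). The snippet below is an illustrative sketch, not the package implementation, and assumes Q and R are positive definite so their symmetric square roots are invertible; the msqrt helper is hypothetical.

# Sketch: rank-K GPCA via the SVD of Q^(1/2) X R^(1/2)
set.seed(1)
n <- 20; p <- 10; K <- 2
X <- matrix(rnorm(n * p), n, p)

# Toy positive-definite generalizing operators, scaled to operator norm one
Q <- crossprod(matrix(rnorm(n * n), n, n)); Q <- Q / max(eigen(Q)$values)
R <- crossprod(matrix(rnorm(p * p), p, p)); R <- R / max(eigen(R)$values)

# Symmetric square root (or inverse square root) via eigen-decomposition
msqrt <- function(M, inv = FALSE) {
  e <- eigen(M, symmetric = TRUE)
  d <- if (inv) 1 / sqrt(e$values) else sqrt(e$values)
  e$vectors %*% (d * t(e$vectors))
}

s <- svd(msqrt(Q) %*% X %*% msqrt(R), nu = K, nv = K)
U <- msqrt(Q, inv = TRUE) %*% s$u   # left GPCA factors
V <- msqrt(R, inv = TRUE) %*% s$v   # right GPCA factors

# Check orthogonality in the generalized inner products:
# U^T Q U and V^T R V should both be the K x K identity (up to numerical error)
max(abs(crossprod(U, Q %*% U) - diag(K)))
max(abs(crossprod(V, R %*% V) - diag(K)))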

The generalizing operators, Q and R, must be positive semi-definite and have operator norm one. Note that if these are the identity matrix, then GPCA is equivalent to PCA and gpca returns the SVD of X. Smoothers, such as covariances (see Exp.cov,Exp.simple.cov,Rad.cov, stationary.cov,cubic.cov,stationary.taper.cov, wendland.cov), and inverse smoothers (see laplacian) can be used as generalizing operators for data in which variables are associated with a specific location (e.g. image data and spatio-temporal data).

This function has the option of using one of two algorithms to compute the solution. The deflation = FALSE option computes the eigen-decomposition of a quadratic form of dimension min(n,p) to find U or V and finds the other factor by regression. The deflation = TRUE option finds each factor using the generalized power algorithm and performs deflation to compute multiple factors. The deflation = FALSE option is generally faster, especially when one dimension is much smaller than the other. The deflation = TRUE option is faster only when both dimensions are large (n, p > 5,000) and K is small.
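The deflation = TRUE idea can be illustrated with a short sketch. One factor is found by alternating power updates, normalizing u in the Q inner product and v in the R inner product; subsequent factors are obtained by subtracting each rank-one piece from X. The gpower1 helper below is hypothetical and simplified (fixed iteration count, no convergence check), not the package's implementation.

# Sketch of the generalized power algorithm with deflation
gpower1 <- function(X, Q, R, niter = 100) {
  v <- rnorm(ncol(X))
  v <- v / sqrt(drop(crossprod(v, R %*% v)))
  for (i in seq_len(niter)) {
    u <- X %*% (R %*% v)
    u <- u / sqrt(drop(crossprod(u, Q %*% u)))  # normalize in the Q inner product
    v <- crossprod(X, Q %*% u)
    v <- v / sqrt(drop(crossprod(v, R %*% v)))  # normalize in the R inner product
  }
  d <- drop(crossprod(u, Q %*% X %*% (R %*% v)))
  list(u = u, v = v, d = d)
}

# Multiple factors by deflation: remove each rank-one piece before the next
# f1 <- gpower1(X, Q, R)
# X2 <- X - f1$d * tcrossprod(f1$u, f1$v)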

Value

U

The left GPCA factors, an n x K matrix.

V

The right GPCA factors, a p x K matrix.

D

A vector of the K GPCA values.

cumm.prop.var

Cumulative proportion of variance explained by the first K components.

prop.var

Proportion of variance explained by each component.

Author(s)

Frederick Campbell

References

Genevera I. Allen, Logan Grosenick, and Jonathan Taylor, "A generalized least squares matrix decomposition", arXiv:1102.3074, 2011.

See Also

laplacian, Exp.cov, sgpca

Examples

data(ozone2)
ind = which(apply(is.na(ozone2$y),2,sum)==0)
X = ozone2$y[,ind]
n = nrow(X)
p = ncol(X)

#Generalizing Operators - Spatio-Temporal Smoothers
R = Exp.cov(ozone2$lon.lat[ind,],theta=5)
er = eigen(R,only.values=TRUE)
R = R/max(er$values)
Q = Exp.cov(c(1:n),c(1:n),theta=3)
eq = eigen(Q,only.values=TRUE)
Q = Q/max(eq$values)

#SVD
fitsvd = gpca(X,diag(n),diag(p),1)

#GPCA
fitgpca = gpca(X,Q,R,1)
fitgpca$prop.var #proportion of variance explained