Generalized Principal Component Analysis
Computes the rank-K Generalized PCA (GPCA) solution.
The row generalizing operator, an
The column generalizing operator, an
The number of GPCA components to compute. The default value is one.
Algorithm used to calculate the solution. Default is
The Generalized PCA solution maximizes the sample variance of the data
in an inner-product space induced by the row and column generalizing
operators, Q and R, and also finds the best low-rank
approximation to the data as
measured by a generalization of the Frobenius norm. Note that the
resulting GPCA factors U and V are orthogonal with
respect to the row and column generalizing operators:
U^T Q U = I and V^T R V = I. Generalized PCA can be interpreted as finding
major modes of variation that are independent from the generalizing
operators. Thus, if Q and R encode noise structures (see
laplacian) or noise covariances,
then GPCA finds patterns separate from the structure of the noise.
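The orthogonality constraints can be checked numerically. The following is a minimal base-R sketch, not the package's implementation: `sym_power` and `gpca_sketch` are hypothetical helper names, the toy operators are hand-built exponential kernels, and the code assumes Q and R are symmetric positive definite so that their square roots can be inverted. For simplicity it always works with the p x p quadratic form.

```r
# Hypothetical helper: symmetric matrix power A^pow via an eigen-decomposition.
sym_power <- function(A, pow) {
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(e$values^pow, nrow = length(e$values)) %*% t(e$vectors)
}

# Hypothetical sketch of a rank-K GPCA (assumes Q and R are positive definite).
gpca_sketch <- function(X, Q, R, K = 1) {
  R_half <- sym_power(R, 0.5)
  # p x p quadratic form whose leading eigenvectors give the right factors
  M <- R_half %*% t(X) %*% Q %*% X %*% R_half
  eM <- eigen((M + t(M)) / 2, symmetric = TRUE)
  d <- sqrt(eM$values[seq_len(K)])
  V <- sym_power(R, -0.5) %*% eM$vectors[, seq_len(K), drop = FALSE]
  # Left factors by regressing the data on the right factors
  U <- X %*% R %*% V %*% diag(1 / d, nrow = K)
  list(u = U, v = V, d = d)
}

set.seed(1)
n <- 8; p <- 5
X <- matrix(rnorm(n * p), n, p)

# Toy operators: exponential kernels on a 1-D grid, scaled to operator norm one
Q <- exp(-abs(outer(1:n, 1:n, "-")) / 3)
Q <- Q / max(eigen(Q, symmetric = TRUE, only.values = TRUE)$values)
R <- exp(-abs(outer(1:p, 1:p, "-")) / 2)
R <- R / max(eigen(R, symmetric = TRUE, only.values = TRUE)$values)

fit <- gpca_sketch(X, Q, R, K = 2)
round(t(fit$u) %*% Q %*% fit$u, 8)  # approximately the 2 x 2 identity
round(t(fit$v) %*% R %*% fit$v, 8)  # approximately the 2 x 2 identity
```

Calling `gpca_sketch(X, diag(n), diag(p), 2)` reduces the quadratic form to `t(X) %*% X`, so the returned values match the leading singular values of X.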
The generalizing operators, Q and R, must be positive
semi-definite and have operator norm one. Note that if these are the
identity matrix, then GPCA is equivalent to PCA and
the solution is the SVD of X. Smoothers, such as covariances (see
wendland.cov), and inverse smoothers
can be used as generalizing operators for data in which variables are associated
with a specific location (e.g. image data and spatio-temporal data).
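As an illustration of these requirements, the sketch below builds one smoother and one inverse smoother by hand in base R and rescales each to operator norm one. The helper `normalize_operator`, the exponential kernel, and the chain-graph Laplacian are assumptions made for this example; they stand in for, and are not, the package's own operator constructors.

```r
# Hypothetical helper: scale a symmetric matrix to operator norm one and
# report whether it is positive semi-definite (up to rounding error).
normalize_operator <- function(A, tol = 1e-10) {
  ev <- eigen(A, symmetric = TRUE, only.values = TRUE)$values
  list(op = A / max(ev), psd = all(ev >= -tol * max(abs(ev))))
}

p <- 6
loc <- seq_len(p)                          # variables placed on a 1-D grid

# Smoother: exponential covariance exp(-|i - j| / theta), built by hand
theta <- 2
S <- exp(-abs(outer(loc, loc, "-")) / theta)

# Inverse smoother: graph Laplacian of the chain 1 - 2 - ... - p
A <- (abs(outer(loc, loc, "-")) == 1) * 1  # adjacency matrix of the chain
Lap <- diag(rowSums(A)) - A

smooth <- normalize_operator(S)
inv_smooth <- normalize_operator(Lap)

# Eigenvalue ranges after scaling: the largest eigenvalue is one in both cases
range(eigen(smooth$op, symmetric = TRUE, only.values = TRUE)$values)
range(eigen(inv_smooth$op, symmetric = TRUE, only.values = TRUE)$values)
```

Dividing by the largest eigenvalue is the same normalization used in the ozone example on this page; it is valid here because a symmetric positive semi-definite matrix has operator norm equal to its largest eigenvalue.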
This function has the option of using one of two algorithms to compute
the solution. The
deflation = FALSE option computes the
eigen-decomposition of a quadratic form of dimension
min(n,p) to find U or
V and finds the other factor by regression. The
deflation = TRUE
option finds each factor using the generalized power algorithm and
performs deflation to compute multiple factors. The
deflation = FALSE option is generally faster, and especially so when one dimension is much
smaller than the other. The option
deflation = TRUE is faster only
if both dimensions are large
(n,p > 5,000) and
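The power-iteration-with-deflation idea can be sketched in a few lines of base R. This is an illustrative sketch only, not the package's code: `gpca_power_sketch` and `sym_sqrt` are hypothetical names, the update rule is written as I understand the generalized power method from the reference cited on this page, and the data and operators are synthetic. The result is compared against the singular values of Q^(1/2) X R^(1/2), which should coincide with the GPCA values.

```r
# Hypothetical sketch of generalized power iterations with deflation.
gpca_power_sketch <- function(X, Q, R, K = 1, maxit = 10000, tol = 1e-16) {
  n <- nrow(X); p <- ncol(X)
  U <- matrix(0, n, K); V <- matrix(0, p, K); d <- numeric(K)
  Xhat <- X
  for (k in seq_len(K)) {
    v <- matrix(rnorm(p), p, 1)               # random starting vector
    for (it in seq_len(maxit)) {
      u <- Xhat %*% R %*% v
      u <- u / sqrt(drop(t(u) %*% Q %*% u))   # scale to unit Q-norm
      v_new <- t(Xhat) %*% Q %*% u
      v_new <- v_new / sqrt(drop(t(v_new) %*% R %*% v_new))  # unit R-norm
      change <- sum((v_new - v)^2)
      v <- v_new
      if (change < tol) break
    }
    d[k] <- drop(t(u) %*% Q %*% Xhat %*% R %*% v)
    U[, k] <- u; V[, k] <- v
    Xhat <- Xhat - d[k] * (u %*% t(v))        # deflate before the next factor
  }
  list(u = U, v = V, d = d)
}

# Hypothetical helper: symmetric square root via an eigen-decomposition.
sym_sqrt <- function(A) {
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(sqrt(pmax(e$values, 0)), nrow = nrow(A)) %*% t(e$vectors)
}

set.seed(1)
n <- 30; p <- 10
s <- seq(0, 1, length.out = n); w <- seq(0, 1, length.out = p)
# Synthetic data: two smooth patterns plus a little noise
X <- 5 * outer(sin(2 * pi * s), cos(pi * w)) +
  2 * outer(cos(2 * pi * s), sin(2 * pi * w)) +
  0.1 * matrix(rnorm(n * p), n, p)

# Toy operators: exponential kernels scaled to operator norm one
Q <- exp(-abs(outer(1:n, 1:n, "-")) / 3)
Q <- Q / max(eigen(Q, symmetric = TRUE, only.values = TRUE)$values)
R <- exp(-abs(outer(1:p, 1:p, "-")) / 2)
R <- R / max(eigen(R, symmetric = TRUE, only.values = TRUE)$values)

fit <- gpca_power_sketch(X, Q, R, K = 2)
d_ref <- svd(sym_sqrt(Q) %*% X %*% sym_sqrt(R))$d[1:2]
max(abs(fit$d - d_ref))   # difference from the reference values
```

Each iteration only needs matrix-vector products with X, Q, and R, whereas the eigen-decomposition route forms and factors a full quadratic form.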
The left GPCA factors, an
The right GPCA factors, an
A vector of the
Cumulative proportion of variance explained by
Proportion of variance explained by each component.
Genevera I. Allen, Logan Grosenick, and Jonathan Taylor, "A generalized least squares matrix decomposition", arXiv:1102.3074, 2011.
# ozone2 and Exp.cov are provided by the fields package
library(fields)
data(ozone2)
# Keep only the columns (stations) with no missing values
ind = which(apply(is.na(ozone2$y),2,sum)==0)
X = ozone2$y[,ind]
n = nrow(X)
p = ncol(X)

#Generalizing Operators - Spatio-Temporal Smoothers
# Each operator is scaled by its largest eigenvalue to have operator norm one
R = Exp.cov(ozone2$lon.lat[ind,],theta=5)
er = eigen(R,only.values=TRUE)
R = R/max(er$values)
Q = Exp.cov(c(1:n),c(1:n),theta=3)
eq = eigen(Q,only.values=TRUE)
Q = Q/max(eq$values)

#SVD
fitsvd = gpca(X,diag(n),diag(p),1)

#GPCA
fitgpca = gpca(X,Q,R,1)
fitgpca$prop.var #proportion of variance explained