SupSFPCA: Supervised Sparse and Functional Principal Component Analysis
In SuperPCA: Supervised Principal Component Analysis



This function conducts supervised sparse and functional principal component analysis by fitting the SupSVD model

  X = UV' + E
  U = YB + F

where X is an observed primary data matrix (to be decomposed), U is a latent score matrix, V is a loading matrix, E is measurement noise, Y is an observed auxiliary supervision matrix, B is a coefficient matrix, and F is a random effect matrix.
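To make the roles of the matrices concrete, the following base-R sketch simulates data from the two model equations. The sizes, noise levels, and the name `Fmat` (used for F to avoid masking R's `F` shorthand) are illustrative assumptions, not values taken from the package.

```r
set.seed(1)
n <- 50; p <- 20; q <- 3; r <- 2   # illustrative sizes (assumed)

# Auxiliary supervision matrix Y (n x q), column centered
Y <- scale(matrix(rnorm(n * q), n, q), center = TRUE, scale = FALSE)
B    <- matrix(rnorm(q * r), q, r)             # coefficient matrix (q x r)
Fmat <- matrix(rnorm(n * r, sd = 0.5), n, r)   # random effects F (n x r)
U    <- Y %*% B + Fmat                         # latent scores: U = YB + F

# Loading matrix V (p x r) with orthonormal columns, built via QR
V <- qr.Q(qr(matrix(rnorm(p * r), p, r)))
E <- matrix(rnorm(n * p, sd = 0.1), n, p)      # measurement noise (n x p)
X <- U %*% t(V) + E                            # primary data: X = UV' + E

dim(X)
```

A matrix simulated this way, together with `Y`, has the shapes that the `X` and `Y` arguments expect.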

It decomposes the primary data matrix X into low-rank components, while taking into account many different features: 1) potential supervision from any auxiliary data Y measured on the same samples; 2) potential smoothness for loading vectors V (for functional data); 3) sparsity in supervision coefficients B and loadings V (for variable selection).

It is a general dimension reduction method that subsumes PCA, sparse PCA, functional PCA, supervised PCA, etc., as special cases. For more details, see the 2016 JCGS paper "Supervised sparse and functional principal component analysis" by Gen Li, Haipeng Shen, and Jianhua Z. Huang.

SupSFPCA(
  Y,
  X,
  r,
  ind_lam = 1,
  ind_alp = 1,
  ind_gam = 1,
  ind_Omg = 1,
  Omega = 0,
  max_niter = 10^3,
  convg_thres = 10^-6,
  vmax_niter = 10^2,
  vconvg_thres = 10^-4
)

`Y`	n*q (column centered) auxiliary data matrix, rows are samples and columns are variables
`X`	n*p (column centered) primary data matrix, which we want to decompose; rows are samples (matched with Y) and columns are variables
`r`	positive scalar, prespecified rank (r should be smaller than n and p)
`ind_lam`	0 or 1 (default=1, sparse loading), sparsity index for loadings
`ind_alp`	0 or 1 (default=1, smooth loading), smoothness index for loadings
`ind_gam`	0 or 1 (default=1, sparse coefficient), sparsity index for supervision coefficients. Note: if ind_gam is set to 0, Y must have q<n to avoid overfitting; if ind_gam is set to 1, the method can handle high dimensional supervision Y
`ind_Omg`	p*p symmetric positive semi-definite matrix for smoothness penalty (default is for evenly spaced data). Note: only change this if you have unevenly spaced functional data X
`Omega`	??
`max_niter`	scalar (default=1E3), max number of overall iterations
`convg_thres`	positive scalar (default=1E-6), overall convergence threshold
`vmax_niter`	scalar (default=1E2), max number of iterations for estimating each loading vector
`vconvg_thres`	positive scalar (default=1E-4), convergence threshold for the proximal gradient descent algorithm for estimating each loading vector
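As an illustration of what a p*p symmetric positive semi-definite smoothness penalty can look like for evenly spaced data, the sketch below builds a squared second-difference matrix in base R. This is a common construction and is an assumption here: it is not confirmed to be the default the package uses, and the names `D`, `Omega_example`, `v_smooth`, and `v_rough` are hypothetical.

```r
p <- 30   # number of evenly spaced grid points (illustrative)

# (p-2) x p second-difference operator: row i computes v[i] - 2*v[i+1] + v[i+2]
D <- diff(diag(p), differences = 2)

# p x p penalty matrix: t(v) %*% Omega_example %*% v is the sum of
# squared second differences of v
Omega_example <- crossprod(D)

# Compare the penalty for a smooth and a rough unit-norm vector
v_smooth <- sin(seq(0, pi, length.out = p))
v_smooth <- v_smooth / sqrt(sum(v_smooth^2))
v_rough  <- rep(c(1, -1), length.out = p) / sqrt(p)

pen_smooth <- drop(t(v_smooth) %*% Omega_example %*% v_smooth)
pen_rough  <- drop(t(v_rough)  %*% Omega_example %*% v_rough)
c(smooth = pen_smooth, rough = pen_rough)
```

Under this construction, constant and linear vectors incur zero penalty, and an oscillating vector is penalized far more heavily than a smooth one.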

list with components

`B:`	q*r coefficient matrix of Y on the scores of X; may be sparse if ind_gam=1
`V:`	p*r loading matrix of X; each column has norm 1, but there is no strict orthogonality because of sparsity and smoothness. If ind_lam=1, V is sparse; if ind_alp=1, each column of V is smooth
`U:`	n*r score matrix of X, conditional expectation of random scores, no strict orthogonality
`se2:`	scalar, variance of measurement error in the primary data X
`Sf:`	r*r diagonal covariance matrix, for random effects (see paper)

Note: U and V are the most important outputs for dimension reduction purposes, as in PCA or SVD.
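The sketch below shows how a score matrix and a loading matrix are typically used together, with a plain truncated SVD from base R standing in for the fitted model. It does not call SupSFPCA. The simulated data and the names `s`, `X_hat`, and `rel_err` are illustrative assumptions. Unlike the SVD factors here, the U and V returned by SupSFPCA are not strictly orthogonal.

```r
set.seed(2)
n <- 40; p <- 10; r <- 3   # illustrative sizes (assumed)
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)

s <- svd(X, nu = r, nv = r)          # truncated SVD of the centered data
V <- s$v                             # p x r loadings, unit-norm columns
U <- s$u %*% diag(s$d[1:r], r, r)    # n x r scores (equal to X %*% V)

X_hat   <- U %*% t(V)                        # rank-r approximation of X
rel_err <- sum((X - X_hat)^2) / sum(X^2)     # share of variation not captured
rel_err
```

The rows of `U` give the r-dimensional representation of each sample, and `U %*% t(V)` maps that representation back to the original p variables.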

## Not run: 
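library(SuperPCA)
library(spls)   # provides the yeast data set
data(yeast)
r <- 4          # prespecified rank
ydata <- as.data.frame(yeast[1])   # first list component, used as auxiliary data Y
xdata <- as.data.frame(yeast[2])   # second list component, used as primary data X
# column-center both matrices, as the Y and X arguments require
yc <- scale(ydata, center = TRUE, scale = FALSE)
xc <- scale(xdata, center = TRUE, scale = FALSE)
SupSFPCA(yc, xc, r)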

## End(Not run)