cPCA: Contrastive PCA (cPCA) with Adaptive Computation Methods

View source: R/cPCA.R

cPCA R Documentation

Contrastive PCA (cPCA) with Adaptive Computation Methods

Description

Contrastive PCA (cPCA) finds directions that capture variation in a "foreground" dataset X_f that is absent (or weaker) in a "background" dataset X_b. This function adaptively chooses how to solve the underlying generalized eigenvalue problem based on the dataset sizes and the chosen method; see Details.
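At its core, cPCA solves the generalized eigenvalue problem C_f v = lambda C_b v, where C_f and C_b are the foreground and background covariance matrices. A minimal base-R sketch of this idea (not the package's geneig implementation; the small ridge term added to C_b is an assumption made here so the matrix is invertible):

```r
# Sketch: solve C_f v = lambda C_b v via the equivalent ordinary
# eigenproblem C_b^{-1} C_f v = lambda v (illustration only).
set.seed(1)
X_f <- scale(matrix(rnorm(200), 20, 10), scale = FALSE)  # centered foreground
X_b <- scale(matrix(rnorm(200), 20, 10), scale = FALSE)  # centered background
C_f <- crossprod(X_f) / (nrow(X_f) - 1)
C_b <- crossprod(X_b) / (nrow(X_b) - 1)
# Ridge term (assumption, not from the package) keeps C_b invertible.
ev <- eigen(solve(C_b + 1e-6 * diag(10), C_f))
v1 <- Re(ev$vectors[, 1])  # leading contrastive direction
```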

Usage

cPCA(
  X_f,
  X_b,
  ncomp = min(dim(X_f)[2]),
  preproc = center(),
  lambda = 0,
  method = c("geigen", "primme", "sdiag", "corpcor"),
  allow_transpose = TRUE,
  ...
)

Arguments

X_f

A numeric matrix representing the foreground dataset, with dimensions (samples x features).

X_b

A numeric matrix representing the background dataset, with dimensions (samples x features).

ncomp

Number of components to estimate. Defaults to ncol(X_f), the number of features.

preproc

A pre-processing function (default: center()), applied to both X_f and X_b before analysis.

lambda

Shrinkage parameter for covariance estimation, passed to corpcor::cov.shrink or corpcor::crossprod.powcor.shrink. Defaults to 0 (no shrinkage).

method

A character string specifying the computation method. One of:

"geigen"

Use geneig for the generalized eigenvalue problem (default).

"primme"

Use geneig with the PRIMME library for potentially more efficient solvers.

"sdiag"

Use a spectral decomposition method for symmetric matrices in geneig.

"corpcor"

Use a corpcor-based whitening approach followed by PCA.

...

Additional arguments passed to underlying functions such as geneig or covariance estimation.

Details

  1. method = "corpcor": Uses a corpcor-based whitening approach (crossprod.powcor.shrink) to transform the data, then performs a standard PCA on the transformed foreground data.

  2. method %in% c("geigen", "primme", "sdiag") and a moderate number of features (D): Directly forms covariance matrices and uses geneig to solve the generalized eigenvalue problem.

  3. method %in% c("geigen", "primme", "sdiag") and a large number of features (D >> N): Uses an SVD-based reduction on the background data to avoid forming large D x D matrices. This reduces the problem to N x N space.
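The large-D reduction in step 3 can be sketched as follows. This is an illustrative reconstruction, not the package's exact code: directions outside the span of the background's right singular vectors are simply dropped here, and the rank cutoff of 1e-8 is an assumption.

```r
# Sketch: work in the background's right-singular basis instead of
# forming a D x D covariance matrix.
set.seed(2)
N <- 30; D <- 500
X_f <- scale(matrix(rnorm(N * D), N, D), scale = FALSE)
X_b <- scale(matrix(rnorm(N * D), N, D), scale = FALSE)
sv <- svd(X_b)                       # X_b = U S V'; V is D x N
r  <- sum(sv$d > 1e-8 * sv$d[1])     # drop null directions (centering costs one)
V  <- sv$v[, 1:r, drop = FALSE]
d2 <- sv$d[1:r]^2 / (N - 1)          # background covariance is diag(d2) in this basis
Yf <- X_f %*% V                      # foreground projected into the small space
Cf <- crossprod(Yf) / (N - 1)        # r x r, instead of D x D
# Reduced generalized problem: Cf w = lambda diag(d2) w
ev <- eigen(diag(1 / d2) %*% Cf)
v1 <- V %*% Re(ev$vectors[, 1])      # map back to feature space
```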

Adaptive Strategy:

  • If method = "corpcor", no large covariance matrices are formed. Instead, the background data is used to "whiten" the foreground, followed by a simple PCA.

  • If method != "corpcor" and the number of features D is manageable (e.g., D <= max(N_f, N_b)), the function forms covariance matrices and directly solves the generalized eigenproblem.

  • If method != "corpcor" and D is large (e.g., tens of thousands, D > max(N_f, N_b)), it computes the SVD of the background data X_b to derive a smaller N x N eigenproblem, thereby avoiding the costly computation of D x D covariance matrices.
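The whitening idea behind method = "corpcor" can be illustrated with an explicit inverse square root of the background covariance. The package uses corpcor's efficient crossprod.powcor.shrink instead; this expanded version, with a plain covariance plus a small ridge as a stand-in for the shrinkage estimate, is for exposition only.

```r
# Sketch: whiten the foreground by the background covariance, then PCA.
set.seed(3)
X_f <- scale(matrix(rnorm(300), 30, 10), scale = FALSE)
X_b <- scale(matrix(rnorm(300), 30, 10), scale = FALSE)
C_b <- cov(X_b) + 1e-6 * diag(10)          # stand-in for a shrinkage estimate
eb  <- eigen(C_b, symmetric = TRUE)
W   <- eb$vectors %*% diag(1 / sqrt(eb$values)) %*% t(eb$vectors)  # C_b^{-1/2}
pca <- prcomp(X_f %*% W, center = FALSE)   # PCA on the whitened foreground
```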

Note: If lambda != 0 and D is very large, the current implementation does not fully integrate shrinkage into the large-D SVD-based approach and will issue a warning.

Value

A bi_projector object containing:

v

A (features x ncomp) matrix of eigenvectors (loadings).

s

A (samples x ncomp) matrix of scores, i.e., projections of X_f onto the eigenvectors.

sdev

A vector of length ncomp giving the square roots of the eigenvalues.

preproc

The pre-processing object used.
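The score/loading contract described above (scores are projections of the preprocessed foreground onto the eigenvectors) can be illustrated with ordinary PCA as a stand-in for the bi_projector fields:

```r
# Sketch: scores = data %*% loadings, the same relationship that holds
# between the s and v fields of the returned bi_projector.
set.seed(4)
X <- scale(matrix(rnorm(200), 20, 10), scale = FALSE)
p <- prcomp(X, center = FALSE)
v <- p$rotation[, 1:3]   # loadings (features x ncomp)
s <- X %*% v             # scores   (samples  x ncomp)
```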

Examples

set.seed(123)
X_f <- matrix(rnorm(2000), nrow=100, ncol=20) # Foreground: 100 samples, 20 features
X_b <- matrix(rnorm(2000), nrow=100, ncol=20) # Background: same size
# Default method (geigen), small dimension scenario
res <- cPCA(X_f, X_b, ncomp=5)
plot(res$s[,1], res$s[,2], main="cPCA scores (component 1 vs 2)")


bbuchsbaum/multivarious documentation built on Dec. 23, 2024, 7:47 a.m.