cPCA: Contrastive PCA (cPCA) with Adaptive Computation Methods

View source: R/cPCA.R

cPCA R Documentation

Contrastive PCA (cPCA) with Adaptive Computation Methods

Description

Contrastive PCA (cPCA) finds directions that capture variation in a "foreground" dataset X_f that is absent (or weaker) in a "background" dataset X_b. This function adaptively chooses how to solve the underlying generalized eigenvalue problem based on the dataset sizes and the chosen method; see Details.
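At its core, cPCA solves the generalized eigenvalue problem C_f v = lambda C_b v, where C_f and C_b are the foreground and background covariance matrices. A minimal base-R sketch of this idea (not the package's geneig implementation; the small ridge term added to C_b is an assumption made here so the matrix is invertible):

```r
# Sketch: solve C_f v = lambda C_b v via the equivalent ordinary
# eigenproblem C_b^{-1} C_f v = lambda v (illustration only).
set.seed(1)
X_f <- scale(matrix(rnorm(200), 20, 10), scale = FALSE)  # centered foreground
X_b <- scale(matrix(rnorm(200), 20, 10), scale = FALSE)  # centered background
C_f <- crossprod(X_f) / (nrow(X_f) - 1)
C_b <- crossprod(X_b) / (nrow(X_b) - 1)
# Ridge term (assumption, not from the package) keeps C_b invertible.
ev <- eigen(solve(C_b + 1e-6 * diag(10), C_f))
v1 <- Re(ev$vectors[, 1])  # leading contrastive direction
```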

Usage

cPCA(
  X_f,
  X_b,
  ncomp = min(dim(X_f)[2]),
  preproc = center(),
  lambda = 0,
  method = c("geigen", "primme", "sdiag", "corpcor"),
  allow_transpose = TRUE,
  ...
)

Arguments

X_f

A numeric matrix representing the foreground dataset, with dimensions (samples x features).

X_b

A numeric matrix representing the background dataset, with dimensions (samples x features).

ncomp

Number of components to estimate. Defaults to ncol(X_f), the number of features.

preproc

A pre-processing function (default: center()), applied to both X_f and X_b before analysis.

lambda

Shrinkage parameter for covariance estimation, passed to corpcor::cov.shrink or corpcor::crossprod.powcor.shrink. Defaults to 0 (no shrinkage).

method

A character string specifying the computation method. One of:

"geigen"

Use geneig for the generalized eigenvalue problem (default).

"primme"

Use geneig with the PRIMME library for potentially more efficient solvers.

"sdiag"

Use a spectral decomposition method for symmetric matrices in geneig.

"corpcor"

Use a corpcor-based whitening approach followed by PCA.

...

Additional arguments passed to underlying functions such as geneig or covariance estimation.

Details

  1. method = "corpcor": Uses a corpcor-based whitening approach (crossprod.powcor.shrink) to transform the data, then performs a standard PCA on the transformed foreground data.

  2. method %in% c("geigen", "primme", "sdiag") and a moderate number of features (D): Directly forms covariance matrices and uses geneig to solve the generalized eigenvalue problem.

  3. method %in% c("geigen", "primme", "sdiag") and a large number of features (D >> N): Uses an SVD-based reduction on the background data to avoid forming large D x D matrices. This reduces the problem to N x N space.
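The large-D reduction in step 3 can be sketched as follows. This is an illustrative reconstruction, not the package's exact code: directions outside the span of the background's right singular vectors are simply dropped here, and the rank cutoff of 1e-8 is an assumption.

```r
# Sketch: work in the background's right-singular basis instead of
# forming a D x D covariance matrix.
set.seed(2)
N <- 30; D <- 500
X_f <- scale(matrix(rnorm(N * D), N, D), scale = FALSE)
X_b <- scale(matrix(rnorm(N * D), N, D), scale = FALSE)
sv <- svd(X_b)                       # X_b = U S V'; V is D x N
r  <- sum(sv$d > 1e-8 * sv$d[1])     # drop null directions (centering costs one)
V  <- sv$v[, 1:r, drop = FALSE]
d2 <- sv$d[1:r]^2 / (N - 1)          # background covariance is diag(d2) in this basis
Yf <- X_f %*% V                      # foreground projected into the small space
Cf <- crossprod(Yf) / (N - 1)        # r x r, instead of D x D
# Reduced generalized problem: Cf w = lambda diag(d2) w
ev <- eigen(diag(1 / d2) %*% Cf)
v1 <- V %*% Re(ev$vectors[, 1])      # map back to feature space
```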

Adaptive Strategy:

  • If method = "corpcor", no large covariance matrices are formed. Instead, the background data is used to "whiten" the foreground, followed by a simple PCA.

  • If method != "corpcor" and the number of features D is manageable (e.g., D <= max(N_f, N_b)), the function forms covariance matrices and directly solves the generalized eigenproblem.

  • If method != "corpcor" and D is large (e.g., tens of thousands, D > max(N_f, N_b)), it computes the SVD of the background data X_b to derive a smaller N x N eigenproblem, thereby avoiding the costly computation of D x D covariance matrices.
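The whitening idea behind method = "corpcor" can be illustrated with an explicit inverse square root of the background covariance. The package uses corpcor's efficient crossprod.powcor.shrink instead; this expanded version, with a plain covariance plus a small ridge as a stand-in for the shrinkage estimate, is for exposition only.

```r
# Sketch: whiten the foreground by the background covariance, then PCA.
set.seed(3)
X_f <- scale(matrix(rnorm(300), 30, 10), scale = FALSE)
X_b <- scale(matrix(rnorm(300), 30, 10), scale = FALSE)
C_b <- cov(X_b) + 1e-6 * diag(10)          # stand-in for a shrinkage estimate
eb  <- eigen(C_b, symmetric = TRUE)
W   <- eb$vectors %*% diag(1 / sqrt(eb$values)) %*% t(eb$vectors)  # C_b^{-1/2}
pca <- prcomp(X_f %*% W, center = FALSE)   # PCA on the whitened foreground
```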

Note: If lambda != 0 and D is very large, the current implementation does not fully integrate shrinkage into the large-D SVD-based approach and will issue a warning.

Value

A bi_projector object containing:

v

A (features x ncomp) matrix of eigenvectors (loadings).

s

A (samples x ncomp) matrix of scores, i.e., projections of X_f onto the eigenvectors.

sdev

A vector of length ncomp giving the square roots of the eigenvalues.

preproc

The pre-processing object used.
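The score/loading contract described above (scores are projections of the preprocessed foreground onto the eigenvectors) can be illustrated with ordinary PCA as a stand-in for the bi_projector fields:

```r
# Sketch: scores = data %*% loadings, the same relationship that holds
# between the s and v fields of the returned bi_projector.
set.seed(4)
X <- scale(matrix(rnorm(200), 20, 10), scale = FALSE)
p <- prcomp(X, center = FALSE)
v <- p$rotation[, 1:3]   # loadings (features x ncomp)
s <- X %*% v             # scores   (samples  x ncomp)
```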

Examples

set.seed(123)
X_f <- matrix(rnorm(2000), nrow=100, ncol=20) # Foreground: 100 samples, 20 features
X_b <- matrix(rnorm(2000), nrow=100, ncol=20) # Background: same size
# Default method (geigen), small dimension scenario
res <- cPCA(X_f, X_b, ncomp=5)
plot(res$s[,1], res$s[,2], main="cPCA scores (component 1 vs 2)")


bbuchsbaum/multivarious documentation built on Dec. 23, 2024, 7:47 a.m.