cPCA | R Documentation
Contrastive PCA (cPCA) finds directions that capture the variation in a "foreground" dataset X_f that is not present (or less present) in a "background" dataset X_b. This function adaptively chooses how to solve the generalized eigenvalue problem based on the dataset sizes and the chosen method:
cPCA(
X_f,
X_b,
ncomp = min(dim(X_f)[2]),
preproc = center(),
lambda = 0,
method = c("geigen", "primme", "sdiag", "corpcor"),
allow_transpose = TRUE,
...
)
X_f: A numeric matrix representing the foreground dataset, with dimensions (samples x features).
X_b: A numeric matrix representing the background dataset, with dimensions (samples x features).
ncomp: Number of components to estimate. Defaults to min(dim(X_f)[2]), i.e. ncol(X_f).
preproc: A pre-processing function (default: center()).
lambda: Shrinkage parameter for covariance estimation. Defaults to 0 (no shrinkage); see the Note below for its behavior when D is very large.
method: A character string specifying the computation method. One of "geigen", "primme", "sdiag", or "corpcor".
...: Additional arguments passed to underlying functions such as geneig.
method = "corpcor": Uses a corpcor-based whitening approach (crossprod.powcor.shrink) to transform the data, then performs a standard PCA on the transformed foreground data.
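The whitening idea can be sketched in base R (an illustrative reconstruction of the strategy, not the package's exact corpcor-based implementation): compute the background covariance, form its inverse square root, and run ordinary PCA on the whitened foreground.

```r
# Illustrative sketch of whitening-then-PCA (base R stand-in; the actual
# method uses corpcor's crossprod.powcor.shrink for this step).
set.seed(1)
X_f <- scale(matrix(rnorm(100 * 10), 100, 10), scale = FALSE)  # foreground
X_b <- scale(matrix(rnorm(100 * 10), 100, 10), scale = FALSE)  # background

C_b <- crossprod(X_b) / (nrow(X_b) - 1)        # background covariance
e   <- eigen(C_b, symmetric = TRUE)
W   <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)  # C_b^(-1/2)

X_fw   <- X_f %*% W                  # whitened foreground
pca    <- prcomp(X_fw, center = FALSE)
scores <- pca$x[, 1:2]               # leading contrastive scores
```

Directions with large variance in the whitened foreground are exactly those whose foreground variance is large relative to the background.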
method in {"geigen", "primme", "sdiag"} with a moderate number of features (D): Directly forms covariance matrices and uses geneig to solve the generalized eigenvalue problem.
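The direct covariance route solves C_f v = lambda C_b v. A base-R sketch of the same algebra (a stand-in for geneig, shown for illustration only) uses the fact that the generalized eigenvectors are the eigenvectors of C_b^{-1} C_f:

```r
# Sketch of the direct route: solve C_f v = lambda * C_b v via base R.
set.seed(2)
X_f <- scale(matrix(rnorm(50 * 8), 50, 8), scale = FALSE)
X_b <- scale(matrix(rnorm(50 * 8), 50, 8), scale = FALSE)
C_f <- crossprod(X_f) / (nrow(X_f) - 1)
C_b <- crossprod(X_b) / (nrow(X_b) - 1)

# Generalized eigenvectors = eigenvectors of C_b^{-1} C_f (eigenvalues are
# real and positive because both matrices are symmetric positive definite)
eg <- eigen(solve(C_b, C_f))
v  <- Re(eg$vectors[, 1:3])   # top 3 contrastive loadings (8 x 3)
scores <- X_f %*% v           # 50 x 3 foreground scores
```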
method in {"geigen", "primme", "sdiag"} with a large number of features (D >> N): Uses an SVD-based reduction on the background data to avoid forming large D x D matrices, reducing the problem to N x N space.
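The reduction can be sketched as follows (my own illustration of the stated strategy, not the package source): take the thin SVD of the background, project both datasets onto its right singular vectors, and solve the now-small generalized eigenproblem in those coordinates.

```r
# Sketch of the large-D reduction: the SVD of X_b gives a rank-(<= N)
# basis V; working in V's coordinates avoids any D x D matrix.
set.seed(3)
N <- 30; D <- 500
X_f <- scale(matrix(rnorm(N * D), N, D), scale = FALSE)
X_b <- scale(matrix(rnorm(N * D), N, D), scale = FALSE)

sv   <- svd(X_b, nu = 0)                 # right singular vectors only
keep <- sv$d > 1e-8 * sv$d[1]            # drop numerically null directions
V    <- sv$v[, keep, drop = FALSE]       # D x r basis (r <= N)

Y_f <- X_f %*% V                         # reduced coordinates (N x r)
Y_b <- X_b %*% V
C_f <- crossprod(Y_f) / (N - 1)          # r x r, never D x D
C_b <- crossprod(Y_b) / (N - 1)

eg     <- eigen(solve(C_b, C_f))
v_full <- V %*% Re(eg$vectors[, 1:2])    # map loadings back to D-space
```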
Adaptive Strategy:
If method = "corpcor", no large covariance matrices are formed. Instead, the background data is used to "whiten" the foreground, followed by a simple PCA.
If method is not "corpcor" and the number of features D is manageable (e.g., D <= max(N_f, N_b)), the function forms covariance matrices and directly solves the generalized eigenproblem.
If method is not "corpcor" and D is large (e.g., tens of thousands, with D > max(N_f, N_b)), it computes the SVD of the background data X_b to derive a smaller N x N eigenproblem, thereby avoiding the costly computation of D x D covariance matrices.
Note: If lambda != 0 and D is very large, the current implementation does not fully integrate shrinkage into the large-D SVD-based approach and will issue a warning.
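The role of lambda can be illustrated with a generic linear-shrinkage estimator (a simplified stand-in of my own; corpcor's shrinkage estimator differs in detail):

```r
# Simplified linear shrinkage toward a scaled identity target. A generic
# illustration of what a shrinkage parameter does; corpcor's estimator
# (used by method = "corpcor") is more sophisticated.
shrink_cov <- function(X, lambda) {
  S      <- crossprod(scale(X, scale = FALSE)) / (nrow(X) - 1)
  target <- diag(mean(diag(S)), ncol(S))   # equal-variance target
  (1 - lambda) * S + lambda * target       # convex combination
}
set.seed(4)
X  <- matrix(rnorm(20 * 10), 20, 10)
C0 <- shrink_cov(X, 0)     # plain sample covariance
C5 <- shrink_cov(X, 0.5)   # halfway toward the identity target
```

Shrinkage keeps the background covariance well conditioned, which matters because the generalized eigenproblem inverts it.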
A bi_projector
object containing:
A (features x ncomp) matrix of eigenvectors (loadings).
A (samples x ncomp) matrix of scores, i.e., projections of X_f
onto the eigenvectors.
A vector of length ncomp giving the square roots of the eigenvalues.
The pre-processing object used.
set.seed(123)
X_f <- matrix(rnorm(2000), nrow=100, ncol=20) # Foreground: 100 samples, 20 features
X_b <- matrix(rnorm(2000), nrow=100, ncol=20) # Background: same size
# Default method (geigen), small dimension scenario
res <- cPCA(X_f, X_b, ncomp=5)
plot(res$s[,1], res$s[,2], main="cPCA scores (component 1 vs 2)")