cPCAplus    R Documentation
Contrastive PCA++ (cPCA++)

Description

Performs Contrastive PCA++ (cPCA++) to find directions that capture variation enriched in a "foreground" dataset relative to a "background" dataset. This implementation follows the cPCA++ approach, which directly solves the generalized eigenvalue problem Rf v = lambda Rb v, where Rf and Rb are the covariance matrices of the foreground and background data, both centered using the background mean.

Usage
cPCAplus(
X_f,
X_b,
ncomp = NULL,
center_background = TRUE,
lambda = 0,
method = c("geigen", "primme", "sdiag", "corpcor"),
strategy = c("auto", "feature", "sample"),
...
)
Arguments

X_f
A numeric matrix representing the foreground dataset (samples x features).

X_b
A numeric matrix representing the background dataset (samples x features).

ncomp
Integer. The number of contrastive components to compute. Defaults to NULL.

center_background
Logical. If TRUE (default), both X_f and X_b are centered using the column means computed from the background data X_b only.

lambda
Shrinkage intensity for covariance estimation (0 <= lambda <= 1). Defaults to 0 (no shrinkage). Uses corpcor::cov.shrink for the covariance estimates.

method
A character string specifying the primary computation method. Options are "geigen" (default), "primme", "sdiag", and "corpcor".

strategy
Controls the GEVD approach for the non-"corpcor" methods: "feature" works with p x p covariance matrices, "sample" works in the background's sample space (suited to p >> n), and "auto" (default) chooses between them.

...
Additional arguments passed to the underlying computation functions (e.g., geigen::geneig or corpcor::cov.shrink).
Details

Preprocessing: Following the cPCA++ paper, if center_background = TRUE, both X_f and X_b are centered by subtracting the column means calculated only from the background data X_b. This is crucial for isolating variance specific to X_f.
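For example, the background-mean centering step can be written as follows (a minimal sketch; variable names such as mu_b are illustrative):

mu_b <- colMeans(X_b)                  # column means of the background only
X_f_centered <- sweep(X_f, 2, mu_b)    # subtract the background means from the foreground
X_b_centered <- sweep(X_b, 2, mu_b)    # and from the background itself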
Core Algorithm (methods "geigen", "primme", "sdiag"; strategy = "feature"):

1. Center X_f and X_b using the mean of X_b.
2. Compute potentially shrunk p x p covariance matrices Rf (from centered X_f) and Rb (from centered X_b) using corpcor::cov.shrink.
3. Solve the generalized eigenvalue problem Rf v = lambda Rb v for the top ncomp eigenvectors v using geigen::geneig. These eigenvectors are the contrastive principal components (loadings).
4. Compute scores by projecting the centered foreground data onto the eigenvectors: S = X_f_centered %*% v.

A minimal sketch of these steps is given below.
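This sketch is illustrative only: it substitutes cov() and a base eigen() solve of Rb^-1 Rf for the corpcor::cov.shrink and geigen::geneig calls described above, and the function and variable names are assumptions, not the package's internals.

# Sketch of the "feature" strategy (no shrinkage; assumes Rb is well conditioned).
cpca_feature_sketch <- function(X_f, X_b, ncomp = 2) {
  mu_b <- colMeans(X_b)
  Xf_c <- sweep(X_f, 2, mu_b)                 # center both matrices by the background mean
  Xb_c <- sweep(X_b, 2, mu_b)
  Rf <- cov(Xf_c)                             # p x p foreground covariance
  Rb <- cov(Xb_c)                             # p x p background covariance
  eig <- eigen(solve(Rb, Rf))                 # GEVD Rf v = lambda Rb v via eigen(Rb^-1 Rf)
  ord <- order(Re(eig$values), decreasing = TRUE)[1:ncomp]
  v <- Re(eig$vectors[, ord, drop = FALSE])   # contrastive loadings (p x ncomp)
  list(v = v, s = Xf_c %*% v,                 # loadings and foreground scores
       values = Re(eig$values[ord]))
}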
Core Algorithm (Large-D / Sample-Space Strategy, strategy = "sample"):

When p >> n, forming the p x p matrices Rf and Rb is infeasible. The "sample" strategy follows cPCA++ §3.2:

1. Center X_f and X_b using the mean of X_b.
2. Compute the SVD of the centered X_b = Ub Sb Vb^T (using irlba for efficiency).
3. Project the centered X_f into the background's principal subspace: Zf = X_f_centered %*% Vb.
4. Form small r x r matrices: Rf_small = cov(Zf) and Rb_small = (1/(n_b - 1)) * Sb^2.
5. Solve the small r x r GEVD Rf_small w = lambda Rb_small w using geigen::geneig.
6. Lift the eigenvectors back to feature space: v = Vb %*% w.
7. Compute scores: S = X_f_centered %*% v.

A sketch of this strategy is given below.
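This sketch is illustrative only: it uses base svd() where the text above mentions irlba, solves the small GEVD with a base eigen() call rather than geigen::geneig, and the rank argument r and all helper names are assumptions.

# Sketch of the "sample" strategy for p >> n.
cpca_sample_sketch <- function(X_f, X_b, ncomp = 2, r = min(dim(X_b)) - 1) {
  mu_b <- colMeans(X_b)
  Xf_c <- sweep(X_f, 2, mu_b)
  Xb_c <- sweep(X_b, 2, mu_b)
  sv <- svd(Xb_c, nu = 0, nv = r)             # centered X_b = Ub Sb Vb^T
  Vb <- sv$v                                  # p x r background subspace basis
  Sb <- sv$d[1:r]
  Zf <- Xf_c %*% Vb                           # project the foreground into that subspace
  Rf_small <- cov(Zf)                         # r x r foreground covariance
  Rb_small <- diag(Sb^2 / (nrow(X_b) - 1))    # r x r background covariance (diagonal)
  eig <- eigen(solve(Rb_small, Rf_small))     # small GEVD Rf_small w = lambda Rb_small w
  ord <- order(Re(eig$values), decreasing = TRUE)[1:ncomp]
  w <- Re(eig$vectors[, ord, drop = FALSE])
  v <- Vb %*% w                               # lift the loadings back to feature space
  list(v = v, s = Xf_c %*% v)
}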
Alternative Algorithm (method = "corpcor"):

1. Center X_f and X_b using the mean of X_b.
2. Compute Rb and its inverse square root Rb_inv_sqrt.
3. Whiten the foreground data: X_f_whitened = X_f_centered %*% Rb_inv_sqrt.
4. Perform standard PCA (stats::prcomp) on X_f_whitened.

The returned v and s are the loadings and scores in the whitened space; the loadings are not the generalized eigenvectors v. A specific class corpcor_pca is added to signal this. A sketch of the whitening approach follows.
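This sketch is illustrative only: the inverse square root of Rb is computed here from a plain eigendecomposition (the package itself relies on corpcor-based estimators), and the function and variable names are assumptions.

# Sketch of the whitening-based "corpcor" alternative.
cpca_whiten_sketch <- function(X_f, X_b, ncomp = 2) {
  mu_b <- colMeans(X_b)
  Xf_c <- sweep(X_f, 2, mu_b)
  Xb_c <- sweep(X_b, 2, mu_b)
  Rb <- cov(Xb_c)
  eb <- eigen(Rb, symmetric = TRUE)
  # Symmetric inverse square root Rb^(-1/2); tiny eigenvalues floored for stability.
  Rb_inv_sqrt <- eb$vectors %*% diag(1 / sqrt(pmax(eb$values, 1e-10))) %*% t(eb$vectors)
  Xf_white <- Xf_c %*% Rb_inv_sqrt            # whiten the foreground w.r.t. the background
  pca <- stats::prcomp(Xf_white, center = FALSE)
  list(v = pca$rotation[, 1:ncomp, drop = FALSE],  # loadings in the whitened space
       s = pca$x[, 1:ncomp, drop = FALSE],         # scores in the whitened space
       sdev = pca$sdev[1:ncomp])
}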
Value

A bi_projector-like object with classes c("cPCAplus", "<method_class>", "bi_projector") containing:

v: Loadings matrix (features x ncomp). Interpretation depends on method (see Details).

s: Scores matrix (samples_f x ncomp).

Standard deviations: vector of length ncomp; the square roots of the generalized eigenvalues for the geigen-based methods, or PCA standard deviations for corpcor.

Eigenvalues: vector of length ncomp; the generalized eigenvalues for the geigen-based methods, or PCA eigenvalues for corpcor.

The strategy used ("feature" or "sample") if method was not "corpcor".

The initialized preprocessor object used.

The computation method used.

The number of components computed.

The number of features.
References

Salloum, R., & Kuo, C.-C. J. (2022). cPCA++: An efficient method for contrastive feature learning. Pattern Recognition, 124, 108378. (Algorithm 1)

Examples
# Simulate data where foreground has extra variance in first few dimensions
set.seed(123)
n_f <- 100
n_b <- 150
n_features <- 50
# Background: standard normal noise
X_b <- matrix(rnorm(n_b * n_features), nrow=n_b, ncol=n_features)
colnames(X_b) <- paste0("Feat_", 1:n_features)
# Foreground: background noise + extra variance in first 5 features
X_f_signal <- matrix(rnorm(n_f * 5, mean=0, sd=2), nrow=n_f, ncol=5)
X_f_noise <- matrix(rnorm(n_f * (n_features-5)), nrow=n_f, ncol=n_features-5)
X_f <- cbind(X_f_signal, X_f_noise) + matrix(rnorm(n_f * n_features), nrow=n_f, ncol=n_features)
colnames(X_f) <- paste0("Feat_", 1:n_features)
rownames(X_f) <- paste0("SampleF_", 1:n_f)
# Apply cPCA++ (requires geigen and corpcor packages)
# install.packages(c("geigen", "corpcor"))
if (requireNamespace("geigen", quietly = TRUE) && requireNamespace("corpcor", quietly = TRUE)) {
  # Assuming helper constructors like bi_projector are available
  # library(multivarious)
  res_cpca_plus <- cPCAplus(X_f, X_b, ncomp = 5, method = "geigen")

  # Scores for the foreground data (samples x components)
  print(head(res_cpca_plus$s))

  # Loadings (contrastive directions) (features x components)
  print(head(res_cpca_plus$v))

  # Plot scores
  plot(res_cpca_plus$s[, 1], res_cpca_plus$s[, 2],
       xlab = "Contrastive Component 1", ylab = "Contrastive Component 2",
       main = "cPCA++ Scores (geigen method)")

  # Example with corpcor method
  res_cpca_corp <- cPCAplus(X_f, X_b, ncomp = 5, method = "corpcor")
  print(head(res_cpca_corp$s)) # Scores in whitened space
  print(head(res_cpca_corp$v)) # Loadings in whitened space
}
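# A quick, illustrative sanity check (assumes the packages above are installed and that
# the loadings are stored in res_cpca_plus$v, as used above): the contrastive directions
# should put most of their weight on the first five features, where the extra
# foreground variance was simulated.
if (requireNamespace("geigen", quietly = TRUE) && requireNamespace("corpcor", quietly = TRUE)) {
  loading_weight <- rowSums(res_cpca_plus$v^2)
  print(head(sort(loading_weight, decreasing = TRUE), 10))
}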