ComDim_PLS: ComDim_PLS

View source: R/ComDim_PLS.R

ComDim_PLS    R Documentation

ComDim_PLS

Description

Finding common dimensions in multi-block datasets using PLS.

Usage

ComDim_PLS(
  MB = MB,
  y = y,
  ndim = NULL,
  method = c("PLS-DA", "PLS-R"),
  decisionRule = c("fixed", "max")[2],
  normalise = FALSE,
  scale.y = FALSE,
  threshold = 1e-10,
  loquace = FALSE,
  CompMethod = "Normal",
  Partitions = 1,
  cv.k = 7
)

Arguments

MB

A MultiBlock object.

y

The Y-block to use in the PLS model as dependent data. A class vector or a dummy matrix.

ndim

Number of Common Dimensions.

method

'PLS-DA' for classification (y is a class vector or dummy matrix) or 'PLS-R' for regression (y is numeric).

decisionRule

Only used when method = 'PLS-DA'. With 'fixed', a sample is assigned to a class when its predicted Y-hat exceeds 1/nclasses; with 'max', each sample is assigned to the single class with the highest Y-hat.

normalise

Whether to apply block normalisation (default FALSE). When TRUE, each block is mean-centred and then divided by its Frobenius norm, so that all blocks enter the ComDim loop with unit total inertia. Has no effect on the Y-block; use scale.y for that.

scale.y

Logical (default FALSE). When TRUE and method = 'PLS-R', each column of Y is mean-centred and scaled to unit variance before the PLS fit. The stored B, B0, and Ypred are back-transformed to the original Y scale, so downstream outputs (R2Y, Q2, predictions) are always in the original units. Ignored when method = 'PLS-DA' (dummy Y should not be scaled).

threshold

Convergence threshold for the iterative algorithm: iterations stop when the "difference of fit" between successive iterations falls below threshold (default 1e-10).

loquace

Whether to display computation times. TRUE == yes, FALSE == no (default).

CompMethod

Computation strategy, intended to speed up the analysis of very large multi-blocks: 'Normal' (default), 'Kernel', 'PCT', 'Tall' or 'Wide'.

Partitions

Number of partitions, used to speed up the analysis of very large multi-blocks. Only relevant when CompMethod is 'Tall' or 'Wide'.

cv.k

Number of folds for k-fold cross-validation (default 7). Set to 0 to skip CV. When cv.k >= 2, Q2 and DQ2 in the output reflect cross-validated predictive ability; otherwise they reflect training-set fit (R2).

Value

A ComDim object. All slots are populated. Key slots:

Method

"PLS-DA" or "PLS-R".

ndim

Number of Common Dimensions extracted.

Q.scores

Global consensus PLS scores (n \times ndim). Each column \mathbf{q}_a (unit-norm) is the dominant left singular vector from the NIPALS PLS applied to the salience-weighted concatenated blocks.

T.scores

Named list of block-specific local scores (n \times ndim each). Local loading \mathbf{p}_{ba} = \mathbf{X}_b'\mathbf{q}_a; local score \mathbf{t}_{ba} = \mathbf{X}_b\,\mathbf{p}_{ba}(\mathbf{p}_{ba}'\mathbf{p}_{ba})^{-1}.
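
As a toy numeric check of these two formulas (written in plain Python with made-up values; the variable names are illustrative and not part of the R.ComDim API):

```python
# Local loading p = X_b' q and local score t = X_b p (p' p)^{-1}, for one
# block X_b and one unit-norm global score q (toy 2x2 example).
q = [0.6, 0.8]
X = [[1.0, 2.0],
     [2.0, 0.0]]          # block X_b: 2 samples x 2 variables

p  = [sum(X[i][j] * q[i] for i in range(2)) for j in range(2)]   # X_b' q
pp = sum(v ** 2 for v in p)                                      # p' p
t  = [sum(X[i][j] * p[j] for j in range(2)) / pp for i in range(2)]
```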

P.loadings

Global loadings (p_{tot} \times ndim): \mathbf{P} = \mathbf{X}'\mathbf{Q}, where \mathbf{X} is the mean-centred (and optionally normalised) concatenated blocks.

Saliences

Block salience matrix (ntable \times ndim): \lambda_{ba} = \mathbf{q}_a'\mathbf{X}_b\mathbf{X}_b'\mathbf{q}_a.
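
Since \lambda_{ba} = \mathbf{q}_a'\mathbf{X}_b\mathbf{X}_b'\mathbf{q}_a = \|\mathbf{X}_b'\mathbf{q}_a\|^2, a salience is simply the squared norm of a block's projection onto the global score. A toy sketch (plain Python, illustrative values only):

```python
# Salience of block b on component a: lambda = q' X_b X_b' q = ||X_b' q||^2.
q  = [0.6, 0.8]                      # unit-norm global score (2 samples)
Xb = [[1.0, 0.0, 2.0],
      [0.0, 1.0, 1.0]]               # block b: 2 samples x 3 variables

proj = [sum(Xb[i][j] * q[i] for i in range(len(q))) for j in range(len(Xb[0]))]
salience = sum(v ** 2 for v in proj)
```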

R2X

Proportion of X variance captured by each component (named vector, length ndim). Let \mathbf{t}_a be the NIPALS PLS X-score vector for component a on the salience-weighted blocks; then

R2X_a = \|\mathbf{t}_a\|^4 \big/ \sum_k \|\mathbf{t}_k\|^4.
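
A toy sketch of this normalisation (plain Python, made-up squared norms; not package output):

```python
# R2X: fourth powers of score norms, normalised to variance shares.
t_norm2 = [9.0, 4.0, 1.0]                 # ||t_a||^2 per component (toy values)
denom   = sum(v ** 2 for v in t_norm2)    # sum_k ||t_k||^4
r2x     = [v ** 2 / denom for v in t_norm2]
# the shares sum to 1 by construction
```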

R2Y

Cumulative Y-variance explained (named vector, length ndim). R2Y_a is the R^2 from an OLS regression of \mathbf{Y} on the first a global scores with an intercept:

R2Y_a = 1 - RSS_a / TSS_Y,

where RSS_a is the residual SS when predicting \mathbf{Y} from [1, \mathbf{q}_1, \ldots, \mathbf{q}_a]. Note: R2Y_a is cumulative — it reflects the total Y-variance explained by the first a components together, not the marginal contribution of component a alone.
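
For a single component (a = 1) this reduces to a simple regression of y on one global score with intercept. A toy sketch (plain Python, illustrative numbers):

```python
# Cumulative R2Y for one component: OLS of y on one score q with intercept,
# R2 = 1 - RSS / TSS.
q = [-1.0, 0.0, 1.0, 2.0]
y = [ 0.9, 2.1, 3.0, 4.2]

n = len(y)
qbar, ybar = sum(q) / n, sum(y) / n
beta_num = sum((qi - qbar) * (yi - ybar) for qi, yi in zip(q, y))
beta_den = sum((qi - qbar) ** 2 for qi in q)
beta = beta_num / beta_den                    # OLS slope
rss = sum((yi - (ybar + beta * (qi - qbar))) ** 2 for qi, yi in zip(q, y))
tss = sum((yi - ybar) ** 2 for yi in y)
r2y = 1 - rss / tss
```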

Q2

Predictive Q2 per response column (PLS-R) or per class (PLS-DA), named accordingly:

Q2 = 1 - PRESS / TSS_Y,

where PRESS = \sum_i (\hat{y}_i - y_i)^2 and TSS_Y = \sum_i (y_i - \bar{y})^2. When cv.k >= 2, \hat{y}_i are out-of-sample cross-validated predictions; otherwise training-set predictions are used (i.e. Q2 = R2Y for the full model).
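
The Q2 formula can be checked with a small numeric sketch (plain Python, toy values; not output from the package):

```python
# Q2 = 1 - PRESS / TSS_Y.
y    = [0.0, 1.0, 2.0, 3.0, 4.0]     # observed responses
yhat = [0.1, 0.9, 2.2, 2.8, 4.1]     # (cross-validated) predictions

ybar  = sum(y) / len(y)
press = sum((p - o) ** 2 for o, p in zip(y, yhat))   # prediction error SS
tss   = sum((o - ybar) ** 2 for o in y)              # total SS around the mean
q2    = 1 - press / tss
```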

DQ2

(PLS-DA only) Discriminant Q2 per class. Only penalising residuals contribute to the sum:

DQ2 = 1 - PRESSD / TSS_Y,

where PRESSD sums \hat{y}_i^2 for class-0 samples with \hat{y}_i > 0, and (\hat{y}_i - 1)^2 for class-1 samples with \hat{y}_i < 1. Same cross-validation logic as Q2.
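
The asymmetric penalty can be seen in a toy sketch (plain Python, illustrative values): a class-0 sample predicted below 0, or a class-1 sample predicted above 1, contributes nothing to PRESSD.

```python
# DQ2: only "wrong-side" residuals are penalised.
y    = [0, 0, 0, 1, 1, 1]                  # dummy-coded class membership
yhat = [-0.2, 0.3, 0.1, 1.2, 0.7, 0.9]     # predictions for one class column

ybar = sum(y) / len(y)
pressd = sum(
    p ** 2 if o == 0 and p > 0 else          # class-0 overshoot
    (p - 1) ** 2 if o == 1 and p < 1 else    # class-1 undershoot
    0.0                                      # right side of the threshold
    for o, p in zip(y, yhat)
)
tss = sum((o - ybar) ** 2 for o in y)
dq2 = 1 - pressd / tss
```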

Singular

Squared L2 norm of the NIPALS PLS score vector per component (\|\mathbf{t}_a\|^2), used to derive R2X.

VIP

Global VIP scores (named numeric vector, length p_{tot}) using the Wold formula:

VIP_j = \sqrt{p_{tot} \cdot \frac{\sum_a s_a \tilde{w}_{j,a}^2}{\sum_a s_a}},

where s_a = \|\mathbf{t}_a\|^2 \|\mathbf{q}_a\|^2, \tilde{w}_{j,a} = w_{j,a} / \|\mathbf{w}_a\| is the L2-normalised j-th element of the a-th NIPALS weight vector, and p_{tot} is the total number of variables.
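
A toy sketch of the Wold formula (plain Python, made-up weights; the names are illustrative, not R.ComDim objects). A useful sanity check is that the mean squared VIP equals 1, so the squared VIPs sum to p_{tot}:

```python
import math

# VIP_j = sqrt(p_tot * sum_a s_a * wn[j][a]^2 / sum_a s_a)
s = [4.0, 1.0]                       # s_a = ||t_a||^2 ||q_a||^2 per component
W = [[0.8, 0.1],
     [0.6, 0.3],
     [0.0, 0.9]]                     # NIPALS weights: rows = variables, cols = components

p_tot = len(W)
ncomp = len(s)
norms = [math.sqrt(sum(W[j][a] ** 2 for j in range(p_tot))) for a in range(ncomp)]
Wn = [[W[j][a] / norms[a] for a in range(ncomp)] for j in range(p_tot)]  # unit columns

vip = [
    math.sqrt(p_tot * sum(s[a] * Wn[j][a] ** 2 for a in range(ncomp)) / sum(s))
    for j in range(p_tot)
]
```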

VIP.block

Named list (one data.frame per block) with columns p (per-block predictive VIP computed with block size p_b instead of p_{tot}) and tot (= p for PLS; included for consistency with OPLS output). Row names are variable names.

PLS.model

List with: W (NIPALS X weight matrix, p_{tot} \times ndim); B (regression coefficients, p_{tot} \times ncol(Y), \mathbf{B} = \mathbf{W}(\mathbf{P}'\mathbf{W})^{-1}\mathbf{Q}', in original Y units); B0 (intercept vector, length ncol(Y), \mathbf{B}_0 = \bar{\mathbf{y}} - \bar{\mathbf{x}}\mathbf{B}); Y (original response matrix as supplied). Training-set Y predictions: \hat{\mathbf{Y}} = \mathbf{X}\mathbf{B} + \mathbf{B}_0.
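
The prediction equation \hat{\mathbf{Y}} = \mathbf{X}\mathbf{B} + \mathbf{B}_0 amounts to one matrix product plus an intercept. A toy sketch (plain Python, illustrative values; new data must use the same column order and preprocessing as the training blocks):

```python
# Applying stored coefficients: Yhat = X B + B0.
X  = [[1.0, 2.0],
      [3.0, 0.5]]         # 2 samples x p_tot variables (concatenated blocks)
B  = [[0.5],
      [1.0]]              # p_tot x ncol(Y) coefficients, original Y units
B0 = [2.0]                # intercept per response column

yhat = [
    sum(X[i][j] * B[j][0] for j in range(len(B))) + B0[0]
    for i in range(len(X))
]
```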

cv

Cross-validation results when cv.k >= 2 (empty list otherwise): k (number of folds), fold (sample-to-fold vector), Ypred (n \times ncol(Y) matrix of out-of-sample predictions), Q2 (CV Q2 per class/response), DQ2 (mean CV DQ2 across classes, PLS-DA only), DQ2.perclass (CV DQ2 per class, PLS-DA only).

Prediction

Training-set predictions: Y.pred (n \times ncol(Y)); for PLS-DA also decisionRule, trueClass (character vector), predClass (data.frame), Sensitivity and Specificity (named per class), confusionMatrix (named list of 2x2 matrices, one per class).

Mean

List with MeanMB (column means per block), MeanY (column means of Y before any scaling), and ScaleY (column SDs of Y; all ones when scale.y = FALSE).

Norm

List with NormMB: Frobenius norms for block normalisation.

variable.block

Character vector (length p_{tot}) mapping each row of P.loadings and each element of VIP to its block.

runtime

Total computation time in seconds.

References

Jouan-Rimbaud Bouveresse D, Rutledge DN (2024). A synthetic review of some recent extensions of ComDim. Journal of Chemometrics, 38(5), e3454. doi:10.1002/cem.3454

Examples

# Two data blocks: b1 has 10 samples; b2 holds the same 10 samples
# measured in 3 batches (30 rows)
b1 <- matrix(rnorm(500), 10, 50)
batch_b1 <- rep(1, 10)
b2 <- matrix(rnorm(2400), 30, 80)
batch_b2 <- c(rep(1, 10), rep(2, 10), rep(3, 10))
mb <- MultiBlock(
  Samples = list(
    b1 = paste0("samples_", 1:10),
    b2 = rep(paste0("samples_", 1:10), 3)
  ),
  Data = list(b1 = b1, b2 = b2),
  Batch = list(b1 = batch_b1, b2 = batch_b2),
  ignore.size = TRUE
)
# Split b2 by batch so that every block contains the same 10 samples
rw <- SplitRW(mb)
# PLS-R with a continuous response
y <- scale(1:10, center = TRUE)
results.plsr <- ComDim_PLS(rw, y, 2, method = "PLS-R")
# PLS-DA with a class vector
groups <- c(rep("A", 5), rep("B", 5))
results.plsda <- ComDim_PLS(rw, y = groups, 2, method = "PLS-DA")

R.ComDim documentation built on May 13, 2026, 9:07 a.m.