ComDim_PLS                R Documentation

Description:

Finding common dimensions in multi-block datasets using PLS.

Usage:
ComDim_PLS(
MB = MB,
y = y,
ndim = NULL,
method = c("PLS-DA", "PLS-R"),
decisionRule = c("fixed", "max")[2],
normalise = FALSE,
scale.y = FALSE,
threshold = 1e-10,
loquace = FALSE,
CompMethod = "Normal",
Partitions = 1,
cv.k = 7
)
Arguments:

MB
A MultiBlock object.

y
The Y-block to use in the PLS model as dependent data. A class vector or a dummy matrix.
ndim |
Number of Common Dimensions. |
method |
Either "PLS-DA" (discriminant analysis, categorical y) or "PLS-R" (regression, continuous y).
decisionRule |
Only used if method is set to PLS-DA. If 'fixed', samples are assigned to the class with Y-hat above 1/nclasses. If 'max', samples are assigned to the class with the highest Y-hat. |
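The two decision rules can be illustrated on a small hypothetical Y-hat matrix (a sketch of the behaviour described above, not the package's internal code; treating samples where no single class passes the 'fixed' cut as unassigned is an assumption):

```r
# Hypothetical Y-hat matrix: 3 samples x 2 classes (dummy-coded Y)
Yhat <- rbind(c(0.80, 0.20),
              c(0.40, 0.60),
              c(0.55, 0.52))
colnames(Yhat) <- c("A", "B")

# 'max': the class with the highest Y-hat wins
pred_max <- colnames(Yhat)[max.col(Yhat)]

# 'fixed': a class is called when its Y-hat exceeds 1/nclasses;
# samples where no single class passes are marked NA (assumption)
cut <- 1 / ncol(Yhat)
pred_fixed <- apply(Yhat, 1, function(r) {
  hits <- which(r > cut)
  if (length(hits) == 1) colnames(Yhat)[hits] else NA_character_
})
```

Under 'max' the third sample is forced into class "A"; under 'fixed' both classes pass the 1/nclasses cut, so the call is ambiguous.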
normalise |
To apply block normalisation. FALSE == no (default), TRUE == yes.
When TRUE, each block is mean-centred and then divided by its Frobenius norm so
that all blocks have unit total inertia entering the ComDim loop.
Has no effect on the Y-block; use scale.y for that.
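The normalisation step amounts to the following base-R sketch (illustrative, not the package's internal code):

```r
# Mean-centre a block, then scale it to unit total inertia
normalise_block <- function(Xb) {
  Xb <- scale(Xb, center = TRUE, scale = FALSE)  # column mean-centring
  Xb / sqrt(sum(Xb^2))                           # divide by the Frobenius norm
}

Xb <- matrix(rnorm(200), 10, 20)
sum(normalise_block(Xb)^2)  # total inertia is 1 after normalisation
```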
scale.y |
Logical (default FALSE). When TRUE, each column of Y is divided by its standard deviation before fitting (the divisors are returned in the ScaleY element of the Mean slot).
threshold |
Convergence threshold for the iterative algorithm: iterations stop when the "difference of fit" falls below threshold (default 1e-10).
loquace |
To display the calculation times. TRUE == yes, FALSE == no (default). |
CompMethod
Computation mode, to speed up the analysis of very large multi-blocks: 'Normal' (default), 'Kernel', 'PCT', 'Tall' or 'Wide'.

Partitions
Number of partitions; only used when CompMethod is 'Tall' or 'Wide'.
cv.k |
Number of folds for k-fold cross-validation (default 7). Set to 0 to skip CV. When cv.k >= 2, Q2 and DQ2 in the output reflect cross-validated predictive ability; otherwise they reflect training-set fit (R2). |
Value:

A ComDim object. All slots are populated. Key slots:
Method
"PLS-DA" or "PLS-R".
ndim
Number of Common Dimensions extracted.
Q.scores
Global consensus PLS scores (n \times ndim).
Each column \mathbf{q}_a (unit-norm) is the dominant left
singular vector from the NIPALS PLS applied to the salience-weighted
concatenated blocks.
T.scores
Named list of block-specific local scores
(n \times ndim each). Local loading
\mathbf{p}_{ba} = \mathbf{X}_b'\mathbf{q}_a; local score
\mathbf{t}_{ba} =
\mathbf{X}_b\,\mathbf{p}_{ba}(\mathbf{p}_{ba}'\mathbf{p}_{ba})^{-1}.
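Given a mean-centred block X_b and a global score q_a, the local quantities above can be sketched as (function and argument names are illustrative):

```r
# Sketch: block-specific loading and local score for one component
local_scores <- function(Xb, q_a) {
  p_ba <- crossprod(Xb, q_a)                    # p_ba = Xb' q_a
  t_ba <- Xb %*% p_ba / drop(crossprod(p_ba))   # t_ba = Xb p_ba (p_ba' p_ba)^-1
  list(loading = p_ba, score = t_ba)
}
```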
P.loadings
Global loadings (p_{tot} \times ndim):
\mathbf{P} = \mathbf{X}'\mathbf{Q}, where \mathbf{X} is
the mean-centred (and optionally normalised) concatenated blocks.
Saliences
Block salience matrix (ntable \times ndim):
\lambda_{ba} =
\mathbf{q}_a'\mathbf{X}_b\mathbf{X}_b'\mathbf{q}_a.
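The salience of block b on component a reduces to the squared norm of X_b' q_a, so a one-line sketch suffices (names are illustrative):

```r
# lambda_ba = q_a' Xb Xb' q_a = ||Xb' q_a||^2
salience <- function(Xb, q_a) sum(crossprod(Xb, q_a)^2)
```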
R2X
Proportion of X variance captured by each component
(named vector, length ndim). Let \mathbf{t}_a be the
NIPALS PLS X-score vector for component a on the salience-
weighted blocks; then
R2X_a = \|\mathbf{t}_a\|^4 \big/ \sum_k \|\mathbf{t}_k\|^4.
R2Y
Cumulative Y-variance explained (named vector, length
ndim). R2Y_a is the R^2 from an OLS regression of
\mathbf{Y} on the first a global scores with an intercept:
R2Y_a = 1 - RSS_a / TSS_Y,
where RSS_a is the residual SS when predicting \mathbf{Y}
from [1, \mathbf{q}_1, \ldots, \mathbf{q}_a].
Note: R2Y_a is cumulative — it reflects the total
Y-variance explained by the first a components together,
not the marginal contribution of component a alone.
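A minimal sketch of this cumulative computation, with Q holding the global scores as columns (function and argument names are illustrative):

```r
# Cumulative R2Y: OLS of Y on the first a global scores, with intercept
r2y_cumulative <- function(Q, Y) {
  Y <- as.matrix(Y)
  TSS <- sum(scale(Y, scale = FALSE)^2)
  sapply(seq_len(ncol(Q)), function(a) {
    fit <- lm(Y ~ Q[, seq_len(a), drop = FALSE])
    1 - sum(residuals(fit)^2) / TSS
  })
}
```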
Q2
Predictive Q2 per response column (PLS-R) or per class (PLS-DA), named accordingly:
Q2 = 1 - PRESS / TSS_Y,
where PRESS = \sum_i (\hat{y}_i - y_i)^2 and
TSS_Y = \sum_i (y_i - \bar{y})^2.
When cv.k >= 2, \hat{y}_i are out-of-sample
cross-validated predictions; otherwise training-set predictions are
used (i.e. Q2 = R2Y for the full model).
DQ2
(PLS-DA only) Discriminant Q2 per class. Only penalising residuals contribute to the sum:
DQ2 = 1 - PRESSD / TSS_Y,
where PRESSD sums \hat{y}_i^2 for class-0 samples with
\hat{y}_i > 0, and (\hat{y}_i - 1)^2 for class-1 samples
with \hat{y}_i < 1. Same cross-validation logic as Q2.
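For a single 0/1 response column, the two statistics can be sketched as follows, with yhat being either cross-validated or training predictions per the cv.k logic above (names are illustrative):

```r
q2 <- function(y, yhat)
  1 - sum((yhat - y)^2) / sum((y - mean(y))^2)

# DQ2: residuals on the "right side" of the class targets are not penalised
dq2 <- function(y, yhat) {           # y coded 0/1
  pressd <- sum(yhat[y == 0 & yhat > 0]^2) +
            sum((yhat[y == 1 & yhat < 1] - 1)^2)
  1 - pressd / sum((y - mean(y))^2)
}
```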
Singular
Squared L2 norm of the NIPALS PLS score vector
per component (\|\mathbf{t}_a\|^2), used to derive
R2X.
VIP
Global VIP scores (named numeric vector, length
p_{tot}) using the Wold formula:
VIP_j = \sqrt{p_{tot}
\cdot \frac{\sum_a s_a \tilde{w}_{j,a}^2}{\sum_a s_a}},
where s_a = \|\mathbf{t}_a\|^2 \|\mathbf{q}_a\|^2,
\tilde{w}_{j,a} = w_{j,a} / \|\mathbf{w}_a\| is the
L2-normalised j-th element of the a-th NIPALS weight
vector, and p_{tot} is the total number of variables.
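The Wold formula can be sketched from the NIPALS weight matrix W (p_tot \times ndim), the X-score matrix T and the global scores Q (function and argument names are illustrative):

```r
vip_wold <- function(W, Tm, Q) {
  p <- nrow(W)
  s <- colSums(Tm^2) * colSums(Q^2)            # s_a = ||t_a||^2 ||q_a||^2
  Wn <- sweep(W, 2, sqrt(colSums(W^2)), "/")   # L2-normalise each weight vector
  drop(sqrt(p * (Wn^2 %*% s) / sum(s)))        # VIP_j per variable
}
```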
VIP.block
Named list (one data.frame per block)
with columns p (per-block predictive VIP computed with block
size p_b instead of p_{tot}) and tot (= p
for PLS; included for consistency with OPLS output). Row names are
variable names.
PLS.model
List with: W (NIPALS X weight matrix,
p_{tot} \times ndim); B (regression coefficients,
p_{tot} \times ncol(Y),
\mathbf{B} =
\mathbf{W}(\mathbf{P}'\mathbf{W})^{-1}\mathbf{Q}',
in original Y units); B0 (intercept vector, length
ncol(Y),
\mathbf{B}_0 = \bar{\mathbf{y}} - \bar{\mathbf{x}}\mathbf{B});
Y (original response matrix as supplied).
Training-set Y predictions:
\hat{\mathbf{Y}} = \mathbf{X}\mathbf{B} + \mathbf{B}_0.
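New samples can be predicted directly from the returned coefficients; a sketch, assuming Xnew has been centred (and, if normalise = TRUE, scaled) consistently with the training blocks using the Mean and Norm slots:

```r
# Y-hat = Xnew B + B0, with the intercept recycled across rows
predict_pls <- function(Xnew, B, B0) {
  sweep(Xnew %*% B, 2, B0, "+")
}
```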
cv
Cross-validation results when cv.k >= 2 (empty
list otherwise): k (number of folds), fold
(sample-to-fold vector), Ypred (n \times ncol(Y) matrix
of out-of-sample predictions), Q2 (CV Q2 per class/response),
DQ2 (mean CV DQ2 across classes, PLS-DA only),
DQ2.perclass (CV DQ2 per class, PLS-DA only).
Prediction
Training-set predictions: Y.pred
(n \times ncol(Y)); for PLS-DA also decisionRule,
trueClass (character vector), predClass (data.frame),
Sensitivity and Specificity (named per class),
confusionMatrix (named list of 2x2 matrices, one per class).
Mean
List with MeanMB (column means per block),
MeanY (column means of Y before any scaling), and
ScaleY (column SDs of Y; all ones when scale.y = FALSE).
Norm
List with NormMB: Frobenius norms for block
normalisation.
variable.block
Character vector (length p_{tot})
mapping each row of P.loadings and each element of VIP
to its block.
runtime
Total computation time in seconds.
References:

Jouan-Rimbaud Bouveresse D, Rutledge DN (2024). A synthetic review of some recent extensions of ComDim. Journal of Chemometrics, 38(5), e3454. doi:10.1002/cem.3454
Examples:

# Simulated blocks: b1 has 10 samples x 50 variables;
# b2 has 30 rows (10 samples x 3 batches) x 80 variables
b1 <- matrix(rnorm(500), 10, 50)
batch_b1 <- rep(1, 10)
b2 <- matrix(rnorm(2400), 30, 80)
batch_b2 <- c(rep(1, 10), rep(2, 10), rep(3, 10))
mb <- MultiBlock(
Samples = list(
b1 = paste0("samples_", 1:10),
b2 = rep(paste0("samples_", 1:10), 3)
),
Data = list(b1 = b1, b2 = b2),
Batch = list(b1 = batch_b1, b2 = batch_b2),
ignore.size = TRUE
)
rw <- SplitRW(mb)  # split blocks row-wise by batch

y <- scale(1:10, center = TRUE)  # continuous response for PLS-R
results.plsr <- ComDim_PLS(rw, y, 2, method = "PLS-R")

groups <- c(rep("A", 5), rep("B", 5))  # class vector for PLS-DA
results.plsda <- ComDim_PLS(rw, y = groups, 2, method = "PLS-DA")