ComDim_y: Extending PLS-like supervised methods to multi-block data

View source: R/ComDim_y.R


ComDim_y - Extending PLS-like supervised methods to multi-block data.

Description

Extends any PLS-like method used for regression or discriminant purposes to the multi-block field. The user provides a function (FUN) that computes one predictive component from the salience-weighted concatenated blocks; global scores, local scores, and loadings are then derived following the traditional ComDim-PLS framework. Optionally, orthogonal components returned by FUN (e.g. from an O-PLS wrapper) are captured. VIP scores and k-fold cross-validation are also supported.

Usage

ComDim_y(
  MB = MB,
  y = y,
  ndim = NULL,
  FUN = FUN,
  nort = 0L,
  type = c("regression", "discriminant")[1],
  decisionRule = c("fixed", "max")[2],
  normalise = FALSE,
  scale.y = FALSE,
  threshold = 1e-10,
  loquace = FALSE,
  method = "FUN",
  cv.k = 7,
  ...
)

Arguments

MB

A MultiBlock object.

y

The response: a numeric vector or matrix for regression (type = 'regression'), or a class vector or dummy matrix for discriminant analysis (type = 'discriminant').

ndim

Number of predictive Common Dimensions. If NULL, defaults to the number of blocks.

FUN

The function used as the core of the ComDim analysis. It must accept W (salience-weighted concatenated blocks, n x p matrix), y (response), and ndim (number of components to compute) as its first three named arguments, and return a named list with at least:

scores

X scores vector (length n) for the current component.

P

X loadings vector (length p) for the current component.

W

X weights vector (length p) for the current component.

Q

Y loadings vector for the current component.

U

Y scores vector (length n) for the current component.

Optional return fields:

y

y as used internally by FUN (e.g. after centring/scaling), so that ComDim_y can detect automatic y-transformation and back-transform the Q loadings for correct B and B0 computation.

orthoscores

A numeric vector (length n) or single-column matrix of orthogonal X-scores. Required when nort = 1. Only the first column is used. FUN does not need to change its behaviour between ort and predictive phases; if it always returns orthoscores, the framework uses it only during the ort phase and ignores it during the predictive phase.

nort

Number of orthogonal Common Dimensions to extract before the predictive loop. Default 0 (no orthogonalization, pure PLS-like). Only nort = 0 or nort = 1 are accepted; for multiple orthogonal components use ComDim_OPLS(). When nort = 1 the function follows the same two-phase architecture as ComDim_OPLS: the orthogonal component is removed from the (lambda-weighted) concatenated blocks first, then predictive components are computed from the orthogonally-deflated blocks. FUN must return output$orthoscores (a numeric vector of length n, or a matrix whose first column is used) when called with ndim = 1. Cross-validation is automatically skipped when nort > 0.

type

'regression' (default) or 'discriminant'.

decisionRule

Only used when type = 'discriminant'. If 'fixed', samples are assigned to the class whose predicted score exceeds 1/nclasses; if 'max', the class with the highest predicted score is chosen. Default 'max'.
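The two rules can be illustrated on a small matrix of predicted class scores. This is a sketch, not the package's internal code; in particular, how 'fixed' handles samples where zero or several scores exceed the threshold is an assumption here (they are left unassigned as NA).

```r
## Hypothetical predicted scores for 3 samples and 3 classes.
Ypred <- matrix(c(0.70, 0.20, 0.10,
                  0.40, 0.35, 0.25,
                  0.10, 0.15, 0.75),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("A", "B", "C")))

## 'max': each sample goes to the class with the highest score.
pred_max <- colnames(Ypred)[apply(Ypred, 1, which.max)]

## 'fixed': a sample is assigned only when exactly one class score
## exceeds 1/nclasses; ambiguous samples stay unassigned (assumption).
thr <- 1 / ncol(Ypred)
pred_fixed <- apply(Ypred, 1, function(s) {
  hit <- which(s > thr)
  if (length(hit) == 1) names(s)[hit] else NA_character_
})
```

With these scores, sample 2 is assigned to "A" under 'max' but stays unassigned under 'fixed', since both 0.40 and 0.35 exceed 1/3.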

normalise

Logical; whether to apply block normalisation. Default FALSE (no normalisation).

scale.y

Logical (default FALSE). When TRUE and type = 'regression', each column of Y is mean-centred and scaled to unit variance before being passed to FUN. The stored B, B0, and Ypred are back-transformed to the original Y scale, so downstream outputs (R2Y, Q2, predictions) are always in the original units. Ignored when type = 'discriminant' (dummy Y should not be scaled).
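The scaling convention described above can be sketched as follows (toy data; the stored MeanY/ScaleY slot names follow the Mean slot documented below):

```r
## Sketch of the scale.y convention: Y is autoscaled before modelling,
## and predictions are mapped back to the original units.
Y      <- c(10, 12, 15, 20, 23)
MeanY  <- mean(Y)
ScaleY <- sd(Y)
Ys     <- (Y - MeanY) / ScaleY       # what FUN would see (scale.y = TRUE)

## Back-transform a prediction made on the scaled response
## (a perfect fit is used here purely for illustration):
Ypred_scaled <- Ys
Ypred <- Ypred_scaled * ScaleY + MeanY
```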

threshold

Convergence threshold: iterations stop when the change in the global score vector falls below this value (default 1e-10).

loquace

Logical; whether to display the computation time at each step. Default FALSE.

method

A string label identifying the method (default: 'FUN').

cv.k

Number of folds for k-fold cross-validation (default 7). Set to 0 to skip CV. When cv.k >= 2, the Q2 and DQ2 slots in the output reflect cross-validated predictive ability; otherwise they reflect training-set fit. CV is skipped when nort > 0.

...

Additional arguments passed to FUN.

Value

A ComDim object with the following slots:

Method

The label supplied via the method argument.

ndim

Number of predictive Common Dimensions extracted.

Q.scores

Global consensus scores matrix (n \times ndim). Each column \mathbf{q}_a (unit-norm) is derived from the dominant left direction of FUN applied to the salience-weighted concatenated blocks.

T.scores

Named list of block-specific local scores (n \times ndim each). Local loading \mathbf{p}_{ba} = \tilde{\mathbf{X}}_b'\mathbf{q}_a (computed on the ort-deflated block when nort > 0); local score \mathbf{t}_{ba} = \tilde{\mathbf{X}}_b\,\mathbf{p}_{ba}(\mathbf{p}_{ba}'\mathbf{p}_{ba})^{-1}.

P.loadings

Global loadings (p_{tot} \times ndim): \mathbf{P} = \tilde{\mathbf{X}}'\mathbf{Q}, where \tilde{\mathbf{X}} is the (optionally ort-deflated) mean-centred concatenated blocks.

Saliences

Block salience matrix (ntable \times ndim): \lambda_{ba} = \mathbf{q}_a'\tilde{\mathbf{X}}_b\tilde{\mathbf{X}}_b'\mathbf{q}_a.
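The salience formula can be verified on toy data: \lambda_{ba} is the squared norm of the block loadings \tilde{\mathbf{X}}_b'\mathbf{q}_a (an illustrative sketch, not the package's internal code):

```r
## lambda_ba = q_a' X_b X_b' q_a  ==  ||X_b' q_a||^2
set.seed(1)
Xb <- scale(matrix(rnorm(10 * 5), 10, 5), scale = FALSE)  # centred block
q  <- rnorm(10); q <- q / sqrt(sum(q^2))                  # unit-norm score
lambda  <- as.numeric(t(q) %*% Xb %*% t(Xb) %*% q)
lambda2 <- sum((t(Xb) %*% q)^2)                           # same value
```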

R2X

Proportion of X variance captured by each predictive component (named vector, length ndim). Let \mathbf{t}_a be the X-score vector returned by FUN for component a:

R2X_a = \|\mathbf{t}_a\|^4 \big/ \sum_k \|\mathbf{t}_k\|^4.

When nort > 0, the denominator also includes the orthogonal \|\mathbf{t}_{ort,k}\|^4 terms, and the orthogonal R2X fractions are stored separately in Orthogonal$R2X.
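Numerically, the formula amounts to squaring the stored \|\mathbf{t}_a\|^2 values (the Singular slot below) and normalising (toy values):

```r
## R2X_a = ||t_a||^4 / sum_k ||t_k||^4, from toy ||t_a||^2 values.
tnorm2 <- c(8.1, 3.2, 1.5)            # ||t_a||^2 per component
R2X    <- tnorm2^2 / sum(tnorm2^2)    # fractions summing to 1
```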

R2Y

Cumulative Y-variance explained (named vector, length ndim):

R2Y_a = 1 - RSS_a / TSS_Y,

where RSS_a is the residual SS from an OLS regression of \mathbf{Y} on [1, \mathbf{q}_1, \ldots, \mathbf{q}_a]. Note: R2Y_a is cumulative – the total Y-variance explained by the first a components together, not the marginal contribution of component a alone.
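The cumulative definition can be reproduced with an ordinary lm() fit on the first a score columns (a sketch on toy data; lm() supplies the intercept term):

```r
## Cumulative R2Y: OLS of Y on [1, q_1, ..., q_a] for increasing a.
set.seed(2)
n <- 20
Q <- matrix(rnorm(n * 2), n, 2)                      # toy q_1, q_2
Y <- 2 * Q[, 1] + 0.5 * Q[, 2] + rnorm(n, sd = 0.1)
TSS <- sum((Y - mean(Y))^2)
R2Y <- sapply(1:2, function(a) {
  RSS <- sum(resid(lm(Y ~ Q[, 1:a, drop = FALSE]))^2)
  1 - RSS / TSS
})
## R2Y[2] >= R2Y[1]: entry a is the Y-variance explained jointly by
## components 1..a, not the marginal share of component a.
```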

Q2

Predictive Q2 per response column (regression) or per class (discriminant), named accordingly:

Q2 = 1 - PRESS / TSS_Y,

where PRESS = \sum_i (\hat{y}_i - y_i)^2. When cv.k >= 2 and nort = 0, cross-validated (out-of-sample) predictions are used; otherwise, training-set predictions are used. CV is automatically skipped when nort > 0.

DQ2

(Discriminant mode only) Discriminant Q2 per class, using only penalising residuals:

DQ2 = 1 - PRESSD / TSS_Y,

where PRESSD sums \hat{y}_i^2 for class-0 samples with \hat{y}_i > 0, and (\hat{y}_i - 1)^2 for class-1 samples with \hat{y}_i < 1. Same cross-validation logic as Q2.
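Both statistics can be computed by hand for one dummy-coded class column (toy predictions; a sketch following the formulas above, not the package's internal code):

```r
y    <- c(0, 0, 0, 1, 1, 1)                  # class membership (0/1)
yhat <- c(-0.1, 0.2, 0.4, 0.7, 1.1, 0.9)     # predicted scores
TSS  <- sum((y - mean(y))^2)

PRESS <- sum((yhat - y)^2)
Q2    <- 1 - PRESS / TSS

## DQ2 keeps only the penalising residuals: class-0 samples predicted
## above 0 and class-1 samples predicted below 1.
pen0   <- y == 0 & yhat > 0
pen1   <- y == 1 & yhat < 1
PRESSD <- sum(yhat[pen0]^2) + sum((yhat[pen1] - 1)^2)
DQ2    <- 1 - PRESSD / TSS
```

Since PRESSD never exceeds PRESS, DQ2 >= Q2: samples already on the correct side of their class target are not penalised.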

Singular

Squared L2 norm of the FUN X-score vector per component (\|\mathbf{t}_a\|^2), used to derive R2X.

VIP

Global total VIP (named vector, length p_{tot}): concatenation of VIP.block[[b]]$tot across blocks. When nort = 0, uses the Wold formula; when nort = 1, tot combines predictive and orthogonal VIPs (see VIP.block).

VIP.block

Named list (one data.frame per block). When nort = 0: columns p and tot (= p), using the Wold formula:

VIPp_j = \sqrt{p_b \cdot \frac{\sum_a s_a \tilde{w}_{j,a}^2}{\sum_a s_a}},

where s_a = \|\mathbf{t}_a\|^2\|\mathbf{q}_a\|^2 and \tilde{w}_{j,a} = w_{j,a}/\|\mathbf{w}_a\| is the L2-normalised j-th element of the a-th weight vector. When nort = 1: columns p (Wold, same as above), o (orthogonal VIP, loadings-based: VIPo_j = \sqrt{p_b \cdot \sum_a s_{oa}\tilde{P}_{o,j,a}^2 / \sum_a s_{oa}}, where s_{oa} = \|\mathbf{q}_{ort}[,a]\|^2 and \tilde{\mathbf{P}}_o is the column-L2-normalised block-slice of the ort loadings), and tot (VIPtot_j = \sqrt{(VIPp_j^2 + VIPo_j^2)/2}). Row names are variable names.
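The Wold formula for the predictive VIP can be sketched on toy weights and s_a values (illustrative only):

```r
## Wold VIP for one block of p_b variables.
set.seed(3)
p_b  <- 5; A <- 2
Wmat <- matrix(rnorm(p_b * A), p_b, A)               # block weight slice
Wn   <- sweep(Wmat, 2, sqrt(colSums(Wmat^2)), "/")   # column-L2-normalised
s    <- c(4.0, 1.5)                                  # toy s_a values
VIPp <- sqrt(p_b * (Wn^2 %*% s) / sum(s))
## By construction mean(VIPp^2) == 1, so VIP around 1 is the
## "average importance" baseline.
```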

PLS.model

List with: W (X weight matrix collected from FUN, p_{tot} \times ndim); B (regression coefficients, \mathbf{B} = \mathbf{W}(\mathbf{P}'\mathbf{W})^{-1}\mathbf{Q}', in original Y units); B0 (intercept, \mathbf{B}_0 = \bar{\mathbf{y}} - \overline{\tilde{\mathbf{x}}}\mathbf{B}); Y (original response matrix as supplied). Training-set predictions: \hat{\mathbf{Y}} = \tilde{\mathbf{X}}\mathbf{B} + \mathbf{B}_0.
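The coefficient reconstruction can be checked with a hand-rolled two-component NIPALS PLS on toy data (single response; a sketch of the algebra, not ComDim_y's internal code). The identity \tilde{\mathbf{X}}\mathbf{B} = \mathbf{T}\mathbf{Q}' holds exactly for NIPALS-deflated components:

```r
## B = W (P'W)^{-1} Q' reconstructed from hand-rolled PLS pieces.
set.seed(4)
n <- 12; p <- 6
X <- scale(matrix(rnorm(n * p), n, p), scale = FALSE)   # mean-centred
Y <- X %*% rnorm(p) + rnorm(n, sd = 0.1)

## One NIPALS component ('tx' avoids shadowing base::t(), as above).
nipals_step <- function(X, Y) {
  w  <- crossprod(X, Y); w <- w / sqrt(sum(w^2))
  tx <- X %*% w
  p  <- crossprod(X, tx) / sum(tx^2)
  q  <- crossprod(Y, tx) / sum(tx^2)
  list(w = w, tx = tx, p = p, q = q)
}
c1 <- nipals_step(X, Y)
c2 <- nipals_step(X - c1$tx %*% t(c1$p), Y)   # deflate, then repeat

Wm <- cbind(c1$w, c2$w); Pm <- cbind(c1$p, c2$p); Qm <- cbind(c1$q, c2$q)
B  <- Wm %*% solve(t(Pm) %*% Wm) %*% t(Qm)
B0 <- mean(Y) - as.numeric(colMeans(X) %*% B)
Yhat <- X %*% B + B0       # training-set predictions in original units
```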

cv

Cross-validation results when cv.k >= 2 and nort = 0 (empty list otherwise): k, fold (sample-to-fold vector), Ypred (n \times ncol(Y) out-of-sample predictions), Q2 (CV Q2 per class/response), DQ2 (mean CV DQ2, discriminant only), DQ2.perclass (CV DQ2 per class, discriminant only).

Orthogonal

When nort > 0: list with nort, Q.scores (global ort scores, n \times nort, unit-norm), T.scores (block ort local scores, n \times nort each), P.loadings.ort (ort loadings, p_{tot} \times nort), Saliences.ort (ntable \times nort), and R2X (orthogonal X-variance fractions, R2X_{ort,a} = \|\mathbf{t}_{ort,a}\|^4 / total). Empty list when nort = 0.

Prediction

Training-set predictions: Y.pred (n \times ncol(Y)); for discriminant analysis also decisionRule, trueClass, predClass (data.frame), Sensitivity and Specificity (per class), confusionMatrix (named list of 2x2 matrices).

Mean

List with MeanMB (column means per block), MeanY (column means of Y), and ScaleY (column SDs of Y; all ones when scale.y = FALSE).

Norm

List with NormMB: Frobenius norms for block normalisation.

variable.block

Character vector (length p_{tot}) mapping each row of P.loadings and each element of VIP to its block.

runtime

Total computation time in seconds.

Examples

b1 <- matrix(rnorm(500), 10, 50) # 10 samples, 50 variables
b2 <- matrix(rnorm(800), 10, 80) # 10 samples, 80 variables
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))

## Example 1: ComDim-PLS (regression) ---------------------------------------
# Single-step NIPALS PLS wrapper (one predictive component per call).
# Note: 'tx' is used instead of 't' to avoid shadowing base::t().
fun.PLS <- function(W, y, ndim, ...) {
  output <- list()
  w <- t(W) %*% y / as.numeric(t(y) %*% y) # X weight (u = y, 1 step)
  w <- w / sqrt(sum(w^2)) # L2 normalise
  tx <- W %*% w # X score
  p <- t(W) %*% tx / as.numeric(t(tx) %*% tx) # X loading
  q <- t(y) %*% tx / as.numeric(t(tx) %*% tx) # Y loading
  u <- y %*% q / as.numeric(t(q) %*% q) # Y score
  output$scores <- as.vector(tx)
  output$P <- as.vector(p)
  output$W <- as.vector(w)
  output$Q <- as.vector(q)
  output$U <- as.vector(u)
  return(output)
}

y <- c(1, 1, 1, 1, 1, 5, 5, 5, 10, 10)
resultsPLS <- ComDim_y(mb,
  y = y, ndim = 2,
  type = "regression",
  FUN = fun.PLS,
  method = "PLS",
  cv.k = 0
)

## Example 2: ComDim-OPLS-DA (discriminant, nort = 1) ----------------------
# Thin wrapper around OPLS_NIPALS_DNR(), the package's NIPALS OPLS engine.
# All inputs (W, y, and any extra args such as 'threshold') are forwarded
# directly via '...'. Use this pattern when nort > 0; for nort = 0 the
# simpler PLS wrapper in Example 1 is sufficient (no orthoscores needed).
fun.OPLS <- function(W, y, ndim, ...) {
  res <- OPLS_NIPALS_DNR(W = W, y = y, ...)
  list(
    scores      = as.vector(res$t_pred),
    P           = as.vector(res$p),
    W           = as.vector(res$w_pred),
    Q           = as.vector(res$q),
    U           = as.vector(res$u),
    orthoscores = matrix(res$t_ort, ncol = 1)
  )
}

groups <- c(rep("A", 5), rep("B", 5))
resultsOPLS <- ComDim_y(mb,
  y = groups, ndim = 1,
  nort = 1,
  type = "discriminant",
  FUN = fun.OPLS,
  method = "OPLS-DA",
  cv.k = 0
)

## Example 3 (not run): ComDim-OPLS-DA via ropls ---------------------------
# Wrapping ropls::opls is also possible. Key points:
#   - Use orthoI = 1 (fixed) instead of NA so the output is predictable.
#   - Always return output$orthoscores; ComDim_y ignores it in phases
#     where ort has already been removed.
#   - Expand the single ropls Q loading to match the ncol(y_dummy) width.

if (requireNamespace("ropls", quietly = TRUE)) {
  fun.OPLSDA.ropls <- function(W, y, ndim, ...) {
    output <- list()
    # Convert dummy matrix to ropls-compatible -1/+1 vector
    Y <- c(-1, 1)[apply(y, 1, function(x) match(1, x))]
    result <- tryCatch(
      ropls::opls(
        x = W, y = Y, predI = 1, orthoI = 1,
        fig.pdfC = "none", info.txtC = "none"
      ),
      error = function(e) {
        ropls::opls(
          x = W, y = Y, predI = 1, orthoI = 0,
          fig.pdfC = "none", info.txtC = "none"
        )
      }
    )
    output$scores <- result@scoreMN[, 1]
    output$P <- result@loadingMN[, 1]
    output$W <- result@weightMN[, 1]
    output$U <- result@uMN[, 1]
    # Expand the single ropls Q loading to match the 2-column dummy matrix:
    # loadings for class1 and class2 are antisymmetric in binary PLS-DA.
    output$Q <- c(-result@cMN[, 1], result@cMN[, 1])
    output$y <- result@suppLs$yModelMN # internal y (for scaling detection)
    # Orthogonal scores (used during the ort pre-loop when nort > 0)
    if (!is.null(result@orthoScoreMN) && ncol(result@orthoScoreMN) > 0) {
      # n x k matrix; column j is used for the j-th orthogonal component
      output$orthoscores <- result@orthoScoreMN
    } else {
      output$orthoscores <- matrix(0, nrow = nrow(W), ncol = 1)
    }
    return(output)
  }

  b1_r <- matrix(rnorm(8 * 30), 8, 30)
  b2_r <- matrix(rnorm(8 * 20), 8, 20)
  mb_r <- MultiBlock(Data = list(b1 = b1_r, b2 = b2_r))
  resultsOPLSDA <- ComDim_y(mb_r,
    y = c(rep("NI", 4), rep("OFF", 4)),
    ndim = 1,
    nort = 1,
    type = "discriminant",
    FUN = fun.OPLSDA.ropls,
    method = "OPLS-DA(ropls)",
    cv.k = 0
  )
}


R.ComDim documentation built on May 13, 2026, 9:07 a.m.