Extending multiblock: CCA and glmnet Examples
In multivarious: Extensible Data Structures for Multivariate Analysis

knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  fig.width = 7,
  fig.height = 5
)
# Load necessary packages for the examples
# Ensure 'multiblock' package itself is loaded for testing vignettes,
# e.g., via devtools::load_all() or library(multiblock)
library(tibble)
library(dplyr)
library(stats) # For cancor
library(glmnet) # For glmnet example
library(multivarious)
# Helper k-fold function (replace with package internal if available)
kfold_split <- function(n, k = 5) {
  idx <- sample(rep(1:k, length.out = n))
  lapply(1:k, function(j) list(train = which(idx != j),
                               test  = which(idx == j)))
}

This vignette demonstrates how to wrap existing statistical models within the multiblock framework, specifically using cross_projector for Canonical Correlation Analysis (CCA) and projector for glmnet (penalized regression).

1. Canonical Correlation Analysis with `cross_projector`

1.1 Data Setup

We use the classic Iris dataset and split its features into two blocks:

X-block: Sepal.Length, Sepal.Width
Y-block: Petal.Length, Petal.Width

data(iris)
X <- as.matrix(iris[, 1:2])
Y <- as.matrix(iris[, 3:4])

# Show first few rows of combined data
head(cbind(X, Y))

1.2 Wrap `stats::cancor()` into a `cross_projector`

We create a fitting function that runs standard CCA using stats::cancor() and then stores the resulting coefficients (loadings) and preprocessing steps in a cross_projector object.

fit_cca <- function(Xtrain, Ytrain, ncomp = 2, ...) {
  # Step 1: Define and fit preprocessors with training data
  # Use center()+z-score for both blocks (swap to center() only if preferred)
  preproc_x <- fit(colscale(center(), type = "z"), Xtrain)
  preproc_y <- fit(colscale(center(), type = "z"), Ytrain)

  # Step 2: Transform training data with the fitted preprocessors
  Xp <- transform(preproc_x, Xtrain)
  Yp <- transform(preproc_y, Ytrain)

  # Step 3: Run CCA on the preprocessed data (no additional centering here)
  cc <- stats::cancor(Xp, Yp, xcenter = FALSE, ycenter = FALSE)

  # Step 4: Store loadings and preprocessors in a cross_projector
  cp <- cross_projector(
    vx         = cc$xcoef[, 1:ncomp, drop = FALSE],
    vy         = cc$ycoef[, 1:ncomp, drop = FALSE],
    preproc_x  = preproc_x,
    preproc_y  = preproc_y,
    classes    = "cca_cross_projector"
  )
  attr(cp, "can_cor") <- cc$cor[1:ncomp]
  cp
}

# Fit the model
cp_cca <- fit_cca(X, Y)
print(cp_cca)
attr(cp_cca, "can_cor") # Show canonical correlations

What does this cross_projector enable?

Projection: Map new X or Y data into the shared latent space defined by CCA (project(cp_cca, newX, from = "X")).
Transfer/Prediction: Predict the Y block from the X block, or vice-versa (transfer(cp_cca, newX, from = "X", to = "Y")).
Partial Features: Project or transfer using only a subset of features from X or Y (partial_project).
Integration: Use standardized multiblock tools like cross-validation (cv) and permutation testing (perm_test).

1.3 Bidirectional Prediction (Transfer)

Let's predict the Y-block (Petal measurements) using the X-block (Sepal measurements).

Y_hat <- transfer(cp_cca, X, from = "X", to = "Y")
head(round(Y_hat, 2))

measure_reconstruction_error(Y, Y_hat, metrics = c("rmse", "r2"))

1.4 Using Partial Features

What if we only have Sepal.Length (the first column of X) at prediction time? We can still project into the latent space:

new_x_partial <- X[, 1, drop = FALSE]

scores_partial <- partial_project(cp_cca, new_x_partial,
                                  colind = 1,
                                  source = "X")
head(round(scores_partial, 3))

These scores represent the best estimate of position in the latent space given only partial input. This is useful for missing data scenarios or when only some measurements are available.

1.5 Cross-validated Component Selection

Cross-validation for two-block models requires wrapper functions that handle both X and Y blocks. The pattern is:

set.seed(1)
folds <- kfold_split(nrow(X), k = 5)

cv_fit <- function(train_data, ncomp) {
  fit_cca(train_data$X, train_data$Y, ncomp = ncomp)
}

cv_measure <- function(model, test_data) {
  measure_interblock_transfer_error(
    test_data$X, test_data$Y, model,
    metrics = c("x2y.rmse", "y2x.rmse")
  )
}

cv_res <- cv_generic(
  data = list(X = X, Y = Y),
  folds = folds,
  .fit_fun = cv_fit,
  .measure_fun = cv_measure,
  fit_args = list(ncomp = 2)
)

The key is packaging X and Y together so fold indices apply to both blocks simultaneously.

1.6 Permutation Test

We can test the significance of the X-Y relationship using perm_test(). The default shuffles rows of Y relative to X and measures whether the observed transfer error is lower than expected by chance.

perm_res <- perm_test(
  cp_cca, X, Y = Y,
  nperm = 199,
  alternative = "less"
)
print(perm_res)

The default statistic is x2y.mse (mean squared error when predicting Y from X). A significant result (low p-value) indicates the CCA relationship is unlikely due to chance.

1.7 CCA Take-aways

cross_projector provides a convenient wrapper for two-block methods like CCA.
It enables using standard multiblock verbs (project, transfer, partial_project, cv, perm_test) with the wrapped model.
Preprocessing is handled internally, reducing risk of data leakage.
Other methods (PLS, O2PLS) can be integrated by providing a different fit_fun.

2. Example: Wrapping `glmnet` as a `projector`

The projector object maps data from its original space to a lower-dimensional space defined by its components (v). This is useful for dimensionality reduction (like PCA) but can also represent coefficients from supervised models like penalized regression.

Here, we fit LASSO using glmnet and wrap the resulting coefficients (excluding the intercept) into a projector.

2.1 Data and Model Fitting

# Generate sample data
set.seed(123)
n_obs <- 100
n_pred <- 50
X_glm <- matrix(rnorm(n_obs * n_pred), n_obs, n_pred)
# True coefficients (sparse)
true_beta <- matrix(0, n_pred, 1)
true_beta[1:10, 1] <- runif(10, -1, 1)
# Response variable with noise
y_glm <- X_glm %*% true_beta + rnorm(n_obs, sd = 0.5)

# Fit glmnet (LASSO, alpha=1)
# Typically use cv.glmnet to find lambda, but using a fixed one for simplicity
glm_fit <- glmnet::glmnet(X_glm, y_glm, alpha = 1) # alpha=1 is LASSO

# Choose a lambda (e.g., one near the end of the path)
chosen_lambda <- glm_fit$lambda[length(glm_fit$lambda) * 0.8]

# Get coefficients for this lambda
beta_hat <- coef(glm_fit, s = chosen_lambda)
print(paste("Number of non-zero coefficients:", sum(beta_hat != 0)))

# Extract coefficients, excluding the intercept
v_glm <- beta_hat[-1, 1, drop = FALSE] # Drop intercept, ensure it's a matrix
dim(v_glm) # Should be n_pred x 1

2.2 Create the `projector`

We define a projector using these coefficients. The projection X %*% v gives a single score per observation, representing the LASSO linear predictor. We also include centering and scaling as preprocessing, often recommended for glmnet.

# Define preprocessor
# Define and fit preprocessor with training data
preproc_glm <- fit(colscale(center(), type = "z"), X_glm)

# Create the projector
proj_glm <- projector(
  v = v_glm,
  preproc = preproc_glm,
  classes = "glmnet_projector"
)

print(proj_glm)

2.3 Project New Data

We can now use this projector to calculate the LASSO linear predictor score for new data.

# Generate some new test data
X_glm_test <- matrix(rnorm(20 * n_pred), 20, n_pred)

# Project the test data
# project() handles applying the centering/scaling from preproc_glm
lasso_scores <- project(proj_glm, X_glm_test)

head(round(lasso_scores, 3))
dim(lasso_scores) # Should be n_test x 1 (since v_glm has 1 column)

# Compare with direct calculation using transform
# Apply the same preprocessing used within project()
X_glm_test_processed <- transform(preproc_glm, X_glm_test)
# Calculate scores directly using the processed data and coefficients
direct_scores <- X_glm_test_processed %*% v_glm
head(round(direct_scores, 3))

# Check they are close
all.equal(c(lasso_scores), c(direct_scores))

Any scripts or data that you put into this service are public.

multivarious documentation built on Jan. 22, 2026, 1:06 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

multivarious
Extensible Data Structures for Multivariate Analysis

Extending multiblock: CCA and glmnet Examples
In multivarious: Extensible Data Structures for Multivariate Analysis

1. Canonical Correlation Analysis with `cross_projector`

1.1 Data Setup

1.2 Wrap `stats::cancor()` into a `cross_projector`

1.3 Bidirectional Prediction (Transfer)

1.4 Using Partial Features

1.5 Cross-validated Component Selection

1.6 Permutation Test

1.7 CCA Take-aways

2. Example: Wrapping `glmnet` as a `projector`

2.1 Data and Model Fitting

2.2 Create the `projector`

2.3 Project New Data

Try the multivarious package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

multivarious Extensible Data Structures for Multivariate Analysis

Extending multiblock: CCA and glmnet Examples In multivarious: Extensible Data Structures for Multivariate Analysis

1. Canonical Correlation Analysis with cross_projector

1.1 Data Setup

1.2 Wrap stats::cancor() into a cross_projector

1.3 Bidirectional Prediction (Transfer)

1.4 Using Partial Features

1.5 Cross-validated Component Selection

1.6 Permutation Test

1.7 CCA Take-aways

2. Example: Wrapping glmnet as a projector

2.1 Data and Model Fitting

2.2 Create the projector

2.3 Project New Data

Try the multivarious package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

multivarious
Extensible Data Structures for Multivariate Analysis

Extending multiblock: CCA and glmnet Examples
In multivarious: Extensible Data Structures for Multivariate Analysis

1. Canonical Correlation Analysis with `cross_projector`

1.2 Wrap `stats::cancor()` into a `cross_projector`

2. Example: Wrapping `glmnet` as a `projector`

2.2 Create the `projector`