
View source: R/CCA_algorithm.R

cscca.CV R Documentation

Compositional Sparse Canonical Correlation Analysis (Cross-Validation Version)

Description

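Cross-validation version of the compositional sparse canonical correlation analysis (sCCA) framework for integrating microbiome data with other high-dimensional omics data. The hyper-parameters are selected by n_fold-fold cross validation. The function returns the selected hyper-parameters together with the coefficient vector estimated at them.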

Usage

cscca.CV(
  Y,
  View.ind,
  View.type = NULL,
  eps.stop = 1e-04,
  max.step = 30,
  eps = 1e-04,
  T.step = 10,
  n_fold = 5,
  seed.sam.ind = NULL,
  show.info = FALSE,
  hp.lower = NULL,
  hp.upper = NULL,
  hp.eta.lower = NULL,
  hp.eta.upper = NULL,
  eta.warm.stat.mat = NULL,
  opt_n_design = 30,
  opt_n_iter = 20,
  Criterion = "cov",
  des.init = NULL,
  is.refit = F,
  is.refix.eta = T,
  opt_n_design.eta_warm = 30,
  opt_n_iter.eta_warm = 20,
  is.opt.hyper = TRUE,
  hyper_n_grid = 20,
  ...
)

Arguments

Y

an n*(K*p) matrix of observations. Samples are in rows, and the features of all K feature classes (views) are concatenated column-wise.

View.ind

an integer vector of length K*p giving the class (view) of each column of Y. Features with the same View.ind value belong to the same class.
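The sketch below builds a toy two-view input in base R to show the expected layout. The views are bound column-wise into Y, and View.ind labels each column with its view. The sizes and simulated values are illustrative assumptions only. They are not taken from the package.

```r
set.seed(1)
n <- 50
p <- 10
K <- 2
X1 <- matrix(rnorm(n * p), n, p)        # view 1: e.g. an omics block (toy data)
X2 <- matrix(rexp(n * p), n, p)         # view 2: e.g. a microbiome block (toy data)
Y <- cbind(X1, X2)                      # n x (K * p) observation matrix
View.ind <- rep(seq_len(K), each = p)   # 1 for the columns of X1, 2 for the columns of X2

# Each view can be recovered from Y by selecting the columns with a given label
blocks <- lapply(seq_len(K), function(k) Y[, View.ind == k, drop = FALSE])
```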

View.type

a character vector of length K giving the structure type of each feature class. Each entry is either "O" (omics data) or "C" (compositional data).
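In a "C" view, each row holds relative abundances. Log-contrast models, a common approach for such data, work with log proportions and constrain the weights to sum to zero. Under that constraint, a sample's score does not depend on its total count. This page does not confirm that cscca.CV uses this exact constraint, so treat it as an assumption. The sketch only demonstrates the invariance. All names and values in it are hypothetical.

```r
set.seed(2)
counts <- matrix(rpois(5 * 4, lambda = 20) + 1, 5, 4)  # raw counts, strictly positive
comp <- counts / rowSums(counts)                       # relative abundances: each row sums to 1
a <- c(0.5, -0.2, 0.1, 0.3)                            # arbitrary (hypothetical) weights
a0 <- a - mean(a)                                      # Euclidean projection onto sum(a0) == 0
s1 <- log(comp) %*% a0                                 # log-contrast score from proportions
s2 <- log(counts) %*% a0                               # same score computed from raw counts
all.equal(s1, s2)                                      # TRUE: the per-sample total cancels out
```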

eps.stop

a numerical tolerance used to assess convergence.

max.step

an integer giving the maximum number of iteration steps.

eps

a numerical tolerance used to assess convergence.

T.step

an integer giving the maximum number of iteration steps.
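The sketch below shows how a convergence tolerance and an iteration cap typically work together in an iterative sparse CCA fit. It uses alternating soft-thresholded updates in the style of penalized matrix decomposition as a generic stand-in. It is not the algorithm inside cscca.CV. The helpers soft, unit and sparse_cca_sketch and the arguments lam1, lam2, eps and max_step are hypothetical.

```r
# Soft-thresholding operator used to induce sparsity
soft <- function(x, lam) sign(x) * pmax(abs(x) - lam, 0)

# Rescale to unit L2 norm; an all-zero vector is returned unchanged
unit <- function(v) {
  s <- sqrt(sum(v^2))
  if (s > 0) v / s else v
}

# Hypothetical two-view sparse CCA sketch (PMD-style alternating updates).
# eps is a convergence tolerance and max_step an iteration cap;
# this is NOT the algorithm used by cscca.CV.
sparse_cca_sketch <- function(X1, X2, lam1, lam2, eps = 1e-4, max_step = 30) {
  X1 <- scale(X1)
  X2 <- scale(X2)
  C <- crossprod(X1, X2) / (nrow(X1) - 1)   # p1 x p2 cross-correlation matrix
  a1 <- rep(0, ncol(X1))
  a2 <- unit(rep(1, ncol(X2)))              # start view-2 weights at uniform values
  for (step in seq_len(max_step)) {
    a1_old <- a1
    a2_old <- a2
    a1 <- unit(soft(drop(C %*% a2), lam1))
    a2 <- unit(soft(drop(crossprod(C, a1)), lam2))
    # Stop once neither weight vector changes by more than eps
    if (max(abs(a1 - a1_old), abs(a2 - a2_old)) < eps) break
  }
  list(a1 = a1, a2 = a2, steps = step)
}

# Toy data: only the first column of each view carries the shared signal z
set.seed(3)
n <- 500
z <- rnorm(n)
X1 <- cbind(z + rnorm(n, sd = 0.3), matrix(rnorm(n * 9), n))
X2 <- cbind(z + rnorm(n, sd = 0.3), matrix(rnorm(n * 9), n))
fit <- sparse_cca_sketch(X1, X2, lam1 = 0.1, lam2 = 0.1)
```

On this toy data the weights concentrate on the first column of each view, and the loop exits well before the iteration cap.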

n_fold

an integer representing the number of folds for cross validation.
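The sketch below shows a generic way to split n samples into n_fold balanced folds. Each fold serves once as the held-out set while the remaining folds are used for fitting. The helper make_folds is hypothetical. It is an assumption that seed.sam.ind is used in a similar way to make the split reproducible.

```r
# Hypothetical helper: assign each of n samples to one of n_fold folds
make_folds <- function(n, n_fold = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # balanced labels 1..n_fold, shuffled across samples
  sample(rep(seq_len(n_fold), length.out = n))
}

fold <- make_folds(200, n_fold = 5, seed = 10)
table(fold)                                   # 40 samples in each of the 5 folds

test_sets <- split(seq_len(200), fold)        # held-out rows for each fold
train_sets <- lapply(test_sets, function(idx) setdiff(seq_len(200), idx))
```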

seed.sam.ind

a vector of the seeds for sampling.

show.info

a logical value indicating whether to print information during the hyper-parameter optimization.

hp.lower

a numerical value, or a vector of length K, giving the lower bound of the hyper-parameter.

hp.upper

a numerical value, or a vector of length K, giving the upper bound of the hyper-parameter.
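Each bound may be given as a single number or as one value per feature class. The sketch below shows one way such an argument could be normalized to length K. The helper expand_bound and the numeric values are hypothetical, and it is an assumption that cscca.CV recycles a scalar bound this way.

```r
# Hypothetical helper: expand a scalar bound to one value per feature class
expand_bound <- function(b, K) {
  if (length(b) == 1) return(rep(b, K))   # assumed: a scalar applies to every class
  stopifnot(length(b) == K)               # otherwise one value per class is required
  b
}

K <- 2
lower <- expand_bound(0.05, K)            # illustrative values only
upper <- expand_bound(c(0.5, 0.8), K)
stopifnot(all(lower < upper))
```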

hp.eta.lower

a numerical value, or a vector of length K, giving the lower bound of the hyper-parameter for eta.

hp.eta.upper

a numerical value, or a vector of length K, giving the upper bound of the hyper-parameter for eta.

eta.warm.stat.mat

a matrix providing statistics for warm start of eta.

opt_n_design

an integer giving the number of design points in the hyper-parameter optimization.

opt_n_iter

an integer giving the number of iterations in the hyper-parameter optimization.

Criterion

a character string naming the criterion used for cross validation. The default is "cov".
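This page does not spell out the exact definition behind Criterion = "cov". The sketch below assumes a plausible reading: the covariance between the two canonical variates, evaluated on a held-out fold. The hypothetical helper cv_cov_criterion computes that quantity for one fold. Averaging it over the folds would then give a cross-validation score.

```r
# Hypothetical helper: held-out covariance between the two canonical variates
cv_cov_criterion <- function(X1_test, X2_test, a1, a2) {
  u <- drop(X1_test %*% a1)   # canonical variate for view 1
  v <- drop(X2_test %*% a2)   # canonical variate for view 2
  abs(cov(u, v))              # absolute value: the sign of canonical weights is arbitrary
}

set.seed(4)
X_test <- matrix(rnorm(50 * 3), 50, 3)
# With identical views and weights on the first column, this equals var(X_test[, 1])
cv_cov_criterion(X_test, X_test, c(1, 0, 0), c(1, 0, 0))
```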

des.init

an initial design for the hyper-parameter optimization.

is.refit

a logical value indicating whether to refit the model using the optimal hyper-parameters.

is.refix.eta

a logical value indicating whether eta is held fixed during refitting.

opt_n_design.eta_warm

an integer controlling the number of design points for eta warm-start optimization.

opt_n_iter.eta_warm

an integer controlling the number of iterations for eta warm-start optimization.

is.opt.hyper

a logical value indicating whether to optimize the hyper-parameters.

hyper_n_grid

an integer giving the grid size for the hyper-parameter search.

...

additional arguments passed to the internal optimization procedures.

Value

A list containing the following elements: (1) a.hat.opt.trgt: The coefficient vector estimated with the optimal hyper-parameter vector; (2) lam.opt.trgt: The optimal hyper-parameter vector.
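The sketch below shows one way to use the returned coefficient vector. It assumes that a.hat.opt.trgt is a single vector aligned with the columns of Y, so that View.ind splits it into one weight vector per view. The data and the stand-in vector a.hat are simulated placeholders, not real output of cscca.CV.

```r
set.seed(5)
n <- 20
View.ind <- rep(1:2, each = 5)                # 5 features in each of 2 views
Y <- matrix(rnorm(n * length(View.ind)), n)
a.hat <- rnorm(length(View.ind))              # placeholder for Res$a.hat.opt.trgt (assumed layout)

a.by.view <- split(a.hat, View.ind)           # one weight vector per view
variates <- vapply(seq_along(a.by.view), function(k) {
  drop(Y[, View.ind == k, drop = FALSE] %*% a.by.view[[k]])
}, numeric(n))                                # n x K matrix of canonical variates
cor(variates[, 1], variates[, 2])             # correlation between the two variates
```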

References

1. Deng, L., Tang, Y., Zhang, X., et al. (2024). Structure-adaptive canonical correlation analysis for microbiome multi-omics data. Frontiers in Genetics, 15, 1489694.

2. Chen, J., Bushman, F. D., Lewis, J. D., et al. (2013). Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis. Biostatistics, 14(2), 244–258.

Examples

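## Not run:
library(MicrobiomeStat)
library(dplyr)
library(mlrMBO)  # needed for the hyper-parameter optimization

# Simulate two views: n samples, with p and q features
n <- 200
p <- q <- 100
sigma.nu <- 5
sigma.eps <- 1
# True canonical weights: only the first 10 features of each view are nonzero
omega_X <- 0.85 * c(rep(1/10, 9), -9/10, rep(0, p - 10))
omega_Y <- 0.85 * c(seq(0.08, 0.12, length.out = 10), rep(0, q - 10))
Data1 <- DGP_OC(seed = 10, n, p, q, sigma.nu, sigma.eps, omega_X, omega_Y)

# Cross-validate with both views treated as omics data (sCCA)
Res.sCCA.CV <- cscca.CV(Y = Data1$Y, View.ind = Data1$View.ind,
                        View.type = c("O", "O"),
                        show.info = TRUE)

# Cross-validate with the second view treated as compositional (CsCCA)
Res.CsCCA.CV <- cscca.CV(Y = Data1$Y, View.ind = Data1$View.ind,
                         View.type = c("O", "C"),
                         show.info = TRUE)

# Refit on the full data with the selected hyper-parameters
Res.sCCA <- cscca(Y = Data1$Y, View.ind = Data1$View.ind,
                  lambda.seq = Res.sCCA.CV$lam.opt.trgt,
                  View.type = c("O", "O"))
Res.CsCCA <- cscca(Y = Data1$Y, View.ind = Data1$View.ind,
                   lambda.seq = Res.CsCCA.CV$lam.opt.trgt,
                   View.type = c("O", "C"))
print(Res.sCCA.CV$Cri.opt.trgt)
print(Res.CsCCA.CV$Cri.opt.trgt)

## End(Not run)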

MicrobiomeStat documentation built on Jan. 9, 2026, 1:07 a.m.