cscca.CV: Compositional Sparse Canonical Correlation Analysis (Cross...
In MicrobiomeStat: Statistical Methods for Microbiome Compositional Data

View source: R/CCA_algorithm.R

cscca.CV

R Documentation

Compositional Sparse Canonical Correlation Analysis (Cross-Validation Version)

Description

The cross-validation version of a compositional sparse canonical correlation analysis (sCCA) framework for integrating microbiome data with other high-dimensional omics data.
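For readers less familiar with K-fold cross-validation, the following is a generic sketch of how samples can be assigned to folds. It is not taken from the package source; the seed value and the variable names `fold_id`, `train_idx`, and `test_idx` are assumptions made for illustration, and how cscca.CV actually uses `n_fold` and `seed.sam.ind` internally may differ.

```r
# Generic K-fold assignment; illustrative only, not the package's internal code.
n <- 200        # number of samples (rows of Y)
n_fold <- 5     # number of folds
set.seed(123)   # hypothetical seed, chosen only for reproducibility

# Give every sample a fold label, then shuffle the labels.
fold_id <- sample(rep(seq_len(n_fold), length.out = n))

for (k in seq_len(n_fold)) {
  test_idx  <- which(fold_id == k)   # held-out samples for fold k
  train_idx <- which(fold_id != k)   # remaining samples
  # A model would be fitted on rows train_idx and scored on rows test_idx.
  stopifnot(length(intersect(train_idx, test_idx)) == 0)
}

table(fold_id)  # fold sizes
```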

Usage

cscca.CV(
  Y,
  View.ind,
  View.type = NULL,
  eps.stop = 1e-04,
  max.step = 30,
  eps = 1e-04,
  T.step = 10,
  n_fold = 5,
  seed.sam.ind = NULL,
  show.info = FALSE,
  hp.lower = NULL,
  hp.upper = NULL,
  hp.eta.lower = NULL,
  hp.eta.upper = NULL,
  eta.warm.stat.mat = NULL,
  opt_n_design = 30,
  opt_n_iter = 20,
  Criterion = "cov",
  des.init = NULL,
  is.refit = F,
  is.refix.eta = T,
  opt_n_design.eta_warm = 30,
  opt_n_iter.eta_warm = 20,
  is.opt.hyper = TRUE,
  hyper_n_grid = 20,
  ...
)

Arguments

`Y`	an n*(K*p) matrix representing the observations.
`View.ind`	a (K*p) integer vector indicating the classes of features. Features with the same View.ind value are in the same class.
`View.type`	a length-K vector encoding the structure type of each feature class. There are two choices: "O" (Omics Data) and "C" (Compositional Data).
`eps.stop`	a numerical value controlling the convergence.
`max.step`	an integer controlling the maximum number of iterations.
`eps`	a numerical value controlling the convergence.
`T.step`	an integer controlling the maximum number of iterations.
`n_fold`	an integer representing the number of folds for cross-validation.
`seed.sam.ind`	a vector of the seeds for sampling.
`show.info`	a logical indicating whether to show information during the hyperparameter optimization.
`hp.lower`	a numerical value or length-K vector specifying the lower bound of the hyperparameter.
`hp.upper`	a numerical value or length-K vector specifying the upper bound of the hyperparameter.
`hp.eta.lower`	a numerical value or length-K vector specifying the lower bound of the hyperparameter for eta.
`hp.eta.upper`	a numerical value or length-K vector specifying the upper bound of the hyperparameter for eta.
`eta.warm.stat.mat`	a matrix providing statistics for warm start of eta.
`opt_n_design`	an integer controlling the number of design points in the hyperparameter optimization.
`opt_n_iter`	an integer controlling the number of iterations in the hyperparameter optimization.
`Criterion`	a character string specifying the criterion used for cross-validation.
`des.init`	an initial design for hyperparameter optimization.
`is.refit`	a logical indicating whether to refit the model using the optimal hyperparameters.
`is.refix.eta`	a logical indicating whether eta is fixed during refitting.
`opt_n_design.eta_warm`	an integer controlling the number of design points for eta warm-start optimization.
`opt_n_iter.eta_warm`	an integer controlling the number of iterations for eta warm-start optimization.
`is.opt.hyper`	a logical indicating whether to optimize the hyperparameters.
`hyper_n_grid`	an integer controlling the grid size for hyperparameter search.
`...`	additional arguments passed to the internal optimization procedures.
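
As a rough illustration of the input layout described by `Y`, `View.ind`, and `View.type`, the sketch below stacks two feature blocks column-wise and builds a matching index vector. The block names `X_omics` and `X_comp`, the sizes, and the simulated values are assumptions for illustration only. Representing the compositional block as row proportions is also an assumption; this page does not state which preprocessing cscca.CV expects.

```r
set.seed(1)
n <- 20   # samples
p <- 5    # features per view
K <- 2    # number of views

# Hypothetical omics block: n x p matrix of continuous measurements.
X_omics <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Hypothetical compositional block: positive counts scaled so each row sums to 1.
counts <- matrix(rpois(n * p, lambda = 10) + 1, nrow = n, ncol = p)
X_comp <- counts / rowSums(counts)

# Stack the views column-wise; column j of Y belongs to view View.ind[j].
Y <- cbind(X_omics, X_comp)
View.ind <- rep(seq_len(K), each = p)
View.type <- c("O", "C")

# Basic consistency checks on the layout.
stopifnot(ncol(Y) == length(View.ind), length(View.type) == K)
```

Objects built this way have the shapes listed in the Arguments section: `Y` has n rows and K*p columns, `View.ind` has K*p entries, and `View.type` has K entries.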

Value

A list containing the following elements: (1) a.hat.opt.trgt: The coefficient vector estimated with the optimal hyper-parameter vector; (2) lam.opt.trgt: The optimal hyper-parameter vector.
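
One way to inspect such a result is to split the coefficient vector by view and count the nonzero entries. The sketch below uses a hand-made list in place of a real cscca.CV result; the numbers are invented, and it assumes that `a.hat.opt.trgt` is ordered like the columns of `Y` and that unselected features have coefficients exactly equal to zero.

```r
# Hypothetical stand-in for a cscca.CV result; all values are made up.
View.ind <- rep(1:2, each = 4)
res <- list(
  a.hat.opt.trgt = c(0.6, 0, 0, -0.3, 0, 0.5, 0, 0),
  lam.opt.trgt   = c(0.1, 0.2)
)

# Split the coefficients by view (assumes the same column order as Y).
a_hat <- as.numeric(res$a.hat.opt.trgt)
coef_by_view <- split(a_hat, View.ind)

# Count nonzero coefficients, i.e. selected features, in each view.
n_selected <- vapply(coef_by_view, function(a) sum(a != 0), integer(1))
print(n_selected)
```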

References

1. Deng, L., Tang, Y., Zhang, X., et al. (2024). Structure-adaptive canonical correlation analysis for microbiome multi-omics data. Frontiers in Genetics, 15, 1489694.

2. Chen, J., Bushman, F. D., Lewis, J. D., et al. (2013). Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis. Biostatistics, 14(2), 244–258.

Examples

## Not run: 
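library(dplyr)
library(mlrMBO)

# Simulation settings: n samples, p and q features in the two views
n <- 200
p <- q <- 100
sigma.nu <- 5
sigma.eps <- 1

# Sparse weight vectors: only the first 10 entries are nonzero,
# and the nonzero entries of omega_X sum to zero
omega_X <- 0.85 * c(rep(1/10, 9), -9/10, rep(0, p - 10))
omega_Y <- 0.85 * c(seq(0.08, 0.12, length = 10), rep(0, q - 10))

# Simulate the two-view data set
Data1 <- DGP_OC(seed = 10, n, p, q, sigma.nu, sigma.eps, omega_X, omega_Y)

# Cross-validated fit treating both views as omics data
Res.sCCA.CV <- cscca.CV(Y = Data1$Y, View.ind = Data1$View.ind,
                        View.type = c("O", "O"),
                        show.info = TRUE)

# Cross-validated fit treating the second view as compositional data
Res.CsCCA.CV <- cscca.CV(Y = Data1$Y, View.ind = Data1$View.ind,
                         View.type = c("O", "C"),
                         show.info = TRUE)

# Fit cscca() with the hyperparameters selected by cross-validation
Res.sCCA <- cscca(Y = Data1$Y, View.ind = Data1$View.ind,
                  lambda.seq = Res.sCCA.CV$lam.opt.trgt,
                  View.type = c("O", "O"))
Res.CsCCA <- cscca(Y = Data1$Y, View.ind = Data1$View.ind,
                   lambda.seq = Res.CsCCA.CV$lam.opt.trgt,
                   View.type = c("O", "C"))

# Print the Cri.opt.trgt element of each cross-validated fit
print(Res.sCCA.CV$Cri.opt.trgt)
print(Res.CsCCA.CV$Cri.opt.trgt)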

## End(Not run)

MicrobiomeStat documentation built on Jan. 9, 2026, 1:07 a.m.