lucid R Documentation

Fit a lucid model for integrated analysis on exposure, outcome and multi-omics data, allowing for tuning

Description

Fit a LUCID model for integrated analysis of exposure, outcome and multi-omics data, with optional tuning of the number of latent clusters and the penalty parameters.

Usage

lucid(
  G,
  Z,
  Y,
  CoG = NULL,
  CoY = NULL,
  family = c("normal", "binary"),
  K = 2,
  lucid_model = c("early", "parallel", "serial"),
  Rho_G = 0,
  Rho_Z_Mu = 0,
  Rho_Z_Cov = 0,
  verbose_tune = FALSE,
  ...
)

Arguments

G

Exposures, a numeric vector, matrix, or data frame. Categorical variables should be transformed into dummy variables. For a matrix or data frame, rows represent observations and columns correspond to variables.
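The dummy-variable requirement can be met with base R's model.matrix(). The following is a minimal sketch, assuming a hypothetical data frame `exposures` with one numeric and one categorical column; the variable names are illustrative and not part of the package.

```r
# Hypothetical exposure data: one numeric and one three-level categorical variable
exposures <- data.frame(
  pm25 = c(8.1, 12.4, 9.7, 15.2),
  smoke = factor(c("never", "former", "current", "never"),
                 levels = c("never", "former", "current"))
)

# model.matrix() expands the factor into 0/1 dummy columns;
# dropping the intercept column leaves a purely numeric exposure matrix
G <- model.matrix(~ pm25 + smoke, data = exposures)[, -1, drop = FALSE]

dim(G)       # 4 observations (rows) by 3 variables (columns)
colnames(G)  # "pm25" "smokeformer" "smokecurrent"
```

The first factor level ("never") acts as the reference category, so a three-level variable yields two dummy columns.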

Z

Omics data. For lucid_model = "early", an N by M matrix. For "parallel", a list in which element i is a matrix with N rows and P_i features. For "serial", a list in which element i is either a matrix with N rows and P_i features or a list of two or more matrices, each with N rows.
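The three shapes of Z can be illustrated with simulated matrices. This is a sketch only: the layer sizes are arbitrary, and column-binding the layers for "early" is one assumed way to obtain a single N by M matrix.

```r
set.seed(1)
N <- 50  # number of observations, shared by every omics layer

# Hypothetical omics layers with different numbers of features
Z1 <- matrix(rnorm(N * 4), nrow = N)  # layer 1: 4 features
Z2 <- matrix(rnorm(N * 6), nrow = N)  # layer 2: 6 features
Z3 <- matrix(rnorm(N * 3), nrow = N)  # layer 3: 3 features

# "early": a single N by M matrix (here the layers are column-bound, M = 13)
Z_early <- cbind(Z1, Z2, Z3)

# "parallel": a list with one N-row matrix per layer
Z_parallel <- list(Z1, Z2, Z3)

# "serial": a list whose elements are either a matrix or a list of matrices
Z_serial <- list(Z1, list(Z2, Z3))

# Every matrix in the parallel list has the same number of rows (observations)
stopifnot(all(sapply(Z_parallel, nrow) == N))
```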

Y

Outcome, a numeric vector. Categorical outcomes are not allowed. A binary outcome should be coded as 0 and 1.
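A binary outcome stored as a factor or character vector needs recoding before it is passed as Y. A minimal base R sketch, assuming a hypothetical factor `status` with levels "control" and "case":

```r
# Hypothetical outcome recorded as a factor
status <- factor(c("control", "case", "case", "control", "case"),
                 levels = c("control", "case"))

# Recode to a numeric 0/1 vector: 0 = control, 1 = case
Y <- as.numeric(status == "case")

Y  # 0 1 1 0 1
```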

CoG

Optional covariates to adjust for when estimating the latent cluster. A numeric vector, matrix or data frame. Categorical variables should be transformed into dummy variables.

CoY

Optional covariates to adjust for when estimating the association between the latent cluster and the outcome. A numeric vector, matrix or data frame. Categorical variables should be transformed into dummy variables.

family

Distribution of outcome. For continuous outcome, use "normal"; for binary outcome, use "binary". Default is "normal".

K

Number of latent clusters. For lucid_model = "early", an integer or a vector of integers, each greater than or equal to 2; if K is a vector, model selection on K is performed. For lucid_model = "parallel", a list of the same length as Z whose elements are integers or vectors of integers; if an element is a vector, model selection on K is performed. For lucid_model = "serial", a list of the same length as Z whose elements are either an integer or a list of integers; if an innermost element is itself a vector, model selection on K is performed.
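The accepted forms of K can be sketched in base R as follows. The object names are hypothetical, and the nested "serial" form assumes that K mirrors the nesting of a Z of the form list(Z1, list(Z2, Z3)), which is an interpretation of the description above rather than something it states.

```r
# "early": a single integer, or a vector of candidates for model selection
K_early_fixed <- 2
K_early_tune  <- 2:4

# "parallel" with two omics layers: a list of the same length as Z;
# a vector element requests model selection for that layer
K_parallel_fixed <- list(2, 2)
K_parallel_tune  <- list(2:3, 2)

# "serial" with Z = list(Z1, list(Z2, Z3)): assumed to mirror the nesting of Z
K_serial <- list(2, list(2, 2))

# Quick check that every candidate is at least 2
stopifnot(all(unlist(K_serial) >= 2), all(K_early_tune >= 2))
```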

lucid_model

The LUCID model to fit: "early" for early integration, "parallel" for LUCID in parallel, or "serial" for LUCID in serial.

Rho_G

A scalar or a vector. The LASSO penalty that regularizes exposure coefficients in the G-to-X model; CoG covariates are not penalized. If a vector is supplied, lucid calls tune_lucid to conduct model selection and variable selection. Suggested penalties range from 0 to 1. Penalty tuning is supported for "early" and "parallel". For "serial", only scalar penalty inputs are supported.

Rho_Z_Mu

A scalar or a vector. The LASSO penalty that regularizes cluster-specific means for omics data (Z). If a vector is supplied, lucid calls tune_lucid to conduct model selection and variable selection. Suggested penalties range from 1 to 100. Penalty tuning is supported for "early" and "parallel". For "serial", only scalar penalty inputs are supported.

Rho_Z_Cov

A scalar or a vector. The graphical LASSO penalty used to estimate sparse cluster-specific variance-covariance matrices for omics data (Z). If a vector is supplied, lucid calls tune_lucid to conduct model selection and variable selection. Suggested penalties range from 0 to 1. Penalty tuning is supported for "early" and "parallel". For "serial", only scalar penalty inputs are supported.
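Candidate penalty vectors for the three Rho arguments can be built in base R before tuning. The sketch below uses arbitrary grid values within the suggested ranges and counts the combinations under the assumption that every combination is evaluated, which the descriptions above do not state.

```r
# Candidate penalty values within the ranges suggested above
Rho_G_grid     <- seq(0, 1, by = 0.25)  # exposure LASSO penalty
Rho_Z_Mu_grid  <- c(1, 10, 50, 100)     # LASSO penalty on cluster-specific means
Rho_Z_Cov_grid <- seq(0, 1, by = 0.5)   # graphical LASSO penalty

# Size of the full grid if every combination were evaluated (an assumption)
penalty_grid <- expand.grid(
  Rho_G = Rho_G_grid,
  Rho_Z_Mu = Rho_Z_Mu_grid,
  Rho_Z_Cov = Rho_Z_Cov_grid
)
nrow(penalty_grid)  # 5 * 4 * 3 = 60 combinations
```

Because each fit can be slow, a coarse grid is a sensible starting point; the vectors would be passed as Rho_G, Rho_Z_Mu and Rho_Z_Cov with lucid_model = "early" or "parallel".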

verbose_tune

A logical flag; if TRUE, details of the tuning process are printed. Default is FALSE.

...

Other parameters passed to estimate_lucid, for example max_itr, max_tot.itr and seed as used in the examples.

Value

An optimal LUCID model with the following components:

  1. res_Beta: estimation for G->X associations

  2. res_Mu: estimation for the mu of the X->Z associations

  3. res_Sigma: estimation for the sigma of the X->Z associations

  4. res_Gamma: estimation for X->Y associations

  5. inclusion.p: inclusion probability of cluster assignment for each observation

  6. K: number of latent clusters for "early"/list of numbers of latent clusters for "parallel" and "serial"

  7. var.names: names for the G, Z, Y variables

  8. init_omic.data.model: pre-specified geometric model of multi-omics data

  9. likelihood: converged LUCID model log likelihood

  10. family: the distribution of the outcome

  11. select: for "early" and "parallel", feature-selection indicators. For "parallel", select$selectG is the exposure-wise union across layers and select$selectG_layer stores per-layer exposure selection.

  12. useY: whether this LUCID model is supervised

  13. Z: multi-omics data

  14. init_impute: pre-specified imputation method

  15. init_par: pre-specified parameter initialization method

  16. Rho: for "early" and "parallel", pre-specified regularization tuning parameters

  17. N: number of observations

  18. submodel: for LUCID in serial only, storing all the submodels
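The inclusion.p component can be turned into hard cluster labels by taking the most probable cluster for each observation. The sketch below uses a mock matrix in place of a fitted model and assumes a layout of one row per observation and one column per cluster, as would be expected for "early"; the layout for "parallel" and "serial" may differ.

```r
# Mock stand-in for fit$inclusion.p: 4 observations, K = 2 clusters
# (assumed layout: one row per observation, one column per cluster)
inclusion_p <- matrix(
  c(0.9, 0.1,
    0.2, 0.8,
    0.6, 0.4,
    0.3, 0.7),
  ncol = 2, byrow = TRUE
)

# Assign each observation to its most probable cluster
cluster <- apply(inclusion_p, 1, which.max)

cluster         # 1 2 1 2
table(cluster)  # number of observations per cluster
```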

Examples


# Load the package, which provides lucid() and the sim_data example dataset
library(LUCIDus)

# LUCID early integration (quick smoke example)
G <- sim_data$G[1:80, , drop = FALSE]
Z <- sim_data$Z[1:80, , drop = FALSE]
Y <- sim_data$Y_normal[1:80]
fit_early <- lucid(
  G = G, Z = Z, Y = Y,
  lucid_model = "early", family = "normal", K = 2,
  max_itr = 30, max_tot.itr = 60, seed = 1008
)

# LUCID in parallel (two layers)
i <- 1008
set.seed(i)
G <- matrix(rnorm(240), nrow = 80)
Z1 <- matrix(rnorm(320), nrow = 80)
Z2 <- matrix(rnorm(320), nrow = 80)
Z <- list(Z1 = Z1, Z2 = Z2)
CoY <- matrix(rnorm(160), nrow = 80)
CoG <- matrix(rnorm(160), nrow = 80)
Y <- rnorm(80)
fit_parallel <- lucid(
  G = G, Z = Z, Y = Y, K = list(2, 2),
  CoG = CoG, CoY = CoY, lucid_model = "parallel",
  family = "normal", seed = i,
  max_itr = 30, max_tot.itr = 60
)


LUCIDus documentation built on March 11, 2026, 9:06 a.m.
