estimate_lucid: Fit LUCID models with one or multiple omics layers

View source: R/EM_all.R

estimate_lucid    R Documentation

Fit LUCID models with one or multiple omics layers

Description

Fits a LUCID model with one or multiple omics layers using the EM algorithm.

Usage

estimate_lucid(
  lucid_model = c("early", "parallel", "serial"),
  G,
  Z,
  Y,
  CoG = NULL,
  CoY = NULL,
  K,
  init_omic.data.model = "EEV",
  useY = TRUE,
  tol = 0.001,
  max_itr = 1000,
  max_tot.itr = 10000,
  Rho_G = 0,
  Rho_Z_Mu = 0,
  Rho_Z_Cov = 0,
  family = c("normal", "binary"),
  seed = 123,
  init_impute = c("mix", "lod"),
  init_par = c("mclust", "random"),
  verbose = FALSE
)

Arguments

lucid_model

Specifies the LUCID model: "early" for early integration, "parallel" for LUCID in parallel, or "serial" for LUCID in serial.

G

an N by P matrix representing exposures

Z

Omics data. If "early", an N by M matrix. If "parallel", a list in which each element i is a matrix with N rows and P_i features. If "serial", a list in which each element i is either a matrix with N rows and P_i features, or a list of two or more matrices, each with N rows and its own number of features.

Y

a length N vector representing the outcome

CoG

an N by V matrix representing covariates to be adjusted for G -> X

CoY

an N by K matrix representing covariates to be adjusted for X -> Y

K

Number of latent clusters. If "early", an integer greater than or equal to 2. If "parallel", an integer vector of the same length as Z, with each element an integer greater than or equal to 2. If "serial", a list of the same length as Z, in which each element is either an integer as for "early" or a list of integers as for "parallel".

init_omic.data.model

A vector of strings specifying the geometric model of the omics data. See ?mclust::mclustModelNames for the available model names.

useY

Logical. If TRUE, the EM algorithm fits a supervised LUCID model; otherwise it fits an unsupervised LUCID model.

tol

stopping criterion for the EM algorithm

max_itr

Maximum iterations of the EM algorithm. If the EM algorithm iterates more than max_itr without converging, the EM algorithm is forced to stop.

max_tot.itr

Maximum total number of iterations for the estimate_lucid function. estimate_lucid may run the EM algorithm multiple times if the algorithm fails to converge.

Rho_G

A scalar. The LASSO penalty used to regularize exposures. To tune the penalty, use the wrapper function lucid. Currently implemented only for LUCID early integration.

Rho_Z_Mu

A scalar. The LASSO penalty used to regularize cluster-specific means for omics data (Z). To tune the penalty, use the wrapper function lucid. Currently implemented only for LUCID early integration.

Rho_Z_Cov

A scalar. The graphical LASSO penalty used to estimate sparse cluster-specific variance-covariance matrices for omics data (Z). To tune the penalty, use the wrapper function lucid. Currently implemented only for LUCID early integration.

family

The distribution of the outcome, either "normal" or "binary".

seed

Random seed to initialize the EM algorithm

init_impute

Method used to initialize the imputation of missing values in LUCID. mix uses mclust::imputeData, which relies on the EM algorithm for the unrestricted general location model from the mix package, to impute missing values in the omics data. lod initializes the imputation by replacing missing values with LOD / sqrt(2), where LOD is taken to be the minimum of each variable in the omics data.

init_par

For "early", the method used to initialize the EM algorithm: mclust initializes the parameters using the mclust package, and random initializes them by drawing from a uniform distribution. For "parallel", mclust is the default for quick convergence. For "serial", each sub-model follows the rules above, depending on whether it is an "early" or a "parallel" model.

verbose

A flag indicating whether detailed information for each iteration of the EM algorithm is printed to the console. Default is FALSE.
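As a rough illustration of the lod option described for init_impute, the following base R sketch replaces the missing values in each column with that column's smallest observed value divided by sqrt(2). The helper name impute_lod is hypothetical and is not part of LUCIDus; the package's internal implementation may differ in its details.

```r
# Hypothetical helper (not part of LUCIDus): LOD-style initial imputation.
# Assumption: the LOD of a feature is its smallest observed value.
impute_lod <- function(Z) {
  Z <- as.matrix(Z)
  for (j in seq_len(ncol(Z))) {
    miss <- is.na(Z[, j])
    # Skip columns with nothing missing or nothing observed
    if (any(miss) && !all(miss)) {
      lod <- min(Z[, j], na.rm = TRUE)
      Z[miss, j] <- lod / sqrt(2)
    }
  }
  Z
}

# Toy omics matrix with one missing value per column
Z_demo <- matrix(c(4, NA, 8, 2, 6, NA), nrow = 3)
Z_imputed <- impute_lod(Z_demo)
```

These values only initialize the imputation; they are a starting point for the model fit rather than a final estimate of the missing data.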

Value

A list containing the following objects:

  1. res_Beta: estimation for G->X associations

  2. res_Mu: estimation for the mu of the X->Z associations

  3. res_Sigma: estimation for the sigma of the X->Z associations

  4. res_Gamma: estimation for X->Y associations

  5. inclusion.p: inclusion probability of cluster assignment for each observation

  6. K: number of latent clusters for "early"/list of numbers of latent clusters for "parallel" and "serial"

  7. var.names: names for the G, Z, Y variables

  8. init_omic.data.model: pre-specified geometric model of multi-omics data

  9. likelihood: converged LUCID model log likelihood

  10. family: the distribution of the outcome

  11. select: for LUCID early integration only, indicators of whether each exposure and omics feature is selected

  12. useY: whether this LUCID model is supervised

  13. Z: multi-omics data

  14. init_impute: pre-specified imputation method

  15. init_par: pre-specified parameter initialization method

  16. Rho: for LUCID early integration only, pre-specified regularization tuning parameter

  17. N: number of observations

  18. submodel: for LUCID in serial only, storing all the submodels
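A common use of inclusion.p is to derive a hard cluster label for each observation. The sketch below assumes that, for an early integration model, inclusion.p is an N by K matrix whose rows sum to 1. This layout is an assumption and should be checked against the fitted object, for example with str(). A mock matrix named inclusion_p stands in for a real fit.

```r
# Mock inclusion probabilities for N = 4 observations and K = 2 clusters
# (illustrative values, not the output of a fitted model).
inclusion_p <- matrix(c(0.9, 0.1,
                        0.2, 0.8,
                        0.6, 0.4,
                        0.3, 0.7),
                      ncol = 2, byrow = TRUE)

# Assign each observation to the cluster with the largest probability.
hard_cluster <- max.col(inclusion_p, ties.method = "first")

# Cluster sizes under the hard assignment
table(hard_cluster)
```

For "parallel" and "serial" models the returned object is structured per layer or per sub-model, so the same step would presumably be applied to each element separately.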

Examples

library(LUCIDus)

# Simulate exposures, five omics layers, an outcome, and covariates (N = 100)
i <- 1008
set.seed(i)
G <- matrix(rnorm(500), nrow = 100)
Z1 <- matrix(rnorm(1000), nrow = 100)
Z2 <- matrix(rnorm(1000), nrow = 100)
Z3 <- matrix(rnorm(1000), nrow = 100)
Z4 <- matrix(rnorm(1000), nrow = 100)
Z5 <- matrix(rnorm(1000), nrow = 100)
Z <- list(Z1 = Z1, Z2 = Z2, Z3 = Z3, Z4 = Z4, Z5 = Z5)
Y <- rnorm(100)
CoY <- matrix(rnorm(200), nrow = 100)
CoG <- matrix(rnorm(200), nrow = 100)

# Fit a supervised LUCID in serial model with two latent clusters per layer
fit1 <- estimate_lucid(G = G, Z = Z, Y = Y, K = list(2, 2, 2, 2, 2),
                       lucid_model = "serial",
                       family = "normal",
                       seed = i,
                       CoG = CoG, CoY = CoY,
                       useY = TRUE)
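
The example above fits a serial model. For comparison, the sketch below builds inputs for the "early" and "parallel" models following the shapes described under Arguments, and fits the early integration model only when LUCIDus is installed. The object names (Z_early, Z_parallel, K_parallel, fit_early) are illustrative, and the simulated data are pure noise, so the fitted clusters carry no meaning.

```r
set.seed(1008)
N <- 100
G <- matrix(rnorm(N * 5), nrow = N)
Y <- rnorm(N)

# "early": a single N by M matrix and one integer K >= 2
Z_early <- matrix(rnorm(N * 10), nrow = N)
K_early <- 2

# "parallel": a list of matrices with N rows each,
# and an integer vector K with one entry per layer
Z_parallel <- list(Z1 = matrix(rnorm(N * 10), nrow = N),
                   Z2 = matrix(rnorm(N * 10), nrow = N))
K_parallel <- c(2, 2)

# Basic shape checks before fitting
stopifnot(nrow(Z_early) == nrow(G),
          length(K_parallel) == length(Z_parallel))

# Fit the early integration model only if the package is available
if (requireNamespace("LUCIDus", quietly = TRUE)) {
  fit_early <- LUCIDus::estimate_lucid(
    lucid_model = "early",
    G = G, Z = Z_early, Y = Y, K = K_early,
    family = "normal", seed = 1008, useY = TRUE
  )
}
```

A parallel fit would be requested the same way, passing Z_parallel and K_parallel with lucid_model = "parallel".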

LUCIDus documentation built on Nov. 2, 2023, 5:21 p.m.