knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(LUCIDus)

About LUCIDus

The LUCIDus package is aiming to provide researchers in the genetic epidemiology community with an integrative tool in R to obtain a joint estimation of latent or unknown clusters/subgroups with multi-omics data and phenotypic traits.

This package is an implementation for the novel statistical method proposed in the research paper "A Latent Unknown Clustering Integrating Multi-Omics Data (LUCID) with Phenotypic Traits^[https://doi.org/10.1093/bioinformatics/btz667]" published by the Bioinformatics. LUCID improves the subtype classification which leads to better diagnostics as well as prognostics and could be the potential solution for efficient targeted treatments and successful personalized medicine.

LUCIDus Functions and Model Fitting

Four main functions, including est_lucid, boot_lucid, sem_lucid, and tune_lucid, are currently available for model fitting and feature selection. The model outputs can be summarized and visualized using summary_lucid and plot_lucid respectively. Predictions could be made with pred_lucid.

Here are the descriptions of LUCIDus functions:

| Function | Description | |:-----:|:---------------------| | est_lucid() | Estimates latent clusters using multi-omics data with/without the outcome of interest, and producing an IntClust object; missing values in biomarker data (Z) are allowed | | boot_lucid() | Latent cluster estimation through bootstrapping for statistical inference | | sem_lucid() | Latent cluster estimation with supplemented E-M algorithm for statistical inference | | tune_lucid() | Grid search for tuning parameters using parallel computing to determine an optimal choice of three tuning parameters with minimum model BIC | | summary_lucid() | Summarizes the results of integrative clustering based on an IntClust object | | plot_lucid() | Produces a Sankey diagram for the results of integrative clustering based on an IntClust object | | pred_lucid() | Predicts latent clusters and outcomes with an IntClust object and new data | | def_initial() | Defines initial values of model parameters in est_lucid(), sem_lucid() , and tune_lucid() fitting | | def_tune() | Defines selection options and tuning parameters in est_lucid(), sem_lucid() , and tune_lucid() fitting | | def_tol() | Defines tolerance settings in est_lucid(), sem_lucid() , and tune_lucid() fitting |

Examples

For a simulated data with 10 genetic features (5 causal) and 4 biomarkers (2 causal)

set.seed(10)
IntClusFit <- est_lucid(G=G1,Z=Z1,Y=Y1,K=2,family="binary",Pred=TRUE)
summary_lucid(IntClusFit)
plot_lucid(IntClusFit)
GridSearch <- tune_lucid(G=G1, Z=Z1, Y=Y1, K=2, Family="binary", USEY = TRUE, NoCores = 2,
                         LRho_g = 0.008, URho_g = 0.012, NoRho_g = 3,
                         LRho_z_invcov = 0.04, URho_z_invcov = 0.06, NoRho_z_invcov = 3,
                         LRho_z_covmu = 90, URho_z_covmu = 100, NoRho_z_covmu = 3)
GridSearch$Results
GridSearch$Optimal
set.seed(0.01*0.05*95)
IntClusFit <- est_lucid(G=G1,Z=Z1,Y=Y1,K=2,family="binary",Pred=TRUE,
                        tunepar = def_tune(Select_G=TRUE,Select_Z=TRUE,
                                           Rho_G=0.01,Rho_Z_InvCov=0.05,Rho_Z_CovMu=95))
summary_lucid(IntClusFit)$No0G; summary_lucid(IntClusFit)$No0Z
colnames(G1)[summary_lucid(IntClusFit)$select_G]; colnames(Z1)[summary_lucid(IntClusFit)$select_Z]
if(!all(summary_lucid(IntClusFit)$select_G==FALSE)){
    G_select <- G1[,summary_lucid(IntClusFit)$select_G]
}
if(!all(summary_lucid(IntClusFit)$select_Z==FALSE)){
    Z_select <- Z1[,summary_lucid(IntClusFit)$select_Z]
}
set.seed(10)
IntClusFitFinal <- est_lucid(G=G_select,Z=Z_select,Y=Y1,K=2,family="binary",Pred=TRUE)
plot_lucid(IntClusFitFinal)
# Re-run the model with covariates in the G->X path
set.seed(10)
IntClusCoFit <- est_lucid(G=G1,CoG=CoG,Z=Z1,Y=Y1,K=2,family="binary",Pred=TRUE)
# Check important model outputs
summary_lucid(IntClusCoFit)

# Visualize the results
plot_lucid(IntClusCoFit)
# Re-run feature selection with covariates in both paths
set.seed(10)
IntClusCoFit <- est_lucid(G=G1,CoG=CoG,Z=Z1,Y=Y1,CoY=CoY,K=2,family="binary",Pred=TRUE,
                          initial=def_initial(), itr_tol=def_tol(),
                          tunepar = def_tune(Select_G=TRUE,Select_Z=TRUE,
                          Rho_G=0.02,Rho_Z_InvCov=0.1,Rho_Z_CovMu=93))

# Re-fit with selected features with covariates
set.seed(10)
IntClusCoFitFinal <- est_lucid(G=G_select,CoG=CoG,Z=Z_select,Y=Y1,CoY=CoY,K=2,
                               family="binary",Pred=TRUE)

# Visualize the results
plot_lucid(IntClusCoFitFinal)
set.seed(10)
BootCIs <- boot_lucid(G = G1, CoG = CoG, Z = Z1, Y = Y1, CoY = CoY, 
                      useY = TRUE, family = "binary", K = 2, R=50)
BootCIs$Results
set.seed(10)
sem_lucid(G=G2,Z=Z2,Y=Y2,useY=TRUE,K=2,Pred=TRUE,family="normal",Get_SE=TRUE,
            itr_tol = def_tol(tol_b = 1e-02, tol_m = 1e-02, tol_s = 1e-02, tol_g = 1e-02))

Final Remarks

Acknowledgments



USCbiostats/LUCid documentation built on Feb. 22, 2020, 8:57 p.m.