README.md

LUCIDus: Latent Unknown Clustering with Integrated Data

Build
Status CRAN_Status_Badge

Introduction

The LUCIDus R package is an integrative tool to obtain a joint estimation of latent or unknown clusters/subgroups with multi-omics data and phenotypic traits. This package is an implementation for the novel statistical method proposed in the research paper “A Latent Unknown Clustering Integrating Multi-Omics Data (LUCID) with Phenotypic Traits” published by the Bioinformatics.

Citation

Cheng Peng, Jun Wang, Isaac Asante, Stan Louie, Ran Jin, Lida Chatzi, Graham Casey, Duncan C Thomas, David V Conti, A Latent Unknown Clustering Integrating Multi-Omics Data (LUCID) with Phenotypic Traits, Bioinformatics, Volume 36, Issue 3, 1 February 2020, Pages 842–850, https://doi.org/10.1093/bioinformatics/btz667

Installation

You can install the released version of LUCIDus from CRAN directly with:

install.packages("LUCIDus")

Or, it can be installed from GitHub using the following codes:

install.packages("devtools")
devtools::install_github("USCbiostats/LUCIDus")

Fitting the latent cluster models

library(LUCIDus)

Three functions, including est_lucid(), boot_lucid(), and tune_lucid(), are currently available for model fitting and selection. The model outputs can be summarized and visualized using summary_lucid() and plot_lucid() respectively. Predictions could be made with pred_lucid().

est_lucid()

Estimating latent clusters with multi-omics data, missing values in biomarker data are allowed, and information in the outcome of interest can be integrated

Example

For a testing dataset with 10 genetic features (5 causal) and 4 biomarkers (2 causal)

Integrative clustering without feature selection
set.seed(10)
IntClusFit <- est_lucid(G=G1,Z=Z1,Y=Y1,K=2,family="binary",Pred=TRUE)

Checking important model outputs with summary_lucid()

summary_lucid(IntClusFit)

Visualize the results with Sankey diagram using plot_lucid()

plot_lucid(IntClusFit)

Re-run the model with covariates in the G->X path

IntClusCoFit <- est_lucid(G=G1,CoG=CoG,Z=Z1,Y=Y1,K=2,family="binary",Pred=TRUE)

Check important model outputs

summary_lucid(IntClusCoFit)

Visualize the results

plot_lucid(IntClusCoFit)

boot_lucid()

Bootstrap method to achieve SEs for LUCID parameter estimates

Example

set.seed(10)
boot_lucid(G = G1, CoG = CoG, Z = Z1, Y = Y1, CoY = CoY, useY = TRUE, family = "binary", K = 2, R=500)

tune_lucid()

Example

Grid search for tuning parameters using parallel computing

# Better be run on a server or HPC
set.seed(10)
GridSearch <- tune_lucid(G=G1, Z=Z1, Y=Y1, K=2, Family="binary", USEY = TRUE,
                           LRho_g = 0.008, URho_g = 0.012, NoRho_g = 3,
                           LRho_z_invcov = 0.04, URho_z_invcov = 0.06, NoRho_z_invcov = 3,
                           LRho_z_covmu = 90, URho_z_covmu = 110, NoRho_z_covmu = 2)
GridSearch$Results
GridSearch$Optimal

Run LUCID with best tuning parameters and select informative features

set.seed(10)
IntClusFit <- est_lucid(G=G1,Z=Z1,Y=Y1,K=2,family="binary",Pred=TRUE,
                        tunepar = def_tune(Select_G=TRUE,Select_Z=TRUE,
                                           Rho_G=0.01,Rho_Z_InvCov=0.06,Rho_Z_CovMu=90))
# Identify selected features
summary_lucid(IntClusFit)$No0G; summary_lucid(IntClusFit)$No0Z
colnames(G1)[summary_lucid(IntClusFit)$select_G]; colnames(Z1)[summary_lucid(IntClusFit)$select_Z]
# Select the features
if(!all(summary_lucid(IntClusFit)$select_G==FALSE)){
    G_select <- G1[,summary_lucid(IntClusFit)$select_G]
}
if(!all(summary_lucid(IntClusFit)$select_Z==FALSE)){
    Z_select <- Z1[,summary_lucid(IntClusFit)$select_Z]
}

Re-fit with selected features

set.seed(10)
IntClusFitFinal <- est_lucid(G=G_select,Z=Z_select,Y=Y1,K=2,family="binary",Pred=TRUE)

Visualize the results with a Sankey diagram

plot_lucid(IntClusFitFinal)

Re-run feature selection with covariates in the G->X path

IntClusCoFit <- est_lucid(G=G1,CoG=CoG,Z=Z1,Y=Y1,K=2,family="binary",Pred=TRUE,
                          initial=def_initial(), itr_tol=def_tol(),
                          tunepar = def_tune(Select_G=TRUE,Select_Z=TRUE,Rho_G=0.02,Rho_Z_InvCov=0.1,Rho_Z_CovMu=93))
summary_lucid(IntClusCoFit)

Re-fit with selected features with covariates

IntClusCoFitFinal <- est_lucid(G=G_select,CoG=CoG,Z=Z_select,Y=Y1,K=2,family="binary",Pred=TRUE)

Visualize the results

plot_lucid(IntClusCoFitFinal)

For more details, see documentations for each function in the R package.

Built With

Versioning

The current version is 1.0.0.

For the versions available, see the Release on this repository.

Authors

License

This project is licensed under the GPL-2 License.

Acknowledgments



USCbiostats/LUCid documentation built on Feb. 22, 2020, 8:57 p.m.