Title: Microbiome Virtual Twins
Version: 1.0
Date: 2022-12-15
Author: Hyunwook Koh
Maintainer: Hyunwook Koh <hyunwook.koh@stonybrook.edu>
Description: This R package implements microbiome virtual twins (MiVT). MiVT predicts treatment effects using dML, and then uses BoRT to identify subgroups by treatment effect based on microbiome composition, in order to evaluate the interplay between the microbiome and treatment.
NeedsCompilation: no
Depends: R(>= 4.1.1), cluster, compositions, dirmult, glmnet, GUniFrac, ecodist, neuralnet, phangorn, phyloseq, proxy, randomForest, rpart, rpart.plot, splitTools, zCompositions
License: GPL-2
URL: https://github.com/hk1785/MiVT
If you have any problems using this R package, please report them in Issues (https://github.com/hk1785/MiVT/issues) or email Hyunwook Koh (hyunwook.koh@stonybrook.edu).
Rtools
https://cran.r-project.org/bin/windows/Rtools/rtools40.html
phyloseq
https://joey711.github.io/phyloseq/
cluster, compositions, devtools, dirmult, glmnet, GUniFrac, ecodist, neuralnet, phangorn, proxy, randomForest, rpart, rpart.plot, splitTools, zCompositions
install.packages(c("cluster", "compositions", "devtools", "dirmult", "glmnet", "GUniFrac", "ecodist", "neuralnet", "phangorn", "proxy", "randomForest", "rpart", "rpart.plot", "splitTools", "zCompositions"))
library(devtools)
install_github("hk1785/MiVT", force = TRUE)
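Before running the installation commands, it can help to see which of the listed dependencies are already present. The following is a minimal base-R sketch; the variable names pkgs and missing.pkgs are illustrative and not part of MiVT, and phyloseq is left out because it is distributed through Bioconductor rather than CRAN.

```r
# CRAN dependencies listed above (phyloseq is installed separately).
pkgs <- c("cluster", "compositions", "devtools", "dirmult", "glmnet",
          "GUniFrac", "ecodist", "neuralnet", "phangorn", "proxy",
          "randomForest", "rpart", "rpart.plot", "splitTools", "zCompositions")

# Packages that are not found in the current library paths.
missing.pkgs <- setdiff(pkgs, rownames(installed.packages()))

if (length(missing.pkgs) > 0) {
  message("Not yet installed: ", paste(missing.pkgs, collapse = ", "))
  # install.packages(missing.pkgs)  # uncomment to install only what is missing
} else {
  message("All listed CRAN dependencies are installed.")
}
```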
This R package contains four core functions, gen.syn.dat, biom.qc, dML and BoRT. Details are below.
This function generates example microbiome data.
gen.syn.dat(tree, tax.tab, prop, disp, num.sams = 50, seq.depth = sample(10000:1e+05, 50), keep.cut.off = 200)
Synthetic microbiome data in the 'phyloseq' format.
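The default seq.depth in the usage line draws 50 values, which matches the default num.sams = 50. The sketch below changes the sample size and draws a matching number of sequencing depths. That seq.depth needs one value per sample is an assumption inferred from the defaults, not something the documentation states. The call to gen.syn.dat is guarded so that the snippet also runs where MiVT is not installed.

```r
set.seed(2022)

# Hypothetical non-default sample size.
num.sams <- 80

# One sequencing depth per sample, drawn from the same range as the default
# (assumption: seq.depth should have length num.sams).
seq.depth <- sample(10000:1e+05, num.sams)

# Only call the package function when MiVT is available.
if (requireNamespace("MiVT", quietly = TRUE)) {
  library(MiVT)
  data(fit)
  data(tree)
  data(tax.tab)
  sim.biom <- gen.syn.dat(tree = tree, tax.tab = tax.tab,
                          prop = fit$pi, disp = fit$theta,
                          num.sams = num.sams, seq.depth = seq.depth)
}
```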
Import requisite R packages
library('phyloseq')
library('cluster')
library('dirmult')
library('phangorn')
library('compositions')
library('zCompositions')
library('GUniFrac')
library('ecodist')
library('proxy')
library('glmnet')
library('randomForest')
library('neuralnet')
library('splitTools')
library('rpart')
library('rpart.plot')
library('MiVT')
Generate example microbiome data
data(fit)
data(tree)
data(tax.tab)
prop <- fit$pi
disp <- fit$theta
sim.biom <- gen.syn.dat(tree = tree, tax.tab = tax.tab, prop = prop, disp = disp)
sim.biom
This function performs quality controls and data transformations that are needed for MiVT.
biom.qc(biom = biom, kingdom = "Bacteria", lib.size.cut.off = 1000, mean.prop.cut.off = 0, rem.tax.com = c("", "gut metagenome", "mouse gut metagenome", "metagenome", "NANANA"), rem.tax.par = c("uncultured", "incertae", "Incertae", "unclassified", "unidentified", "unknown"))
$tax.prop: A list of tables for the proportions of microbial taxa on each taxonomic rank (Phylum, Class, Order, Family, Genus, Species).
$otu.tab: A feature (OTU or ASV) table where rows are features and columns are subjects.
$tax.tab: A taxonomic table where rows are features (OTUs or ASVs), and columns are seven taxonomic ranks (Kingdom, Phylum, Class, Order, Family, Genus, Species).
$sam.dat: Metadata (sample information) where rows are subjects and columns are variables. It should contain two binary variables: y (response) and Tr (treatment).
$tree: A rooted phylogenetic tree.
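Because $sam.dat is expected to carry two binary variables, a quick check before the downstream steps can catch coding problems early. The sketch below uses a toy data frame in place of qc.out$sam.dat. The helper is.binary is hypothetical (it is not part of MiVT) and only tests that a variable has no missing values and exactly two distinct values.

```r
# Hypothetical helper: TRUE if x has no missing values and exactly two
# distinct values.
is.binary <- function(x) {
  !anyNA(x) && length(unique(x)) == 2
}

# Toy stand-in for qc.out$sam.dat.
sam.dat <- data.frame(y  = c(0, 1, 1, 0, 1, 0),
                      Tr = c(1, 0, 1, 0, 0, 1))

# Check both required variables; the result is a named logical vector.
checks <- sapply(sam.dat[, c("y", "Tr")], is.binary)
checks
```

With real data, the same check could be applied to qc.out$sam.dat after running biom.qc.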
Import requisite R packages
library('phyloseq')
library('cluster')
library('dirmult')
library('phangorn')
library('compositions')
library('zCompositions')
library('GUniFrac')
library('ecodist')
library('proxy')
library('glmnet')
library('randomForest')
library('neuralnet')
library('splitTools')
library('rpart')
library('rpart.plot')
library('MiVT')
Generate example microbiome data
data(fit)
data(tree)
data(tax.tab)
prop <- fit$pi
disp <- fit$theta
sim.biom <- gen.syn.dat(tree = tree, tax.tab = tax.tab, prop = prop, disp = disp)
sim.biom
Perform quality controls and data transformations
qc.out <- biom.qc(biom = sim.biom)
This function implements dML to predict treatment effects.
dML(y, Tr, X, tree, n.folds = 10, n.rep = 2, alpha = seq(0.05, 0.95, 0.05), n.trees = 1000, n.neus = c(1/2, 1/3, 1/4))
$out.en$cv.cro: CV cross-entropy values for the elastic net and each distance measure.
$out.en$Z: Predicted treatment effects using the elastic net.
$out.rf$cv.cro: CV cross-entropy values for the random forest and each distance measure.
$out.rf$Z: Predicted treatment effects using the random forest.
$out.dfn$cv.cro: CV cross-entropy values for the deep feedforward network and each distance measure.
$out.dfn$Z: Predicted treatment effects using the deep feedforward network.
$Z: Predicted treatment effects using dML.
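To illustrate what a predicted treatment effect and a cross-entropy value look like, here is a minimal sketch of the virtual-twins idea from Foster et al. on toy data. It swaps in a plain logistic regression for dML, uses a single simulated covariate instead of microbiome features, and reports in-sample rather than cross-validated cross-entropy. All names below are illustrative, and the assumption that Z is a difference of predicted response probabilities follows the virtual-twins literature rather than the MiVT source code.

```r
set.seed(123)
n  <- 200
x  <- rnorm(n)                     # toy covariate standing in for microbiome features
Tr <- rbinom(n, 1, 0.5)            # binary treatment
p  <- plogis(-0.5 + 0.8 * Tr * x)  # under treatment, response probability rises with x
y  <- rbinom(n, 1, p)              # binary response
dat <- data.frame(y = y, Tr = Tr, x = x)

# Response model with a treatment-by-covariate interaction
# (a simple stand-in for the learners used by dML).
fit <- glm(y ~ Tr * x, family = binomial, data = dat)

# "Virtual twins": predict each subject's response probability under
# treatment and under control, then take the difference.
p1 <- predict(fit, newdata = transform(dat, Tr = 1), type = "response")
p0 <- predict(fit, newdata = transform(dat, Tr = 0), type = "response")
Z  <- p1 - p0

# Binary cross-entropy between observed responses and predicted probabilities;
# probabilities are clipped away from 0 and 1 to keep the logarithms finite.
cross.entropy <- function(y, p, eps = 1e-12) {
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}
ce <- cross.entropy(dat$y, fitted(fit))
```

Lower cross-entropy indicates predicted probabilities that agree better with the observed responses, which is why it can serve as a selection criterion among learners.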
Koh, H. (2023) Subgroup identification using virtual twins for human microbiome studies. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 1-10 (DOI: 10.1109/TCBB.2023.3324139).
Foster, J. C., Taylor, J. M. & Ruberg, S. J. Subgroup identification from randomized clinical trial data. Stat. Med. 30(24), 2867-2880 (2011).
Import requisite R packages
library('phyloseq')
library('cluster')
library('dirmult')
library('phangorn')
library('compositions')
library('zCompositions')
library('GUniFrac')
library('ecodist')
library('proxy')
library('glmnet')
library('randomForest')
library('neuralnet')
library('splitTools')
library('rpart')
library('rpart.plot')
library('MiVT')
Generate example microbiome data
data(fit)
data(tree)
data(tax.tab)
prop <- fit$pi
disp <- fit$theta
sim.biom <- gen.syn.dat(tree = tree, tax.tab = tax.tab, prop = prop, disp = disp)
sim.biom
Perform quality controls and data transformations
qc.out <- biom.qc(biom = sim.biom)
Perform dML
dml.out <- dML(y = qc.out$sam.dat$y, Tr = qc.out$sam.dat$Tr, X = qc.out$otu.tab, tree = qc.out$tree)
This function implements BoRT for subgroup identification and significance testing.
BoRT(Z, tax.prop, tax.rank = c("Phylum", "Class", "Order", "Family", "Genus", "Species"), minsplit = 10, minbucket = 5, cp = 0.01, n.boot = 20000)
$Sel.Taxa: Short taxonomic IDs and full taxonomic names.
$BoRT.out: The output table of BoRT. Columns are the identified subgroups that correspond with the terminal nodes from left to right. N is the sample size for each subgroup. Overall TE represents the overall treatment effect, and Subgroup TE represents the subgroup treatment effect.
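The relationship between terminal nodes, subgroup sample sizes, and subgroup treatment effects can be sketched with a single regression tree from rpart on toy data. This is not the BoRT procedure itself: the bootstrap-based significance testing is not reproduced, and the assumption that the minsplit, minbucket, and cp arguments correspond to rpart.control is inferred from the argument names. The data and the names GenusA and GenusB are hypothetical.

```r
library(rpart)

set.seed(42)
n <- 120

# Hypothetical genus-level proportions for two taxa.
tax <- data.frame(GenusA = runif(n), GenusB = runif(n))

# Toy predicted treatment effects: larger when GenusA is abundant.
Z <- ifelse(tax$GenusA > 0.5, 0.3, -0.1) + rnorm(n, sd = 0.05)

# Single regression tree, using the same control values as the BoRT defaults.
tree.fit <- rpart(Z ~ GenusA + GenusB,
                  data = data.frame(Z = Z, tax),
                  method = "anova",
                  control = rpart.control(minsplit = 10, minbucket = 5, cp = 0.01))

# Each subject's terminal node defines its subgroup.
subgroup    <- tree.fit$where
n.sub       <- table(subgroup)          # sample size per subgroup
subgroup.te <- tapply(Z, subgroup, mean)  # mean predicted effect per subgroup
overall.te  <- mean(Z)                  # mean predicted effect over all subjects
```

Calling rpart.plot::rpart.plot(tree.fit) would draw the tree, with terminal nodes ordered from left to right.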
Koh, H. (2023) Subgroup identification using virtual twins for human microbiome studies. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 1-10 (DOI: 10.1109/TCBB.2023.3324139).
Foster, J. C., Taylor, J. M. & Ruberg, S. J. Subgroup identification from randomized clinical trial data. Stat. Med. 30(24), 2867-2880 (2011).
Import requisite R packages
library('phyloseq')
library('cluster')
library('dirmult')
library('phangorn')
library('compositions')
library('zCompositions')
library('GUniFrac')
library('ecodist')
library('proxy')
library('glmnet')
library('randomForest')
library('neuralnet')
library('splitTools')
library('rpart')
library('rpart.plot')
library('MiVT')
Generate example microbiome data
data(fit)
data(tree)
data(tax.tab)
prop <- fit$pi
disp <- fit$theta
sim.biom <- gen.syn.dat(tree = tree, tax.tab = tax.tab, prop = prop, disp = disp)
sim.biom
Perform quality controls and data transformations
qc.out <- biom.qc(biom = sim.biom)
Perform dML
dml.out <- dML(y = qc.out$sam.dat$y, Tr = qc.out$sam.dat$Tr, X = qc.out$otu.tab, tree = qc.out$tree)
Perform BoRT
bort.out <- BoRT(Z = dml.out$Z, tax.prop = qc.out$tax.prop, tax.rank = "Genus")
bort.out