Type: Package
Title: Detecting sparse microbial association signals from longitudinal microbiome data based on generalized estimating equations
Version: 1.0
Author: Han Sun
Maintainer: Han Sun <sunh529@mails.ccnu.edu.cn>; Xingpeng Jiang <xpjiang@mail.ccnu.edu.cn>
Imports: phyloseq, cluster, compositions, permute, PGEE, vegan, ape, dirmult, aSPU, MiSPU, devtools
Description: GEEMiHC is used for detecting sparse microbial association signals between microbiome and a host phenotype from longitudinal microbiome data.
License: GPL-2
Encoding: UTF-8
LazyData: true
URL: https://github.com/xpjiang-ccnu/GEEMiHC
The R package GEEMiHC adaptively detects sparse microbial association signals from longitudinal microbiome data. It handles several outcome types, so it can be used to study the association between the microbiome and host phenotypes such as BMI (Gaussian distribution), disease status (Binomial distribution) or number of tumors (Poisson distribution). Because cross-sectional data are a special case of longitudinal data, GEEMiHC can also be applied to cross-sectional data, in which case the results are consistent with MiHC.
You may install GEEMiHC from GitHub using the following code:
devtools::install_github("xpjiang-ccnu/GEEMiHC", force=T)
GEEMiHC(y, id, covs, otu.tab, tree, model, Gamma=c(1,3,5,7,9), Lamda=matrix(c(1, rep(0, 8), rep(1/3, 3), rep(0, 6), rep(1/5, 5), rep(0, 4), rep(1/7, 7), rep(0, 2), rep(1/9, 9)), 5, 9, byrow = T), comp=FALSE, CLR=FALSE, opt.ncl=30, n.perm=5000)
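The default Lamda in the usage line is easier to read once it is laid out as a matrix. The base R sketch below rebuilds it from Gamma: row i has Gamma[i] leading entries equal to 1/Gamma[i] and zeros elsewhere, so every row sums to 1. The helper name build_lamda is hypothetical and is not part of GEEMiHC; it only reproduces the pattern visible in the default argument.

```r
# Hypothetical helper (not part of GEEMiHC): rebuild the default Lamda from Gamma.
build_lamda <- function(Gamma) {
  n_col <- max(Gamma)
  # One row per Gamma value: g entries of 1/g, padded with zeros to n_col columns.
  t(sapply(Gamma, function(g) c(rep(1 / g, g), rep(0, n_col - g))))
}

Gamma <- c(1, 3, 5, 7, 9)
Lamda <- build_lamda(Gamma)

# The default exactly as written in the usage line.
Lamda_default <- matrix(c(1, rep(0, 8), rep(1/3, 3), rep(0, 6), rep(1/5, 5),
                          rep(0, 4), rep(1/7, 7), rep(0, 2), rep(1/9, 9)),
                        5, 9, byrow = TRUE)

same_as_default <- isTRUE(all.equal(Lamda, Lamda_default))
row_totals <- rowSums(Lamda)
print(Lamda)
```

If a different Gamma is supplied, a matching Lamda with one row per Gamma value would presumably be needed as well; this is an inference from the shape of the defaults, not something stated above.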
$rank.order - The rank order for significant factors.
$simes.pv.GEE.AR - The p-value for the Simes test of GEEMiHC with autoregressive structure.
$simes.pv.GEE.EX - The p-value for the Simes test of GEEMiHC with exchangeable structure.
$simes.pv.GEE.IN - The p-value for the Simes test of GEEMiHC with independence structure.
$ind.pvs.GEEMiHC.AR - The p-values for the item-by-item unweighted and weighted higher criticism tests of GEEMiHC with autoregressive structure.
$ind.pvs.GEEMiHC.EX - The p-values for the item-by-item unweighted and weighted higher criticism tests of GEEMiHC with exchangeable structure.
$ind.pvs.GEEMiHC.IN - The p-values for the item-by-item unweighted and weighted higher criticism tests of GEEMiHC with independence structure.
$ada.pvs - The p-values for the global omnibus higher criticism tests of the three GEEMiHC tests with different working correlation structures and of aGEEMiHC.
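The Simes p-values listed above combine several individual p-values into one. As a reminder of the general procedure (Simes, 1986), and not as a copy of the package's internal code, a minimal base R sketch is given here. The function name simes_pvalue and the toy p-values are made up for illustration.

```r
# Generic Simes combination: min over i of n * p_(i) / i, with p_(i) sorted ascending.
# Hypothetical helper; GEEMiHC may compute its Simes p-values differently.
simes_pvalue <- function(p) {
  p_sorted <- sort(p)
  min(length(p) * p_sorted / seq_along(p_sorted))
}

# Toy p-values (not produced by GEEMiHC).
p_toy <- c(0.01, 0.04, 0.03, 0.20)
simes_p <- simes_pvalue(p_toy)
print(simes_p)
```

For these toy values the candidates are 4 * 0.01 / 1, 4 * 0.03 / 2, 4 * 0.04 / 3 and 4 * 0.20 / 4, so the smallest one, 0.04, is returned.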
Import requisite R packages:
library(phyloseq)
library(GEEMiHC)
Import example microbiome data:
otu.tab <- CD_longitudinal@otu_table
tree <- CD_longitudinal@phy_tree
y <- sample_data(CD_longitudinal)$label
covs <- data.frame(matrix(NA, length(y), 2))
covs[,1] <- as.numeric(sample_data(CD_longitudinal)$age)
covs[,2] <- as.factor(sample_data(CD_longitudinal)$smoker)
id <- sample_data(CD_longitudinal)$id
out <- GEEMiHC(y, id, covs=covs, otu.tab=otu.tab, tree=tree, model="binomial", n.perm=5000)
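In the example above, y, id and covs are all taken from the same sample table, so they are aligned sample by sample, and id identifies which repeated measurements belong to the same subject. The toy sketch below shows that layout with made-up values; the subject count, the number of time points and the column names are assumptions for illustration and are unrelated to CD_longitudinal.

```r
set.seed(1)

n_subj <- 4   # hypothetical number of subjects
n_time <- 3   # hypothetical number of repeated measurements per subject

# One entry per sample; samples from the same subject share an id.
id <- rep(seq_len(n_subj), each = n_time)

# Binary outcome, one value per sample (would correspond to model = "binomial").
y <- rbinom(length(id), size = 1, prob = 0.5)

# Covariates, one row per sample; subject-level values are repeated over time.
covs <- data.frame(
  age    = rep(round(runif(n_subj, 20, 60)), each = n_time),
  smoker = factor(rep(sample(c("no", "yes"), n_subj, replace = TRUE), each = n_time))
)

# Basic alignment checks before passing the inputs to an analysis function.
stopifnot(length(y) == length(id), nrow(covs) == length(y))
samples_per_subject <- table(id)
print(samples_per_subject)
```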
Sun H, et al. Detecting sparse microbial association signals adaptively from longitudinal microbiome data based on generalized estimating equations. Brief Bioinform 2022;23(5):bbac149. https://doi.org/10.1093/bib/bbac149
Koh H and Zhao N. A powerful microbial group association test based on the higher criticism analysis for sparse microbial association signals. Microbiome 2020;8(1):63.
McMurdie PJ and Holmes S. phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 2013;8(4):e61217.
Paradis E, et al. APE: Analyses of phylogenetics and evolution in R language. Bioinformatics 2004;20(2):289-290.
Reynolds A, et al. Clustering rules: A comparison of partitioning and hierarchical clustering algorithms. J Math Model Algor 2006;5:475–504.
Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika 1986;73(3):751-754.
Vázquez-Baeza Y, et al. Guiding longitudinal sampling in IBD cohorts. Gut 2018;67:1743-1745.
Wang L. GEE analysis of clustered binary data with diverging number of covariates. Ann. Statist. 2011;39:389–417.
Wang L, et al. Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics 2012;68(2):353-360.
Wu C, et al. An adaptive association test for microbiome data. Genome Med 2016;8(1):56.
SimulateOTU generates OTU count data from a Dirichlet-multinomial model whose parameters are estimated from real data.
SimulateOTU(data, nSam, parameters, mu, size)
data - Real OTU count data.
nSam - Sample size.
parameters - The parameters estimated from real microbiome data, including the OTU proportions and the overdispersion parameter.
mu - The mean of the negative binomial distribution.
size - The size of the negative binomial distribution.
$OTU - OTU counts table simulated based on real data.
data("throat.otu.tab", package = "MiSPU")
nOTU = 100
otu_sum <- apply(throat.otu.tab, 2, sum)
throat.otu.tab.100 <- throat.otu.tab[, order(otu_sum, decreasing = T)[1:nOTU]]
parameters <- dirmult(throat.otu.tab.100)
otu.tab <- SimulateOTU(throat.otu.tab.100, nSam = 50, parameters, mu = 1000, size = 25)
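To make the simulation idea concrete, here is a minimal base R sketch of drawing counts from a Dirichlet-multinomial model. It is not the SimulateOTU implementation. It assumes that the estimated parameters consist of OTU proportions (pi) and one overdispersion value (theta), that the Dirichlet parameters are pi * (1 - theta) / theta, and that mu and size describe the negative binomial distribution of the per-sample total count. The function name simulate_dm_counts and the toy values are hypothetical.

```r
# Hypothetical sketch of Dirichlet-multinomial count simulation (base R only).
simulate_dm_counts <- function(n_sam, pi, theta, mu, size) {
  # Assumed parameterisation: Dirichlet parameters from proportions and overdispersion.
  alpha <- pi * (1 - theta) / theta
  # Assumed role of mu and size: total count per sample from a negative binomial.
  depth <- rnbinom(n_sam, mu = mu, size = size)
  counts <- t(sapply(depth, function(d) {
    # A Dirichlet draw obtained by normalising independent gamma variables.
    g <- rgamma(length(alpha), shape = alpha)
    as.vector(rmultinom(1, size = d, prob = g / sum(g)))
  }))
  list(OTU = counts, depth = depth)
}

set.seed(1)
pi_toy <- c(0.5, 0.3, 0.15, 0.05)   # toy OTU proportions, sum to 1
sim <- simulate_dm_counts(n_sam = 10, pi = pi_toy, theta = 0.02, mu = 1000, size = 25)
print(dim(sim$OTU))
```

Each row of sim$OTU is one simulated sample and each column one OTU; the row totals equal the simulated depths.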
Chen J and Li H. Variable selection for sparse Dirichlet-multinomial regression with an application to microbiome data analysis. Annals of Applied Statistics 2013;7(1).
Sun H, et al. A powerful adaptive microbiome-based association test for microbial association signals with diverse sparsity levels. Journal of Genetics and Genomics 2021;48(9):851-859.
Wu C, et al. An adaptive association test for microbiome data. Genome Med 2016;8(1):56.
Our code mainly draws on the R packages MiHC, MiSPU and MiATDS.