# pdf_document # prettydoc::html_pretty: # theme: cayman # highlight: github knitr::opts_chunk$set( warning=FALSE, message=FALSE, collapse = TRUE, fig.align = "center", prompt = TRUE, comment = "" )
This is a practical tutorial on the use of miMediation
package, which introduces a phylogeny-based mediation test (PhyloMed) for high-dimensional microbial composition mediators. The methodology is described in detail in the @hong2023phylomed.
PhyloMed models microbiome mediation effect through a cascade of independent local mediation models of subcompositions on the internal nodes of the phylogenetic tree. Each local model captures the mediation effect of a subcomposition at a given taxonomic resolution. PhyloMed enables us to test the overall mediation effect of the entire microbial community and pinpoint internal nodes with significant subcomposition mediation effects.
knitr::include_graphics("exampleFig.png")
As depicted in the figure above, we propose to construct a local mediation model for the subcomposition at each internal node of the phylogenetic tree. The subcomposition on a given internal node consists of the relative abundance aggregated at its two child nodes. We apply the following robust linear regression model and generalized linear regression model to represent the causal path diagram of the local mediation model at the $j$th internal node $$E\left{ \log \left( \frac{M_{ij1}}{M_{ij2}}\right) \right} = \alpha^{\rm T}{jX} {\bf X}_i + \alpha_j T_i$$ $$g{E(Y_i)} = \beta^{\rm T}{jX} {\bf X}i + \beta{jT} T_i + \beta_{j} \log\left(\frac{M_{ij1}}{M_{ij2}}\right)$$ where $g(\cdot)$ is the link function depending on the type of the outcome and we omit the intercept term in both models as it can be absorbed into ${\bf X}_i$.
The local mediation null hypothesis is expressed as $$H_0^j:\alpha_j\beta_j=0$$, which is equivalent to the union of three disjoint component null hypotheses
\begin{align} H^{j}{00}&: \alpha_j = \beta{j} = 0,\ H^{j}{10}&: \alpha_j \neq 0, \beta{j} = 0,\ H^{j}{01}&: \alpha_j = 0, \beta{j} \neq 0. \end{align}
We define the mediation test statistic for $H_0^j$ as $$P_{\max_j}=\max(P_{\alpha_j},P_{\beta_j})$$ The $P_{\alpha_j},P_{\beta_j}$ represent the $p$-value for testing $\alpha_j=0$ and $\beta_j=0$, respectively. These two $p$-values could be obtained via asymptotic approach or permutation approach.
Thus, we obtain the $p$-value of mediation test in the $j$th local model using the following formula: $$Pr(P_{\max_j} \leq p_{\max_j})=\pi_{00} p^2_{\max_j} + \pi_{10} p_{\max_j} Pr(P_{\alpha_j} \leq p_{\max_j} \mid \alpha_j \neq 0) + \pi_{01} p_{\max_j} Pr(P_{\beta_j} \leq p_{\max_j} \mid \beta_j \neq 0)$$ In this formula, we need to estimate three component probabilities ($\pi_{00},\pi_{10},\pi_{01}$) representing the proportion of three null hypotheses ($H^{j}{00},H^{j}{10},H^{j}{01}$). and two power functions evaluated at $p{\max_j}$. We implement two methods (product, maxp) to estimate $\pi_{00},\pi_{10},\pi_{01}$.
"product" method: $\hat{\pi}{00}=\hat{\pi}{0\bullet}\hat{\pi}{\bullet0}/\hat{\pi}_0$, $\hat{\pi}{10}=(1-\hat{\pi}{0\bullet})\hat{\pi}{\bullet0}/\hat{\pi}0$, and $\hat{\pi}{01}=\hat{\pi}{0\bullet}(1-\hat{\pi}{\bullet0})/\hat{\pi}0$, where $\hat{\pi}_0=\hat{\pi}{0\bullet} + \hat{\pi}{\bullet0}-\hat{\pi}{0\bullet}\hat{\pi}_{\bullet0}$.
"maxp" method: $\hat{\pi}{00}=(\hat{\pi}{0\bullet}+\hat{\pi}{\bullet0}-\hat{\pi}_0)/\hat{\pi}_0$, $\hat{\pi}{10}=(\hat{\pi}0-\hat{\pi}{0\bullet})/\hat{\pi}0$, and $\hat{\pi}{01}=(\hat{\pi}0-\hat{\pi}{\bullet0})/\hat{\pi}_0$.
Note that $\hat{\pi}{0\bullet}, \hat{\pi}{\bullet0}, \hat{\pi}0$ are estimated by applying Jin and Cai's method (@jin2007estimating) to $P{\alpha_j},P_{\beta_j},P_{\max_j}$. After obtaining the $p$-values on all internal nodes, we apply Benjamini-Hochberg (BH) false discovery rate procedure (@benjamini1995controlling) to identify a collection of nodes on the phylogenetic tree with significant mediation effects. To test the global mediation null hypothesis $H_0: \cap_{j=1}^J H^j_0$, we apply the harmonic mean $p$-value (HMP) method (@wilson2019harmonic) to combine local mediation $p$-values.
It is well-known that low dose antibiotics have been used widely to stimulate weight gain in livestock. However, there is growing concern that antibiotic exposure may have long-term consequences. Several studies have shown that antibiotics can have great impact on the abundances of bacteria in the gut community. It is interesting to investigate whether the subtherapeutic antibiotic treatment effect on body weight is mediated through the perturbation of gut microbiome and study the underlying mechanisms.
The data here is from an experiment conducted by @cho2012antibiotics, in which young mice were treated by different low-dose antibiotic and evaluated changes in body fat and compositions of the microbiome in cecal and fecal samples. The mice in antibiotic group were heavier than those in the control group. We will show how to perform phyloMed
function by focusing on cecal samples.
library(miMediation) # Load data data(data.cecal) # Take a look at the data Trt <- data.cecal$treatment table(Trt) # 0: control 1: antibotics M <- data.cecal$mediators head(M[,1:6]) Y <- data.cecal$outcome summary(Y) tree <- data.cecal$tree
To run phyloMed
function, the minimum requirement is to provide treatment
, mediators
, outcome
, tree
information. In the chunk below, we set FDR = 0.1 (fdr.alpha=0.1
) in identifying mediating nodes and visualize the results in the tree plot (graph="rectangular"
). Note that if n.perm=1e4
, the function will output $p$-value calculated through permutation procedure as well and it will take $\sim 3$ minutes to output the result. In general, permutation procedure can provide more accurate result when sample size is small (e.g., sample size < 100). However, it is slower than asymptotic procedure. You can set verbose=TRUE
to keep track of the process.
# set random seed here so that you can get the same result every time you run the code set.seed(123) cecal.rsltlst <- phyloMed(Trt, M, Y, tree = tree, fdr.alpha = 0.1, n.perm = 1e4, graph = "rectangular") # take a look at phyloseq-class object cecal.physeq <- cecal.rsltlst$clean.data cecal.physeq cecal.rslt <- cecal.rsltlst$rslt # take a look at rslt (PhyloMed.P) cecal.rslt$PhyloMed.P
The output consists of four components:
node.pval
: mediation $p$-values on each internal node of the phylogenetic tree.sig.clade
: identified mediation node ids with their corresponding leaf-level descendant taxa name.null.prop
: estimated proportion of three disjoint component null hypotheses.global.pval
: global test $p$-value.Note that p-value is NA at internal node 144. If we set verbose=TRUE
, we could know the reason during the process. The underlying reason is that all values in the treatment variable equal to one after removing the subjects with subcomposition being zero. Thus, we skip this specific node.
In the figure above, the size of the circle on internal node is proportional to $-\log_{10}(\text{subcompostion mediation p-value }p_j)$, where $p_j$ lives in the node.pval
output. The identified mediation node is highlighted by a blue rectangle.
When there is no phylogenetic information available, the phyloMed
function could construct taxonomic tree based on the taxonomy table. The data.zeeviD
is a simulated dataset based on a real gut microbiome dataset from a healthy cohort (@zeevi2015personalized). The mediation signal was added at "Order.Clostridiales".
# Load data data(data.zeeviD) # Take a look at the data Trt <- data.zeeviD$treatment table(Trt) # 0: control 1: treatment M <- data.zeeviD$mediators dim(M) Y <- data.zeeviD$outcome summary(Y) tree <- data.zeeviD$tree head(tree)
# run asymptotic result by default demo.rsltlst <- phyloMed(Trt, M, Y, tree = tree, fdr.alpha = 0.1, graph = "circular") # take a look at phyloseq-class object demo.physeq <- demo.rsltlst$clean.data demo.physeq demo.rsltlst$rslt$PhyloMed.A
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.