knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Bayesian models are essential for studying complex biological systems of interest (SOIs), such as irradiation effects on metabolite concentrations or vaccine impacts on immune gene repertoires. In these models layered parameters describe key biological features of the SOIs. After model fitting, each parameter is characterized by a posterior distribution: a probability distribution representing all plausible effect values given the observed data. With thousands of such posteriors in high-dimensional analyses, identifying large, reliable effects becomes challenging.
Traditional volcano plots address this for frequentist analyses by plotting fold-changes against –log(p-values). We introduce Bayesian volcano plots that instead visualize the posterior mean effect size of parameter $i$ ($\theta_i$) against the probability, $\pi_i$, where the $\pi_i$ quantifies the posterior probability that $\theta_i$ is not equal the null effect. This directly highlights both magnitude and biological relevance. While Sousa et al. (2020) conceptualized Bayesian volcano plots, our R package provides the first practical implementation for automated $\pi_i$ calculation and visualization of complex biological effects.
if (!require("BayesVolcano", quietly = TRUE)) { install.packages("BayesVolcano") }
To create a Bayesian volcano plot, BayesVolcano needs two main inputs:
If you used rstan or brms to fit your model, our extract_fit() function automatically converts your results into the right format. Just install the corresponding package (rstan or brms) first, then run extract_fit(your_model,parameter_name) to prepare your data.
data("posterior") head(posterior[, 1:4])
This data frame maps parameter names to biological entities with at least a column named "parameter" and a column named "label". Additional columns can be provided for later visualization (categorical like "group" or numerical like "value").
data("annotation_df") head(annotation_df)
The prepare_volcano_input() function takes as input the posterior and the annotation data frame. Run this now:
d <- prepare_volcano_input(posterior = posterior, annotation = annotation_df)
The output contains
str(d)
You can specify the null effect and credible interval level:
d <- prepare_volcano_input( posterior = posterior, annotation = annotation_df, null.effect = 0.5, CrI_level = 0.7 )
The function prepare_volcano_input() uses the posterior of each parameter from the annotations (e.g., effect size $\theta_i$ of parameter $i$) to calculate $\pi_i$:
$\pi_i = 2 \cdot \max\left(\int_{\theta_i = -\infty}^{\bar{\theta}} p(\theta_i)\mathrm{d}\theta_i, \int_{\theta_i = \bar{\theta}}^{\infty} p(\theta_i)\mathrm{d}\theta_i\right) - 1$
Where $\bar{\theta}$ is the null effect. This measures the probability that the effect is in the "direction" away from the null.
Important Note: A $\pi$-value near 0 doesn't prove the absence of an effect - it indicates the posterior is widely distributed around the null.
The plot_volcano() function creates a ggplot visual based on the formatted input:
plot_volcano(d)
This plot shows:
Where each point is a parameter with median effect size (x-axis) and $\pi$ (y-axis). The null effect is shown by the vertical line.
To visualize effect size uncertainty, add credible intervals as error bars:
plot_volcano(d, CrI = TRUE )
This adds:
You can also represent credible interval width through point size:
plot_volcano(d, CrI = TRUE, CrI_width = TRUE )
Use metadata columns for color-coding:
plot_volcano(d, color = "group" ) plot_volcano(d, color = "value" )
Combine with other visualizations:
plot_volcano(d, CrI = TRUE, CrI_width = TRUE, color = "group" )
# Customization requires the ggplot2 package if (!require("BayesVolcano", quietly = TRUE)) { install.packages("BayesVolcano") } library(ggplot2) p <- plot_volcano(d) p + xlab("my informative parameter") + ggtitle("My amazing plot")
Use geom_text() for basic labeling or ggrepel to avoid overplotting:
p <- plot_volcano(d) p + geom_text(aes(label = label)) # ggrepel version # library(ggrepel) # p + # geom_text_repel(aes(label=label))
Show labels only for reliable effects:
p <- plot_volcano(d) p + geom_text(aes(label = ifelse(abs(parameter.median) > 1.6 & # only show for parameter value > 0.5 pi > 0.95, # only show for pi > 0.95 label, # which variable contains the label "" ))) # do not display label if outside of set ranges
This highlights:
Julie de Sousa, Ondřej Vencálek, Karel Hron, Jan Václavík, David Friedecký, Tomáš Adam, Bayesian multiple hypotheses testing in compositional analysis of untargeted metabolomic data, Analytica Chimica Acta, Volume 1097, 2020, Pages 49-61, ISSN 0003-2670, https://doi.org/10.1016/j.aca.2019.11.006
Corresponding GitHub Repository: https://github.com/sousaju/BayesVolcano
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.