In david-barnett/microViz: Microbiome Data Analysis and Visualization

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This article will show you how to plot annotated correlation and microbial composition heatmaps with microViz.

Setup

library(dplyr)
library(phyloseq)
library(microViz)

First we'll get some OTU abundance data from inflammatory bowel disease patients and controls from the corncob package.

data("ibd", package = "microViz")
ibd

Remove the mostly unclassified species-level data, drop the rare taxa and fix the taxonomy of the rest. Also drop patients with unclassified IBD.

psq <- ibd %>%
  tax_mutate(Species = NULL) %>%
  tax_filter(min_prevalence = 5) %>%
  tax_fix() %>%
  ps_filter(DiseaseState != "IBDundef")
psq

Microbiome heatmaps

Visualise the microbial composition of your samples.

The samples and taxa are sorted by similarity. (By default this uses hierarchical clustering with optimal leaf ordering, using euclidean distances on the transformed data).

In this example we use a "compositional" transformation, so the Class abundances are shown as proportions of each sample.

psq %>%
  tax_transform("compositional", rank = "Class") %>%
  comp_heatmap()

You can easily swap to a symmetrical colour palette for transformations like "clr" or "standardize". This is the default symmetrical palette but you can pick from many.

psq %>%
  tax_transform("clr", rank = "Family") %>%
  comp_heatmap(colors = heat_palette(sym = TRUE), name = "CLR")

Annotating taxa

psq %>%
  tax_transform("compositional", rank = "Class") %>%
  comp_heatmap(tax_anno = taxAnnotation(
    Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm"))
  ))

Legend positioning

Positioning the heatmap legend at the bottom is possible. You can assign the heatmap to a name and then call ComplexHeatmap's draw function.

heat <- psq %>%
  tax_transform("compositional", rank = "Class") %>%
  comp_heatmap(
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm"))
    ),
    heatmap_legend_param = list(
      at = 0:5 / 5,
      direction = "horizontal", title_position = "leftcenter",
      legend_width = grid::unit(4, "cm"), grid_height = grid::unit(5, "mm")
    )
  )

ComplexHeatmap::draw(
  object = heat, heatmap_legend_side = "bottom",
  adjust_annotation_extension = FALSE
)

Annotating samples

Group membership

2 different methods for annotating each sample's values of categorical metadata are possible.

anno_sample() cannot have borders around each cell, but automatically adds a legend.
anno_sample_cat() can have cell borders, but requires an extra step to draw a legend

cols <- distinct_palette(n = 3, add = NA)
names(cols) <- unique(samdat_tbl(psq)$DiseaseState)

psq %>%
  tax_transform("compositional", rank = "Class") %>%
  comp_heatmap(
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm"))
    ),
    sample_anno = sampleAnnotation(
      State1 = anno_sample("DiseaseState"),
      col = list(State1 = cols), border = FALSE,
      State2 = anno_sample_cat("DiseaseState", col = cols)
    )
  )

Let's try drawing equivalent categorical annotations by two methods. Both methods can draw annotations with borders and no individual boxes. This style suits heatmaps with no gridlines (i.e. grid_col = NA).

In the example below we have suppressed row ordering with cluster_rows = FALSE, and added spaces between taxa by splitting at every row with row_split = 1:11, which are both ComplexHeatmap::Heatmap() arguments.

psqC <- psq %>% tax_transform("compositional", rank = "Class")

htmp <- psqC %>%
  comp_heatmap(
    grid_col = NA,
    cluster_rows = FALSE, row_title = NULL,
    row_split = seq_len(ntaxa(ps_get(psqC))),
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(bar_width = 0.9, size = grid::unit(1, "cm"), border = F)
    ),
    sample_anno = sampleAnnotation(
      # method one
      State1 = anno_sample("DiseaseState"),
      col = list(State1 = cols), border = TRUE,
      # method two
      State2 = anno_sample_cat(
        var = "DiseaseState", col = cols, box_col = NA, border_col = "black",
        legend_title = "State2"
      )
    )
  )
htmp %>% ComplexHeatmap::draw(
  annotation_legend_list = attr(htmp, "AnnoLegends")
)

You can also manually draw a legend with the convenience function anno_cat_legend().

grid::grid.newpage()
anno_cat_legend(
  col = c("a level" = "red", "another level" = "blue", c = "white"),
  border = "black", gap = grid::unit(2, "cm"), ncol = 3
) %>%
  ComplexHeatmap::draw()

Arranging samples

Instead of sorting samples by similarity, you can alternatively arrange the samples beforehand with ps_arrange or other methods, and then suppress reordering of the heatmap with sample_seriation = "Identity"

cols <- distinct_palette(n = 3, add = NA)
names(cols) <- unique(samdat_tbl(psq)$DiseaseState)

psq %>%
  # sort all samples by similarity
  ps_seriate(rank = "Class", tax_transform = "compositional", dist = "bray") %>% 
  # arrange the samples into Disease State groups
  ps_arrange(DiseaseState) %>% 
  tax_transform("compositional", rank = "Class") %>%
  comp_heatmap(
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm"))
    ),
    sample_anno = sampleAnnotation(
      State1 = anno_sample("DiseaseState"),
      col = list(State1 = cols), border = FALSE,
      State2 = anno_sample_cat("DiseaseState", col = cols)
    ),
    sample_seriation = "Identity" # suppress sample reordering
  )

Numbering cells

If you have fewer samples (and taxa) you might like to label the cells with their values. By default, the raw counts are shown.

psq %>%
  tax_transform("compositional", rank = "Class") %>%
  comp_heatmap(samples = 1:15, numbers = heat_numbers())

You can easily change to showing the same values as the colours by setting numbers_use_counts = FALSE, and you can/should change the number of decimals shown too.

psq %>%
  tax_transform("compositional", rank = "Class") %>%
  comp_heatmap(
    samples = 1:15, numbers_use_counts = FALSE,
    numbers = heat_numbers(decimals = 2)
  )

The numbers can any transformation of counts, irrespective of what transformations were used for the colours, or seriation.

psq %>%
  tax_transform("binary", undetected = 0, rank = "Class") %>%
  comp_heatmap(
    samples = 1:15, numbers_use_counts = TRUE, numbers_trans = "compositional",
    numbers = heat_numbers(decimals = 2, col = "white"),
    show_heatmap_legend = FALSE
  )

To demonstrate that coloration, numbering and seriation can all use different transformations of the original count data, the example below we specifies seriating the taxa and samples using the same numerical values used for the numbers transformation, not the colours, which are just presence/absence!

psq %>%
  tax_transform("binary", undetected = 0, rank = "Class") %>%
  comp_heatmap(
    samples = 1:15,
    sample_ser_counts = TRUE, sample_ser_trans = "compositional",
    tax_ser_counts = TRUE, tax_ser_trans = "compositional",
    numbers_use_counts = TRUE, numbers_trans = "compositional",
    numbers = heat_numbers(decimals = 2, col = "white"),
    show_heatmap_legend = FALSE
  )

Correlation heatmaps

Correlation heatmaps can be a nice way to quickly assess patterns of associations between numerical variables in your dataset, such as microbial abundances and other metadata.

Let's make some fake numeric variables to exemplify this.

set.seed(111) # ensures making same random variables every time!
psq <- psq %>%
  ps_arrange(ibd) %>%
  ps_mutate(
    var1 = rnorm(nsamples(psq), mean = 10, sd = 3),
    var2 = c(
      rnorm(nsamples(psq) * 0.75, mean = 4, sd = 2),
      rnorm(1 + nsamples(psq) / 4, mean = 9, sd = 3)
    ),
    var3 = runif(nsamples(psq), 2, 10),
    var4 = rnorm(nsamples(psq), mean = 100 + nsamples(psq):0, sd = 20) / 20,
    var5 = rnorm(nsamples(psq), mean = 5, sd = 2),
    var6 = rnbinom(nsamples(psq), size = 1:75 / 10, mu = 5)
  )

Calculating correlations

By default, the cor_heatmap function will correlate all taxa to all numerical sample data, using pearson correlation method.

psq %>%
  tax_agg("Family") %>%
  cor_heatmap(vars = c("var1", "var2", "var3", "var4", "var5", "var6"))

It's easy to change to a different method, i.e. spearman's rank correlation or kendall's tau, which will be reflected in the legend title. We will also specify to use only the 15 most abundant taxa, by maximum count, just to make these tutorial figures a little more compact!

psq %>%
  tax_agg("Family") %>%
  cor_heatmap(
    taxa = tax_top(psq, 15, by = max, rank = "Family"),
    vars = paste0("var", 1:6), cor = "spearman"
  )

Older versions of microViz cor_heatmap had a tax_transform argument. But for flexibility, you must now transform your taxa before passing the psExtra object to cor_heatmap.

Here we have transformed our taxa with the "clr" or centered-log-ratio transformation prior to correlating. Notice that the annotations stay on the same scale, as by default the annotation functions extract the stored counts data from the psExtra input, not the transformed data.

psq %>%
  tax_agg("Family") %>%
  tax_transform("clr", zero_replace = "halfmin") %>%
  cor_heatmap(
    taxa = tax_top(psq, 15, by = max, rank = "Family"),
    vars = paste0("var", 1:6)
  )

Let's transform and scale the taxon abundances before correlating.

psq %>%
  tax_agg("Family") %>%
  tax_transform("clr", zero_replace = "halfmin") %>%
  cor_heatmap(
    taxa = tax_top(psq, 15, by = max, rank = "Family"),
    vars = paste0("var", 1:6)
  )

Taxon annotations

As seen in the previous plots taxa are annotated by default with prevalence and relative abundance.

You can transform the taxa for the abundance annotation. The trans and zero_replace arguments are sent to tax_transform().

psq %>%
  tax_agg("Family") %>%
  cor_heatmap(
    taxa = tax_top(psq, 15, by = max, rank = "Family"),
    vars = paste0("var", 1:6),
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(ylim = 0:1),
      Log10. = anno_tax_box(trans = "log10", zero_replace = "halfmin")
    )
  )

You can do multiple transformations and or scaling by supplying a function, that takes a psExtra or phyloseq object, transforms it, and returns it.

psq %>%
  tax_agg("Family") %>%
  cor_heatmap(
    taxa = tax_top(psq, 15, by = max, rank = "Family"),
    vars = paste0("var", 1:6),
    tax_anno = taxAnnotation(
      Log2 = anno_tax_density(
        joyplot_scale = 2, gp = grid::gpar(fill = "black", alpha = 0.2),
        trans = "log2", zero_replace = 1
      ),
      `prop Log2` = anno_tax_density(
        joyplot_scale = 1.5, gp = grid::gpar(fill = "black", alpha = 0.2),
        trans = function(ps) {
          ps %>%
            tax_transform("compositional", zero_replace = 1) %>%
            tax_transform("log2", chain = TRUE)
        }
      )
    )
  )

Note that by default the relative abundance is shown only for samples where the taxon is detected! You can include values for all samples for all taxa by setting only_detected = FALSE.

Let's try this with a heatmap-style density plot annotation. We'll replace zeroes with ones for an interpretable minimum value on the plot.

We'll compare it side-by-side with the default setting of showing only distribution of values above the detection threshold.

For zero-inflated microbiome data, showing prevalence and "abundance when detected" often seems like a more informative annotation.

psq %>%
  tax_agg("Family") %>%
  cor_heatmap(
    taxa = tax_top(psq, 15, by = max, rank = "Family"),
    vars = paste0("var", 1:6),
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(size = grid::unit(10, "mm"), ylim = 0:1),
      All = anno_tax_density(
        size = grid::unit(20, "mm"),
        trans = "log10", zero_replace = 1,
        heatmap_colors = viridisLite::turbo(n = 15),
        type = "heatmap", only_detected = FALSE
      ),
      Default = anno_tax_density(
        size = grid::unit(20, "mm"),
        trans = "log10", zero_replace = 1,
        heatmap_colors = viridisLite::turbo(n = 15),
        type = "heatmap", only_detected = TRUE
      )
    )
  )

Sorting

By default, rows and columns are sorted using hierarchical clustering with optimal leaf ordering "OLO_ward". You can use any valid method from the seriation package. You can suppress ordering by using seriation_method = "Identity". By default this also suppresses column ordering, so you can set seriation_method_col = OLO_ward to keep ordering.

psq %>%
  tax_agg("Family") %>%
  tax_sort(by = prev, at = "Family") %>%
  cor_heatmap(
    seriation_method = "Identity",
    seriation_method_col = "OLO_ward",
    taxa = tax_top(psq, 15, by = max, rank = "Family"),
    vars = paste0("var", 1:6),
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(ylim = 0:1),
      CLR = anno_tax_box(trans = "clr", zero_replace = "halfmin")
    )
  )

Taxa annotation side

You can easily put the taxa annotations on another of the heatmap with e.g. taxa_side = "left"

psq %>%
  tax_agg("Family") %>%
  tax_sort(by = prev, at = "Family") %>%
  cor_heatmap(
    seriation_method = "Identity",
    seriation_method_col = "OLO_ward",
    taxa_side = "left",
    taxa = tax_top(psq, 15, by = max, rank = "Family"),
    vars = paste0("var", 1:6),
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(ylim = 0:1),
      CLR = anno_tax_box(trans = "clr", zero_replace = "halfmin")
    )
  )

Or on the top or bottom is also possible, this will rotate the heatmap. Remember to swap the seriation method arguments around!

psq %>%
  tax_agg("Family") %>%
  tax_sort(by = prev, at = "Family") %>%
  cor_heatmap(
    seriation_method_col = "Identity", # swapped!
    seriation_method = "OLO_ward", # swapped!
    taxa_side = "top",
    taxa = tax_top(psq, 15, by = max, rank = "Family"),
    vars = paste0("var", 1:6),
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(ylim = 0:1),
      CLR = anno_tax_box(trans = "clr", zero_replace = "halfmin")
    )
  )

Variable annotation

As well as annotating the taxa, you can also annotate the variables.

psq %>%
  tax_agg("Family") %>%
  cor_heatmap(
    taxa = tax_top(psq, 15, by = max, rank = "Family"),
    vars = paste0("var", 1:6),
    var_anno = varAnnotation(
      Value = anno_var_box(),
      Zscore = anno_var_density(fun = scale, type = "violin")
    )
  )

Alternative: ggplot correlation heatmaps with p-values

In response to a number of questions/requests about annotating correlation heatmaps with p-values: cor_heatmap cannot do this, so here is an alternative way using tax_model.

# compute correlations, with p values, and store in a dataframe
correlations_df <- psq %>% 
  tax_model(
    trans = "clr",
    rank = "Family", variables = list("var1", "var2", "var3", "var4", "var5"), 
    type = microViz::cor_test, method = "spearman", 
    return_psx = FALSE, verbose = FALSE
  ) %>% 
  tax_models2stats(rank = "Family")

# get nice looking ordering of correlation estimates using hclust
taxa_hclust <- correlations_df %>% 
  dplyr::select(term, taxon, estimate) %>% 
  tidyr::pivot_wider(
    id_cols = taxon, names_from = term, values_from = estimate
  ) %>% 
  tibble::column_to_rownames("taxon") %>% 
  as.matrix() %>% 
  stats::dist(method = "euclidean") %>% 
  hclust(method = "ward.D2") 

taxa_order <- taxa_hclust$labels[taxa_hclust$order]

library(ggplot2)

correlations_df %>% 
  mutate(p.FDR = p.adjust(p.value, method = "fdr")) %>% 
  ggplot(aes(x = term, y = taxon)) +
  geom_raster(aes(fill = estimate)) +
  geom_point(
    data = function(x) filter(x, p.value < 0.05),
    shape = "asterisk"
  ) +
  geom_point(
    data = function(x) filter(x, p.FDR < 0.05),
    shape = "circle", size = 3
  ) +
  scale_y_discrete(limits = taxa_order) +
  scale_fill_distiller(palette = "BrBG", limits = c(-0.45, 0.45)) + 
  theme_minimal() +
  labs(
    x = NULL, y = NULL, fill = "Spearman's\nRank\nCorrelation",
    caption = paste(
      "Asterisk indicates p < 0.05, not FDR adjusted",
      "Filled circle indicates FDR corrected p < 0.05", sep = "\n"
  ))

Additionally, with the scale_y_dendro function from the legendry package you can add a visualisation of the hclust dendrogram to the y axis. See: https://teunbrand.github.io/legendry/reference/scale_y_dendro.html

Other stuff

Complicated stuff demonstrated down here, not necessarily useful.

Custom breaks and seriation

Two approaches to custom colour scale breaks. The first way is better, because the colour scale is interpolated through the default 11 colours, instead of only 5.

Transform data and customise only labels.

psq %>%
  tax_transform("compositional", rank = "Class") %>%
  tax_transform("log10", zero_replace = "halfmin", chain = TRUE) %>%
  comp_heatmap(
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm"))
    ),
    heatmap_legend_param = list(
      labels = rev(c("100%", " 10%", "  1%", " 0.1%", "0.01%"))
    )
  )

This alternative way might be helpful in some cases, maybe... It demonstrates that custom breaks can be set in heat_palette().

# seriation transform
serTrans <- function(x) {
  tax_transform(x, trans = "log10", zero_replace = "halfmin", chain = TRUE)
}

psq %>%
  tax_transform("compositional", rank = "Class") %>%
  comp_heatmap(
    sample_ser_trans = serTrans, tax_ser_trans = serTrans,
    colors = heat_palette(breaks = c(0.0001, 0.001, 0.01, 0.1, 1), rev = T),
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm"))
    ),
    heatmap_legend_param = list(at = c(0.0001, 0.001, 0.01, 0.1, 1), break_dist = 1)
  )

Session info

devtools::session_info()

david-barnett/microViz documentation built on April 17, 2025, 4:25 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

david-barnett/microViz
Microbiome Data Analysis and Visualization

In david-barnett/microViz: Microbiome Data Analysis and Visualization

Setup

Microbiome heatmaps

Annotating taxa

Legend positioning

Annotating samples

Group membership

Arranging samples

Numbering cells

Correlation heatmaps

Calculating correlations

Taxon annotations

Sorting

Taxa annotation side

Variable annotation

Alternative: ggplot correlation heatmaps with p-values

Other stuff

Custom breaks and seriation

Session info

R Package Documentation

Browse R Packages

We want your feedback!

david-barnett/microViz Microbiome Data Analysis and Visualization

In david-barnett/microViz: Microbiome Data Analysis and Visualization

Setup

Microbiome heatmaps

Annotating taxa

Legend positioning

Annotating samples

Group membership

Arranging samples

Numbering cells

Correlation heatmaps

Calculating correlations

Taxon annotations

Sorting

Taxa annotation side

Variable annotation

Alternative: ggplot correlation heatmaps with p-values

Other stuff

Custom breaks and seriation

Session info

R Package Documentation

Browse R Packages

We want your feedback!

david-barnett/microViz
Microbiome Data Analysis and Visualization