knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This article will show you how to plot annotated correlation and microbial composition heatmaps with microViz.
library(dplyr) library(phyloseq) library(microViz)
First we'll get some OTU abundance data from inflammatory bowel disease patients and controls from the corncob package.
data("ibd", package = "microViz") ibd
Remove the mostly unclassified species-level data, drop the rare taxa and fix the taxonomy of the rest. Also drop patients with unclassified IBD.
psq <- ibd %>% tax_mutate(Species = NULL) %>% tax_filter(min_prevalence = 5) %>% tax_fix() %>% ps_filter(DiseaseState != "IBDundef") psq
Visualise the microbial composition of your samples.
The samples and taxa are sorted by similarity. (By default this uses hierarchical clustering with optimal leaf ordering, using euclidean distances on the transformed data).
In this example we use a "compositional" transformation, so the Class abundances are shown as proportions of each sample.
psq %>% tax_transform("compositional", rank = "Class") %>% comp_heatmap()
You can easily swap to a symmetrical colour palette for transformations like "clr" or "standardize". This is the default symmetrical palette but you can pick from many.
psq %>% tax_transform("clr", rank = "Family") %>% comp_heatmap(colors = heat_palette(sym = TRUE), name = "CLR")
psq %>% tax_transform("compositional", rank = "Class") %>% comp_heatmap(tax_anno = taxAnnotation( Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm")) ))
Positioning the heatmap legend at the bottom is possible. You can assign
the heatmap to a name and then call ComplexHeatmap
's draw
function.
heat <- psq %>% tax_transform("compositional", rank = "Class") %>% comp_heatmap( tax_anno = taxAnnotation( Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm")) ), heatmap_legend_param = list( at = 0:5 / 5, direction = "horizontal", title_position = "leftcenter", legend_width = grid::unit(4, "cm"), grid_height = grid::unit(5, "mm") ) ) ComplexHeatmap::draw( object = heat, heatmap_legend_side = "bottom", adjust_annotation_extension = FALSE )
2 different methods for annotating each sample's values of categorical metadata are possible.
anno_sample()
cannot have borders around each cell, but
automatically adds a legend.
anno_sample_cat()
can have cell borders, but requires an extra
step to draw a legend
cols <- distinct_palette(n = 3, add = NA) names(cols) <- unique(samdat_tbl(psq)$DiseaseState) psq %>% tax_transform("compositional", rank = "Class") %>% comp_heatmap( tax_anno = taxAnnotation( Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm")) ), sample_anno = sampleAnnotation( State1 = anno_sample("DiseaseState"), col = list(State1 = cols), border = FALSE, State2 = anno_sample_cat("DiseaseState", col = cols) ) )
Let's try drawing equivalent categorical annotations by two methods.
Both methods can draw annotations with borders and no individual boxes.
This style suits heatmaps with no gridlines (i.e. grid_col = NA
).
In the example below we have suppressed row ordering with
cluster_rows = FALSE
, and added spaces between taxa by splitting at
every row with row_split = 1:11
, which are both
ComplexHeatmap::Heatmap()
arguments.
psqC <- psq %>% tax_transform("compositional", rank = "Class") htmp <- psqC %>% comp_heatmap( grid_col = NA, cluster_rows = FALSE, row_title = NULL, row_split = seq_len(ntaxa(ps_get(psqC))), tax_anno = taxAnnotation( Prev. = anno_tax_prev(bar_width = 0.9, size = grid::unit(1, "cm"), border = F) ), sample_anno = sampleAnnotation( # method one State1 = anno_sample("DiseaseState"), col = list(State1 = cols), border = TRUE, # method two State2 = anno_sample_cat( var = "DiseaseState", col = cols, box_col = NA, border_col = "black", legend_title = "State2" ) ) ) htmp %>% ComplexHeatmap::draw( annotation_legend_list = attr(htmp, "AnnoLegends") )
You can also manually draw a legend with the convenience function
anno_cat_legend()
.
grid::grid.newpage() anno_cat_legend( col = c("a level" = "red", "another level" = "blue", c = "white"), border = "black", gap = grid::unit(2, "cm"), ncol = 3 ) %>% ComplexHeatmap::draw()
Instead of sorting samples by similarity, you can alternatively arrange the samples beforehand with ps_arrange or other methods, and then suppress reordering of the heatmap with sample_seriation = "Identity"
cols <- distinct_palette(n = 3, add = NA) names(cols) <- unique(samdat_tbl(psq)$DiseaseState) psq %>% # sort all samples by similarity ps_seriate(rank = "Class", tax_transform = "compositional", dist = "bray") %>% # arrange the samples into Disease State groups ps_arrange(DiseaseState) %>% tax_transform("compositional", rank = "Class") %>% comp_heatmap( tax_anno = taxAnnotation( Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm")) ), sample_anno = sampleAnnotation( State1 = anno_sample("DiseaseState"), col = list(State1 = cols), border = FALSE, State2 = anno_sample_cat("DiseaseState", col = cols) ), sample_seriation = "Identity" # suppress sample reordering )
If you have fewer samples (and taxa) you might like to label the cells with their values. By default, the raw counts are shown.
psq %>% tax_transform("compositional", rank = "Class") %>% comp_heatmap(samples = 1:15, numbers = heat_numbers())
You can easily change to showing the same values as the colours by
setting numbers_use_counts = FALSE
, and you can/should change the
number of decimals shown too.
psq %>% tax_transform("compositional", rank = "Class") %>% comp_heatmap( samples = 1:15, numbers_use_counts = FALSE, numbers = heat_numbers(decimals = 2) )
The numbers can any transformation of counts, irrespective of what transformations were used for the colours, or seriation.
psq %>% tax_transform("binary", undetected = 0, rank = "Class") %>% comp_heatmap( samples = 1:15, numbers_use_counts = TRUE, numbers_trans = "compositional", numbers = heat_numbers(decimals = 2, col = "white"), show_heatmap_legend = FALSE )
To demonstrate that coloration, numbering and seriation can all use different transformations of the original count data, the example below we specifies seriating the taxa and samples using the same numerical values used for the numbers transformation, not the colours, which are just presence/absence!
psq %>% tax_transform("binary", undetected = 0, rank = "Class") %>% comp_heatmap( samples = 1:15, sample_ser_counts = TRUE, sample_ser_trans = "compositional", tax_ser_counts = TRUE, tax_ser_trans = "compositional", numbers_use_counts = TRUE, numbers_trans = "compositional", numbers = heat_numbers(decimals = 2, col = "white"), show_heatmap_legend = FALSE )
Correlation heatmaps can be a nice way to quickly assess patterns of associations between numerical variables in your dataset, such as microbial abundances and other metadata.
Let's make some fake numeric variables to exemplify this.
set.seed(111) # ensures making same random variables every time! psq <- psq %>% ps_arrange(ibd) %>% ps_mutate( var1 = rnorm(nsamples(psq), mean = 10, sd = 3), var2 = c( rnorm(nsamples(psq) * 0.75, mean = 4, sd = 2), rnorm(1 + nsamples(psq) / 4, mean = 9, sd = 3) ), var3 = runif(nsamples(psq), 2, 10), var4 = rnorm(nsamples(psq), mean = 100 + nsamples(psq):0, sd = 20) / 20, var5 = rnorm(nsamples(psq), mean = 5, sd = 2), var6 = rnbinom(nsamples(psq), size = 1:75 / 10, mu = 5) )
By default, the cor_heatmap
function will correlate all taxa to all
numerical sample data, using pearson correlation method.
psq %>% tax_agg("Family") %>% cor_heatmap(vars = c("var1", "var2", "var3", "var4", "var5", "var6"))
It's easy to change to a different method, i.e. spearman's rank correlation or kendall's tau, which will be reflected in the legend title. We will also specify to use only the 15 most abundant taxa, by maximum count, just to make these tutorial figures a little more compact!
psq %>% tax_agg("Family") %>% cor_heatmap( taxa = tax_top(psq, 15, by = max, rank = "Family"), vars = paste0("var", 1:6), cor = "spearman" )
Older versions of microViz cor_heatmap
had a tax_transform
argument. But for flexibility, you must now transform your taxa
before passing the psExtra object to cor_heatmap
.
Here we have transformed our taxa with the "clr" or centered-log-ratio transformation prior to correlating. Notice that the annotations stay on the same scale, as by default the annotation functions extract the stored counts data from the psExtra input, not the transformed data.
psq %>% tax_agg("Family") %>% tax_transform("clr", zero_replace = "halfmin") %>% cor_heatmap( taxa = tax_top(psq, 15, by = max, rank = "Family"), vars = paste0("var", 1:6) )
Let's transform and scale the taxon abundances before correlating.
psq %>% tax_agg("Family") %>% tax_transform("clr", zero_replace = "halfmin") %>% cor_heatmap( taxa = tax_top(psq, 15, by = max, rank = "Family"), vars = paste0("var", 1:6) )
As seen in the previous plots taxa are annotated by default with prevalence and relative abundance.
You can transform the taxa for the abundance annotation. The trans
and
zero_replace
arguments are sent to tax_transform()
.
psq %>% tax_agg("Family") %>% cor_heatmap( taxa = tax_top(psq, 15, by = max, rank = "Family"), vars = paste0("var", 1:6), tax_anno = taxAnnotation( Prev. = anno_tax_prev(ylim = 0:1), Log10. = anno_tax_box(trans = "log10", zero_replace = "halfmin") ) )
You can do multiple transformations and or scaling by supplying a function, that takes a psExtra or phyloseq object, transforms it, and returns it.
psq %>% tax_agg("Family") %>% cor_heatmap( taxa = tax_top(psq, 15, by = max, rank = "Family"), vars = paste0("var", 1:6), tax_anno = taxAnnotation( Log2 = anno_tax_density( joyplot_scale = 2, gp = grid::gpar(fill = "black", alpha = 0.2), trans = "log2", zero_replace = 1 ), `prop Log2` = anno_tax_density( joyplot_scale = 1.5, gp = grid::gpar(fill = "black", alpha = 0.2), trans = function(ps) { ps %>% tax_transform("compositional", zero_replace = 1) %>% tax_transform("log2", chain = TRUE) } ) ) )
Note that by default the relative abundance is shown only for samples
where the taxon is detected! You can include values for all samples for
all taxa by setting only_detected = FALSE
.
Let's try this with a heatmap-style density plot annotation. We'll replace zeroes with ones for an interpretable minimum value on the plot.
We'll compare it side-by-side with the default setting of showing only distribution of values above the detection threshold.
For zero-inflated microbiome data, showing prevalence and "abundance when detected" often seems like a more informative annotation.
psq %>% tax_agg("Family") %>% cor_heatmap( taxa = tax_top(psq, 15, by = max, rank = "Family"), vars = paste0("var", 1:6), tax_anno = taxAnnotation( Prev. = anno_tax_prev(size = grid::unit(10, "mm"), ylim = 0:1), All = anno_tax_density( size = grid::unit(20, "mm"), trans = "log10", zero_replace = 1, heatmap_colors = viridisLite::turbo(n = 15), type = "heatmap", only_detected = FALSE ), Default = anno_tax_density( size = grid::unit(20, "mm"), trans = "log10", zero_replace = 1, heatmap_colors = viridisLite::turbo(n = 15), type = "heatmap", only_detected = TRUE ) ) )
By default, rows and columns are sorted using hierarchical clustering
with optimal leaf ordering "OLO_ward"
. You can use any valid method
from the seriation
package. You can suppress ordering by using
seriation_method = "Identity"
. By default this also suppresses column
ordering, so you can set seriation_method_col = OLO_ward
to keep
ordering.
psq %>% tax_agg("Family") %>% tax_sort(by = prev, at = "Family") %>% cor_heatmap( seriation_method = "Identity", seriation_method_col = "OLO_ward", taxa = tax_top(psq, 15, by = max, rank = "Family"), vars = paste0("var", 1:6), tax_anno = taxAnnotation( Prev. = anno_tax_prev(ylim = 0:1), CLR = anno_tax_box(trans = "clr", zero_replace = "halfmin") ) )
You can easily put the taxa annotations on another of the heatmap with
e.g. taxa_side = "left"
psq %>% tax_agg("Family") %>% tax_sort(by = prev, at = "Family") %>% cor_heatmap( seriation_method = "Identity", seriation_method_col = "OLO_ward", taxa_side = "left", taxa = tax_top(psq, 15, by = max, rank = "Family"), vars = paste0("var", 1:6), tax_anno = taxAnnotation( Prev. = anno_tax_prev(ylim = 0:1), CLR = anno_tax_box(trans = "clr", zero_replace = "halfmin") ) )
Or on the top or bottom is also possible, this will rotate the heatmap. Remember to swap the seriation method arguments around!
psq %>% tax_agg("Family") %>% tax_sort(by = prev, at = "Family") %>% cor_heatmap( seriation_method_col = "Identity", # swapped! seriation_method = "OLO_ward", # swapped! taxa_side = "top", taxa = tax_top(psq, 15, by = max, rank = "Family"), vars = paste0("var", 1:6), tax_anno = taxAnnotation( Prev. = anno_tax_prev(ylim = 0:1), CLR = anno_tax_box(trans = "clr", zero_replace = "halfmin") ) )
As well as annotating the taxa, you can also annotate the variables.
psq %>% tax_agg("Family") %>% cor_heatmap( taxa = tax_top(psq, 15, by = max, rank = "Family"), vars = paste0("var", 1:6), var_anno = varAnnotation( Value = anno_var_box(), Zscore = anno_var_density(fun = scale, type = "violin") ) )
In response to a number of questions/requests about annotating correlation heatmaps with p-values: cor_heatmap
cannot do this, so here is an alternative way using tax_model
.
# compute correlations, with p values, and store in a dataframe correlations_df <- psq %>% tax_model( trans = "clr", rank = "Family", variables = list("var1", "var2", "var3", "var4", "var5"), type = microViz::cor_test, method = "spearman", return_psx = FALSE, verbose = FALSE ) %>% tax_models2stats(rank = "Family") # get nice looking ordering of correlation estimates using hclust taxa_hclust <- correlations_df %>% dplyr::select(term, taxon, estimate) %>% tidyr::pivot_wider( id_cols = taxon, names_from = term, values_from = estimate ) %>% tibble::column_to_rownames("taxon") %>% as.matrix() %>% stats::dist(method = "euclidean") %>% hclust(method = "ward.D2") taxa_order <- taxa_hclust$labels[taxa_hclust$order]
library(ggplot2) correlations_df %>% mutate(p.FDR = p.adjust(p.value, method = "fdr")) %>% ggplot(aes(x = term, y = taxon)) + geom_raster(aes(fill = estimate)) + geom_point( data = function(x) filter(x, p.value < 0.05), shape = "asterisk" ) + geom_point( data = function(x) filter(x, p.FDR < 0.05), shape = "circle", size = 3 ) + scale_y_discrete(limits = taxa_order) + scale_fill_distiller(palette = "BrBG", limits = c(-0.45, 0.45)) + theme_minimal() + labs( x = NULL, y = NULL, fill = "Spearman's\nRank\nCorrelation", caption = paste( "Asterisk indicates p < 0.05, not FDR adjusted", "Filled circle indicates FDR corrected p < 0.05", sep = "\n" ))
Additionally, with the scale_y_dendro
function from the legendry package you can add a visualisation of the hclust dendrogram to the y axis. See: https://teunbrand.github.io/legendry/reference/scale_y_dendro.html
Complicated stuff demonstrated down here, not necessarily useful.
Two approaches to custom colour scale breaks. The first way is better, because the colour scale is interpolated through the default 11 colours, instead of only 5.
Transform data and customise only labels.
psq %>% tax_transform("compositional", rank = "Class") %>% tax_transform("log10", zero_replace = "halfmin", chain = TRUE) %>% comp_heatmap( tax_anno = taxAnnotation( Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm")) ), heatmap_legend_param = list( labels = rev(c("100%", " 10%", " 1%", " 0.1%", "0.01%")) ) )
This alternative way might be helpful in some cases, maybe...
It demonstrates that custom breaks can be set in heat_palette()
.
# seriation transform serTrans <- function(x) { tax_transform(x, trans = "log10", zero_replace = "halfmin", chain = TRUE) } psq %>% tax_transform("compositional", rank = "Class") %>% comp_heatmap( sample_ser_trans = serTrans, tax_ser_trans = serTrans, colors = heat_palette(breaks = c(0.0001, 0.001, 0.01, 0.1, 1), rev = T), tax_anno = taxAnnotation( Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm")) ), heatmap_legend_param = list(at = c(0.0001, 0.001, 0.01, 0.1, 1), break_dist = 1) )
devtools::session_info()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.