plot_heatmap: Create an ecologically-organized heatmap using ggplot2...
In mikemc/speedyseq: Faster implementations of phyloseq functions

plot_heatmap

R Documentation

Create an ecologically-organized heatmap using ggplot2 graphics

Description

There are many useful examples of phyloseq heatmap graphics in the phyloseq online tutorials. In a 2010 article in BMC Genomics, Rajaram and Oono show describe an approach to creating a heatmap using ordination methods to organize the rows and columns instead of (hierarchical) cluster analysis. In many cases the ordination-based ordering does a much better job than h-clustering. An immediately useful example of their approach is provided in the NeatMap package for R. The NeatMap package can be used directly on the abundance table (otu_table-class) of phylogenetic-sequencing data, but the NMDS or PCA ordination options that it supports are not based on ecological distances. To fill this void, phyloseq provides the plot_heatmap() function as an ecology-oriented variant of the NeatMap approach to organizing a heatmap and build it using ggplot2 graphics tools. The distance and method arguments are the same as for the plot_ordination function, and support large number of distances and ordination methods, respectively, with a strong leaning toward ecology. This function also provides the options to re-label the OTU and sample axis-ticks with a taxonomic name and/or sample variable, respectively, in the hope that this might hasten your interpretation of the patterns (See the sample.label and taxa.label documentation, below). Note that this function makes no attempt to overlay hierarchical clustering trees on the axes, as hierarchical clustering is not used to organize the plot. Also note that each re-ordered axis repeats at the edge, and so apparent clusters at the far right/left or top/bottom of the heat-map may actually be the same. For now, the placement of this edge can be considered arbitrary, so beware of this artifact of this graphical representation. If you benefit from this phyloseq-specific implementation of the NeatMap approach, please cite both our packages/articles.

Usage

plot_heatmap(
  physeq,
  method = "NMDS",
  distance = "bray",
  sample.label = NULL,
  taxa.label = NULL,
  low = "#000033",
  high = "#66CCFF",
  na.value = "black",
  trans = log_trans(4),
  max.label = 250,
  title = NULL,
  sample.order = NULL,
  taxa.order = NULL,
  first.sample = NULL,
  first.taxa = NULL,
  ...
)

Arguments

`physeq`	(Required). The data, in the form of an instance of the `phyloseq-class`. This should be what you get as a result from one of the `import` functions, or any of the processing downstream. No data components beyond the `otu_table` are strictly necessary, though they may be useful if you want to re-label the axis ticks according to some observable or taxonomic rank, for instance, or if you want to use a `UniFrac`-based distance (in which case your `physeq` data would need to have a tree included).
`method`	(Optional). The ordination method to use for organizing the heatmap. A great deal of the usefulness of a heatmap graphic depends upon the way in which the rows and columns are ordered.
`distance`	(Optional). A character string. The ecological distance method to use in the ordination. See `distance`.
`sample.label`	(Optional). A character string. The sample variable by which you want to re-label the sample (horizontal) axis.
`taxa.label`	(Optional). A character string. The name of the taxonomic rank by which you want to re-label the taxa/species/OTU (vertical) axis. You can see available options in your data using `rank_names(physeq)`.
`low`	(Optional). A character string. An R color. See `?colors` for options support in R (there are lots). The color that represents the lowest non-zero value in the heatmap. Default is a dark blue color, `"#000033"`.
`high`	(Optional). A character string. An R color. See `colors` for options support in R (there are lots). The color that will represent the highest value in the heatmap. The default is `"#66CCFF"`. Zero-values are treated as `NA`, and set to `"black"`, to represent a background color.
`na.value`	(Optional). A character string. An R color. See `colors` for options support in R (there are lots). The color to represent what is essentially the background of the plot, the non-observations that occur as `NA` or `0` values in the abundance table. The default is `"black"`, which works well on computer-screen graphics devices, but may be a poor choice for printers, in which case you might want this value to be `"white"`, and reverse the values of `high` and `low`, above.
`trans`	(Optional). `"trans"`-class transformer-definition object. A numerical transformer to use in the continuous color scale. See `trans_new` for details. The default is `log_trans(4)`.
`max.label`	(Optional). Integer. Default is 250. The maximum number of labeles to fit on a given axis (either x or y). If number of taxa or samples exceeds this value, the corresponding axis will be stripped of any labels. This supercedes any arguments provided to `sample.label` or `taxa.label`. Make sure to increase this value if, for example, you want a special label for an axis that has 300 indices.
`title`	(Optional). Default `NULL`. Character string. The main title for the graphic.
`sample.order`	(Optional). Default `NULL`. Either a single character string matching one of the `sample_variables` in your data, or a character vector of `sample_names` in the precise order that you want them displayed in the heatmap. This overrides any ordination ordering that might be done with the `method`/`distance` arguments.
`taxa.order`	(Optional). Default `NULL`. Either a single character string matching one of the `rank_names` in your data, or a character vector of `taxa_names` in the precise order that you want them displayed in the heatmap. This overrides any ordination ordering that might be done with the `method`/`distance` arguments.
`first.sample`	(Optional). Default `NULL`. A character string matching one of the `sample_names` of your input data (`physeq`). It will become the left-most sample in the plot. For the ordination-based ordering (recommended), the left and right edges of the axes are adjaacent in a continuous ordering. Therefore, the choice of starting sample is meaningless and arbitrary, but it is aesthetically poor to have the left and right edge split a natural cluster in the data. This argument allows you to specify the left edge and thereby avoid cluster-splitting, emphasize a gradient, etc.
`first.taxa`	(Optional). Default `NULL`. A character string matching one of the `taxa_names` of your input data (`physeq`). This is equivalent to `first.sample` (above), but for the taxa/OTU indices, usually the vertical axis.
`...`	(Optional). Additional parameters passed to `ordinate`.

Details

This approach borrows heavily from the heatmap1 function in the NeatMap package. Highly recommended, and we are grateful for their package and ideas, which we have adapted for our specific purposes here, but did not use an explicit dependency. At the time of the first version of this implementation, the NeatMap package depends on the rgl-package, which is not needed in phyloseq, at present. Although likely a transient issue, the rgl-package has some known installation issues that have further influenced to avoid making NeatMap a formal dependency (Although we love both NeatMap and rgl!).

Value

A heatmap plot, in the form of a ggplot2 plot object, which can be further saved and modified.

References

Because this function relies so heavily in principle, and in code, on some of the functionality in NeatMap, please site their article if you use this function in your work.

Rajaram, S., & Oono, Y. (2010). NeatMap–non-clustering heat map alternatives in R. BMC Bioinformatics, 11, 45.

Please see further examples in the phyloseq online tutorials.

Examples

data("GlobalPatterns")
gpac <- subset_taxa(GlobalPatterns, Phylum=="Crenarchaeota")
# FYI, the base-R function uses a non-ecological ordering scheme,
# but does add potentially useful hclust dendrogram to the sides...
gpac <- subset_taxa(GlobalPatterns, Phylum=="Crenarchaeota")
# Remove the nearly-empty samples (e.g. 10 reads or less)
gpac = prune_samples(sample_sums(gpac) > 50, gpac)
# Arbitrary order if method set to NULL
plot_heatmap(gpac, method=NULL, sample.label="SampleType",
  taxa.label="Family")
# Use ordination
plot_heatmap(gpac, sample.label="SampleType", taxa.label="Family")
# Use ordination for OTUs, but not sample-order
plot_heatmap(gpac, sample.label="SampleType", taxa.label="Family",
  sample.order="SampleType")
# Specifying both orders omits any attempt to use ordination. The following
# should be the same.
p0 = plot_heatmap(gpac, sample.label="SampleType", taxa.label="Family",
  taxa.order="Phylum", sample.order="SampleType")
p1 = plot_heatmap(gpac, method=NULL, sample.label="SampleType",
  taxa.label="Family", taxa.order="Phylum", sample.order="SampleType")
#expect_equivalent(p0, p1)
# Example: Order matters. Random ordering of OTU indices is difficult to
# interpret, even with structured sample order
rando = sample(taxa_names(gpac), size=ntaxa(gpac), replace=FALSE)
plot_heatmap(gpac, method=NULL, sample.label="SampleType",
  taxa.label="Family", taxa.order=rando, sample.order="SampleType")
# # Select the edges of each axis. 
# First, arbitrary edge, ordering
plot_heatmap(gpac, method=NULL)
# Second, biological-ordering (instead of default ordination-ordering), but
# arbitrary edge
plot_heatmap(gpac, taxa.order="Family", sample.order="SampleType")
# Third, biological ordering, selected edges
plot_heatmap(gpac, taxa.order="Family", sample.order="SampleType",
  first.taxa="546313", first.sample="NP2")
# Fourth, add meaningful labels
plot_heatmap(gpac, sample.label="SampleType", taxa.label="Family",
  taxa.order="Family", sample.order="SampleType", first.taxa="546313",
  first.sample="NP2")

mikemc/speedyseq documentation built on April 22, 2024, 6:40 p.m.