In jbisanz/MicrobeR: Handy functions for microbiome analysis in R

knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE, tidy=FALSE, cache=FALSE)

Philosophy

MicrobeR is intended to supplement other packages such as phyloseq and vegan by providing wrapped functions for common analysis. As such, it calls upon these packages frequently behind the scenes. MicrobeR is primarily intended for data visualization and exploration of count-based microbiome data such as 16S rRNA gene sequencing. These functions are intended for exploration and as such can be wrapped using plotly's ggplotly to create interactive figures. Additionally, all plotting is carried out by ggplot2 so they can be manipulated directly through the addition of standard ggplot2 functions.

Expected Input Data Format

MicrobeR relies on 4 main types of data:

Feature Table: This could be OTUs/SVs/ISUs/KOcounts or any other type of compositional count. MicrobeR ALWAYS expects that sample names are colnames, and feature names are rownames. This is expected via the FEATURES argument for most functions.
Metadata Table: This is a table of sample metadata. MicrobeR ALWAYS expects that sample names are rownames, and categories names are columns. All functions refer to this as METADATA. *Note: All samples present in the Feature table MUST have metadata and an error will be returned if this is not the case.
Taxonomy Table: This is a table that contains the feature identifier (ex. OTU# or SV sequence) as rownames with the assigned Kingdom, Phylum, Class, Order, Family, Genus, and Species as columns. Additional columns are tolerated in this table as long as the previously mentioned columns are present.
Phylogenetic Tree: This is a tree of the phylo-class used primarily for calculated UniFrac distances. This could for example be the gg_13_8_otus/97_otus.tree if QIIME closed reference picking was applied. Or could be the result of MicrobeR's Make.Tree.R/Make.Tree.QIIME. function. For reading in the GG tree, phyloseqs import function is reccomended due to parsing issues.

Changes

In version 0.3, the OTUTABLE argument has been changed to FEATURES. Other changes have been made under the hood and at some point in the future the dependency of sample/feature IDs in row names will be removed.

MicrobeR Functions

Visualization

Microbiome.Barplot: Creates a barplot of %-normalized abundances.
Microbiome.Heatmap: Creates a heatmap of feature abundances.
PCoA: Calculate a distance/dissimilarity metric, carryout PCoA and plot a 2D plot with desired metadata included.
PCoA3D: Similar to above but creating a 3D interactive version. This may not work on some windows computers.

Normalization

Make.CLR.R: Carry out a centered-log2-ratio transformation using either a prior or the count zero multiplicative (CZM) method.
Make.Percent: Convert table of counts, to a percentage for plotting purposes.
Read.Filter: Removes samples that fell below a certain threshold of read depth. This should be used to identify poorly sequenced samples and remove controls from datasets.
Subsample.Table: Subsample feature table for metrics such as unweighted UniFrac using a defined randomization seed for reproducibility.
Summarize.Taxa: Analogous to QIIME's summarize_taxa.py. Creates a list of taxonomically summarized versions of the feature table. Useful for plotting and some statistical treatments.
Confidence.Filter: Removes features which are present in less than X samples with a total of less than Y reads. Useful for removing noisy sparse features from datasets before visualization or statistical analysis.
Filter.Fraction: Removes features which make up less than X% of dataset for diversity metrics as recommended by Bokulich et al. doi:10.1038/nmeth.2276.

Other

Make.Tree.QIIME: Make a phylogenetic tree. This function relies on an install of QIIME with muscle. See documentation for this function for more information. Make.Tree.R: Make a phylogenetic tree via an R based approach.
Merge.Replicates: Merges replicate samples (for example sequencing replicates) by summing reads together.
Nice.Table: A wrapper to create an interactive table for data exploration or embedding into markdown document.

Embedded Data

data("MicrobeRTestData"): Creates the following objects based on Bisanz et al. doi: 10.1128/AEM.00780-15. The data was created during a tutorial on microbiome analysis here.
- MicrobeR.Demo.Metadata: Example list of metadata for 20 individuals sampled 2x.
- MicrobeR.Demo.SVtable: Example SV table of count data.
- MicrobeR.Demo.Taxonomy: Table of taxonomic assignments.
- MicrobeR.Demo.Tree: Phylogenetic tree of SVs.

Example Usage

In these examples, we will be renderering interactive versions of the figures by wrapping all functions in ggplotly(). This is optional but is helpful for data exploration.
We can start by installing the package if you have not already. If installation fails, check which dependency was missing and install manually using install.packages() or bioconductor's biocLite().

library(devtools)
install_github("jbisanz/MicrobeR")

Next we can load it and check the version. We will also load plotly which allows for interactive visualizations. For publication figures I would avoid this and save as PDFs.

library(MicrobeR)
library(plotly)

We can start by loading the included vaginal microbiome dataset from Bisanz et al. doi: 10.1128/AEM.00780-15.

data("MicrobeRTestData")

Next we can inspect our metadata using the Nice.Table() command. Note that the data can be searched, filtered, sorted, and exported to common file formats. This is useful if using R in console or for markdown documents.

Nice.Table(MicrobeR.Demo.Metadata)

Lets explore global trends in our data with a PCoA using Bray Curtis. + ggtitle() has been added to manually change the title. Notice how an ADONIS test is automatically applied. This can be disabled with ADONIS=FALSE.

PCoA(METRIC="braycurtis", FEATURES = MicrobeR.Demo.SVtable, METADATA = MicrobeR.Demo.Metadata, COLOR = "Timepoint") + ggtitle("Exploratory PCoA of Bray Curtis Dissimilarities")

We can also make a 3D version of this. If this, or the figure above do not show, enable webgl in your browser.

PCoA3D(METRIC="braycurtis", FEATURES = MicrobeR.Demo.SVtable, METADATA = MicrobeR.Demo.Metadata, COLOR="Timepoint")

Before going further with visualization. Lets remove some of the noisy features from our dataset. In this case it will need to be in 2 samples with at least 100 reads across all samples.

conf.table<-Confidence.Filter(MicrobeR.Demo.SVtable, 2, 100)

Now lets create a taxa summarized version of our data for plotting purposes.

summarized.taxa<-Summarize.Taxa(conf.table, MicrobeR.Demo.Taxonomy)
print(paste0("We have ", nrow(summarized.taxa$Genus), " genera in dataset."))

Now that we have summarized taxa, we can plot these in a barplot. For barplots, >10 features generally creates extremely difficult to interpret figures, as such >10 is automatically added to a remainder, but this number can be manually altered. See instructions for more information. ggplotly has been used as a wrapper to aid in data exploration.

ggplotly(Microbiome.Barplot(FEATURES = summarized.taxa$Genus, METADATA = MicrobeR.Demo.Metadata, CATEGORY = "Timepoint"))

We can also look at genera for this analysis using a heat map plotting the 30 most abundant genera.

Microbiome.Heatmap(FEATURES=summarized.taxa$Genus, METADATA=MicrobeR.Demo.Metadata, NTOPFEATURES = 30, ROWCLUSTER = "abundance", CATEGORY="Timepoint")