In mikemc/metacal: Bias estimation and calibration for metagenomics experiments

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This tutorial illustrates the process of estimating bias from control samples using this estimate to calibrate the relative abundances of the control taxa in samples whose true composition is unknown. The data we use is from the synthetic-community colonization experiment of Leopold and Busby (2020).

Leopold DR, Busby PE. 2020. Host Genotype and Colonist Arrival Order Jointly Govern Plant Microbiome Composition and Function. Curr Biol 30:3260-3266.e5. doi:10.1016/j.cub.2020.06.011

These authors performed a colonization experiment in which black cottonwood (Populus trichocarpa) was inoculated with the fungal rust pathogen Melampsora × columbiana and 8 species of foliar fungi. Fungal relative abundances were measured using ITS amplicon sequencing. To enable quantification of bias due to PCR bias and variation in ITS copy number, they also created a set of DNA mocks of these fungi, which they measured along with the experimental samples. We will use the mock samples to estimate the bias of the 9 focal species, and then calibrate these species' abundances in the full set of samples.

Setup

First, let's set up our R environment. This tutorial uses phyloseq objects and functions to store and manipulate the microbiome data, tidyverse packages for data manipulation and plotting, and some add-ons to ggplot2 from the ggbeeswarm and cowplot packages.

# Tools for microbiome data
library(phyloseq)
# Tools for general purpose data manipulation and plotting
library(tidyverse)
# ggplot helpers
library(ggbeeswarm)
library(cowplot)
theme_set(theme_cowplot())

library(metacal); packageVersion("metacal")

Next, we download the source data from Leopold and Busby (2020), which is available in Zenodo record 3872145. Set data_path to wherever you would like the data to be downloaded.

data_path <- here::here("vignettes", "data", "leopold2020host")
# To use a temporary directory:
# data_path <- file.path(tempdir(), "leopold2020host")
if (!dir.exists(data_path)) {
  dir.create(data_path, recursive = TRUE)
  download.file(
    "https://zenodo.org/record/3872145/files/dleopold/Populus_priorityEffects-v1.2.zip",
    file.path(data_path, "Populus_priorityEffects-v1.2.zip")
  )
  unzip(
    file.path(data_path, "Populus_priorityEffects-v1.2.zip"), 
    exdir = data_path
  )
}

The microbiome data is stored in a phyloseq object,

ps <- file.path(data_path, 
  "dleopold-Populus_priorityEffects-8594f7c/output/compiled/phy.rds") %>%
  readRDS %>%
  print

The (expected) actual compositions of the mock (control) samples can be computed from a csv file that contains the dilution factors used during sample construction. I will read in the csv file, convert it to a phyloseq object, and then compute the relative abundances of each species from the reciprocals of the dilution factors.

mock_actual <- file.path(data_path, 
  "dleopold-Populus_priorityEffects-8594f7c/data/MockCommunities.csv") %>%
  read.csv(row.names = 1) %>%
  select(-Sym4) %>%
  as("matrix") %>%
  otu_table(taxa_are_rows = FALSE) %>%
  transform_sample_counts(function(x) close_elts(1 / x))
mock_taxa <- taxa_names(mock_actual)

Note, I have converted the actual relative abundances to proportions, but the observed abundances in the ps object have the form of read counts,

otu_table(ps) %>% prune_taxa(mock_taxa, .) %>% corner

Let's take a look at the sample data and taxonomy table in the phyloseq object,

sam <- sample_data(ps) %>% as("data.frame") %>% as_tibble(rownames = "Sample")
tax <- tax_table(ps) %>% as("matrix") %>% as_tibble(rownames = "Taxon")
sam %>% glimpse
sam %>%
  count(Samp_type)

The control samples are the 10 samples with Samp_type == "Mock".

Now let's look at the taxonomy table,

tax %>% head
tax %>% filter(Taxon %in% mock_taxa) %>% select(Taxon, Family:Species)
ntaxa(ps)

There are 219 OTUs, of which 9 have been assigned to the mock taxa (the pathogen, here named "Melampsora", and the 8 foliar fungi). The names of the mock taxa in ps have already been set to exactly match those in mock_actual, which is a requirement for the steps that follow.

Inspect mock community measurements

To get a sense of how the mock mixtures were created, we can take a look at the expected proportions of the 9 taxa across the 10 samples,

mock_actual %>%
  psmelt %>%
  mutate(
    across(OTU, factor, levels = mock_taxa),
    across(Sample, factor, levels = sample_names(mock_actual)),
  ) %>%
  ggplot(aes(Sample, OTU, fill = Abundance)) +
  geom_tile() +
  scale_fill_viridis_c(trans = "log10", breaks = c(0.02, 0.05, 1e-1, 0.2)) +
  theme_cowplot(font_size = 11)

All taxa are in every sample, but in varying proportions that range from around 0.01 to 0.3.

To visualize and estimate bias, we'll work with a phyloseq object of the observed compositions subset to just the mock community samples and taxa,

ps.mock <- ps %>% 
  subset_samples(Samp_type == "Mock") %>%
  prune_taxa(mock_taxa, .)

Next, let's compare the observed to actual taxon proportions in the mocks. I'll first combine the observed and actual proportions in a data frame in "long" format,

props <- list(Actual = mock_actual, Observed = ps.mock) %>%
  map(transform_sample_counts, close_elts) %>%
  map_dfr(psmelt, .id = "Type") %>%
  select(Type, OTU, Sample, Abundance)

We can plot the observed vs. actual proportions with ggplot as follows,

brks <- c(0, 1e-3, 1e-2, 1e-1, 0.3)
axes <- list(
  scale_y_continuous(trans = scales::pseudo_log_trans(1e-3), 
    breaks = brks, labels = brks),
  scale_x_continuous(trans = scales::pseudo_log_trans(1e-3),
    breaks = brks, labels = brks),
  coord_fixed()
)

props %>%
  pivot_wider(names_from = Type, values_from = Abundance) %>%
  ggplot(aes(Actual, Observed, color = OTU, shape = Sample == "Mock.5")) +
  geom_abline(color = "darkgrey") +
  geom_quasirandom() +
  axes +
  scale_shape_manual(values = c(16, 1)) +
  scale_color_brewer(type = "qual", palette = 3)

Some notes about this plot:

Axes transformations: The multiplicative error created by bias can only be easily seen for low-proportion taxa if we use some form of log transformation. Options to consider include a log (or log10) transformation, logit transformation, and a pseudo-log transformation (which is linear at low values and logarithmic at higher values). Here I will use the pseudo-log transform since it can more easily handle any 0s that happen to be in the data than these other transforms. I set 1e-3 as the linear-to-log transition point; you should play around with this value with your own data. I saved the transformed axes in a variable axes to reuse in later plots.
Because the actual proportions are clustered at just 4 values, I used geom_quasirandom() from the ggbeeswarm package to jitter the points.
I have highlighted the sample Mock.5 as it was identified as an outlier by Leopold2020. It is the only sample where a taxon (Epicoccum) was undetected in a sample where it should have been present (the point at (0.01, 0)). Other than this non-detection, the observations of this sample seem in line with the others. I will include this sample in this tutorial, but we need to be aware that how we deal with this zero value may have undue influence on our bias estimate.

Estimate bias

Metacal provides a high-level interface to estimating bias, estimate_bias(), which expects two matrices or phyloseq objects with the observed and actual abundances for a set of control samples. To obtain sensible estimates using this function, we must first ensure that all taxa have an observed abundance that is greater than 0 in samples where their actual abundance is greater than 0. We will therefore add a pseudocount of 1 to the observed counts to remove 0s; other zero-replacement methods, such as cmultRepl() from zCompositions, would also work. The estimation method used by estimate_bias() also requires that taxa not supposed to be in a sample have an abundance of exactly 0, but the estimate_bias() function will automatically zero-out these observations for you.

sum(otu_table(ps.mock) == 0)
ps.mock.pseudo <- ps.mock %>%
  transform_sample_counts(function(x) x + 1)
all(otu_table(ps.mock.pseudo) > 0)

Now we can estimate bias using ps.mock.pseudo as our observed abundances. We use the boot = TRUE option to also generate bootstrap replicates that will give us a way to quantify the uncertainty in our estimate.

mc_fit <- estimate_bias(ps.mock.pseudo, mock_actual, boot = TRUE) %>% print
class(mc_fit)

estimate_bias() returns an "mc_bias_fit" object that can be interacted with through commonly used S3 methods such as print(), summary(), and coef(). The bias estimate can be extracted with coef(),

bias <- coef(mc_fit) %>% print

The summary() method uses the bootstrap replicates in the "mc_bias_fit" object to estimate the (geometric) standard error of the bias estimate,

mc_fit.summary <- summary(mc_fit)
print(mc_fit.summary)

producing a data frame with the estimate and standard error that can be accessed with

coef_tb <- mc_fit.summary$coefficients

Let's use this data frame to plot the bias estimate with two geometric standard errors,

coef_tb %>%
  mutate(taxon = fct_reorder(taxon, estimate)) %>%
  ggplot(aes(taxon, estimate, 
      ymin = estimate / gm_se^2, ymax = estimate * gm_se^2)) +
  geom_hline(yintercept = 1, color = "grey") +
  geom_pointrange() +
  scale_y_log10() +
  coord_flip()

This plot shows that these estimates are consistent but not identical with those in Leopold2020 Fig S2. The difference is due to our inclusion of the Mock.5 sample, which leads to a lower point estimate and larger standard error for Epicoccum.

Check model fit

The fitted values of the model can be accessed with fitted(); these are the proportions that we predict we'd observe for the control samples given the actual proportions and the estimated bias. The fitted() function returns a matrix with samples as rows and taxa as columns, which I'll convert into a phyloseq object.

observed.fitted <- fitted(mc_fit) %>% otu_table(taxa_are_rows = FALSE)

To plot the fitted values against the observed values, I'll add the fitted proportions to our earlier props data frame,

props.fitted <- bind_rows(
  props, 
  psmelt(observed.fitted) %>% add_column(Type = "Fitted")
)

Next, I'll use some tidyr manipulations and ggplot to plot the observed proportions vs. the actual and fitted predictions,

props.fitted %>%
  pivot_wider(names_from = Type, values_from = Abundance) %>%
  pivot_longer(c(Fitted, Actual), names_to = "Type", values_to = "Predicted") %>%
  ggplot(aes(Predicted, Observed, color = OTU)) +
  geom_abline(color = "darkgrey") +
  geom_point() +
  axes + 
  facet_wrap(~Type) +
  scale_color_brewer(type = "qual", palette = 3)

This plot shows that the fitted model is doing a good job of explaining the error across the full range of predicted proportions, though it is slightly underpredicting the observed proportions at low values.

Calibrate

Now we can use the estimated bias to calibrate the relative abundances of the mock taxa in all samples. Metacal provides the calibrate() function for this purpose, which takes as its arguments the observed compositions and the bias estimate returned by coef(mc_fit).

There are a few things to keep in mind about calibrate().

Because of the relative nature of microbiome measurement and the metacal bias estimate, it is only possible to calibrate the relative abundances of the taxa whose bias has been estimated. For this reason, calibrate() automatically drops taxa that are missing from the bias estimate.
A deterministic calibration method is used in which the observed abundances are simply divided by the bias and (optionally) normalized to sum to 1. Uncertainty in the bias estimate and in the observed abundances due to the noise inherent in microbiome measurement (such as random sampling of sequencing reads) is not accounted for.
An implication is that taxa whose observed abundance is 0 will have a calibrated abundance of 0. Whether and how you replace 0s prior to calibration may have a significant impact on downstream analyses.

Here I will use a pseudocount to make all the taxa abundances positive prior to calibration.

ps.pseudo <- transform_sample_counts(ps, function(x) x + 1)
ps.pseudo.cal <- calibrate(ps.pseudo, bias) %>% print

To verify that calibration worked as expected, let's compare the observed, calibrated, and actual taxon proportions in the 10 control samples. Again, I'll start by adding the calibrated proportions to the data frame with observed and actual proportions,

ps.pseudo.cal.mock <- ps.pseudo.cal %>% 
  subset_samples(Samp_type == "Mock")
props.cal <- bind_rows(
  props, 
  psmelt(ps.pseudo.cal.mock) %>% add_column(Type = "Calibrated")
) %>%
  select(Type, OTU, Sample, Abundance)

First, let's look at bar plots comparing the observed, calibrated, and actual compositions,

props.cal %>%
  mutate(
    across(Type, fct_rev),
    across(Sample, factor, levels = sample_names(mock_actual)),
    ) %>%
  rename(Proportion = Abundance) %>%
  ggplot(aes(Type, Proportion, fill = OTU)) +
  geom_col(width = 0.7) +
  scale_fill_brewer(type = "qual", palette = 3) +
  facet_wrap(~Sample, ncol = 5) +
  theme(
    axis.text.x = element_text(angle = 90, vjust = 0.5)
  )

We can see that the calibrated compositions are typically substantially closer to the actual proportions than the original observations. We can also plot the calibrated vs. actual (pseudo-log-transformed) proportions and compare to the earlier plot of observed vs. actual.

props.cal %>%
  pivot_wider(names_from = Type, values_from = Abundance) %>%
  pivot_longer(c(Observed, Calibrated), 
    names_to = "Type", values_to = "Proportion") %>%
  mutate(across(Type, fct_relevel, "Observed")) %>%
  ggplot(aes(Actual, Proportion, color = OTU)) +
  geom_abline(color = "darkgrey") +
  geom_quasirandom() +
  facet_wrap(~Type) +
  axes +
  scale_color_brewer(type = "qual", palette = 3)

This plot has essentially same information as the earlier plot examining the model fit, but seen from a different angle. We can see that calibration in this dataset clearly reduces the relative error in proportions across the tested range of actual proportions, but leads to estimates that are systematically too low at low proportions.