In microsud/chkMocks: Compare Mock Community Samples in Microbiome Sequencing Studies

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Look at your mocks...

Every microbiome sequencing experiment must include a positive control. However, guidance on how to use these mock controls for quality checks is not easily available. A basic question to ask is: Is the composition of the experimental mock standards similar to the expected theoretical composition?
We can visually compare the composition bar-plots and check for correlation between experimental and theoretical community composition.
chkMocks eases these basic comparisons.
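
As a rough illustration of the correlation check described above, here is a minimal base R sketch. The taxon names, the abundance values, and the helper `mock_correlation` are hypothetical, and the choice of Spearman correlation is an assumption made for this sketch, not necessarily what chkMocks computes internally.

```r
# Hypothetical relative abundances (%); values are illustrative only.
theoretical  <- c(TaxonA = 40, TaxonB = 30, TaxonC = 20, TaxonD = 10)
# The experimental vector is deliberately in a different order.
experimental <- c(TaxonD = 10, TaxonA = 35, TaxonC = 22, TaxonB = 33)

# Hypothetical helper: align both vectors by taxon name, then correlate.
mock_correlation <- function(experimental, theoretical, method = "spearman") {
  shared <- intersect(names(theoretical), names(experimental))
  stats::cor(experimental[shared], theoretical[shared], method = method)
}

rho <- mock_correlation(experimental, theoretical)
rho
```

Aligning by name before correlating matters because the two vectors may list taxa in different orders.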

Install

library(devtools) # make sure you have installed devtools
install_github("microsud/chkMocks")

library(chkMocks)
library(dplyr)
library(phyloseq)
library(patchwork)
library(ggplot2)

ZymoBiomics

Before starting the analysis you need:

Raw data processed with the dada2 pipeline to produce a phyloseq object whose taxa_names are ASV sequences.
If the phyloseq object has samples and mocks, then subset to keep only the mocks.
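
One possible way to keep only the mocks is sketched below with a toy phyloseq object. The count matrix, the sample names, and the `is_mock` metadata column are made up for illustration; real data would use ASV sequences as taxa names and whatever metadata column marks your mock samples.

```r
library(phyloseq)

# Toy count matrix: 3 ASVs (rows) x 4 samples (columns); values are made up.
counts <- matrix(c(10, 0, 5, 3,
                    2, 8, 0, 1,
                    0, 4, 6, 9),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("ASV1", "ASV2", "ASV3"),
                                 c("S1", "S2", "Mock1", "Mock2")))

# Hypothetical metadata column flagging which samples are mocks.
meta <- data.frame(is_mock = c(FALSE, FALSE, TRUE, TRUE),
                   row.names = colnames(counts))

ps <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), sample_data(meta))

# Keep only the samples flagged as mocks.
ps_mocks <- prune_samples(sample_data(ps)$is_mock, ps)
sample_names(ps_mocks)
```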

Here, the example data are from Karstens L, Asquith M, Davin S, Fair D, Gregory WT, Wolfe AJ, Braun J, McWeeney S. 2019. Controlling for contaminants in low-biomass 16S rRNA gene sequencing experiments. mSystems 4:e00290-19.

ZymoExamplePseq

# check information
sample_data(ZymoExamplePseq)

A single function, checkZymoBiomics, does the following:

It takes an input phyloseq object of mock communities with taxa_names as ASV sequences and uses the ZymoTrainingSet to assign taxonomy. The ZymoTrainingSet contains only the full-length 16S rRNA gene sequences of the organisms in the ZymoBIOMICS™ Microbial Community Standard (Catalog No. D6300). ASVs that do not match the ZymoTrainingSet are labelled unclassified. The function returns two phyloseq objects: a) at the ASV level and b) agglomerated at the species level. The species-level agglomerated data are used to check for correlation with the theoretical composition.

output.dat <- checkZymoBiomics(ZymoExamplePseq,
                               mock_db = ZymoTrainingSet,
                               multithread = 2,
                               threshold = 60,
                               verbose = FALSE)

cortable <- output.dat$corrTable
colnames(cortable) <- c("MockSampleID", "Correlation2ZymoTheoretical", "MockSampleID_2" )

cortable

Get the agglomerated species level data.

ps_species <- output.dat$ps_species

ps_species
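
To make the idea of species-level agglomeration concrete, here is a minimal base R sketch on a toy matrix. The ASV names, species labels, and counts are hypothetical, and this is only an illustration of the concept, not the code checkZymoBiomics runs internally.

```r
# Toy ASV count matrix: 4 ASVs (rows) x 2 mock samples (columns); values are made up.
counts <- matrix(c(50, 40,
                   30, 20,
                   15, 30,
                    5, 10),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("ASV", 1:4), c("MockA", "MockB")))

# Hypothetical species label for each ASV; unmatched ASVs are pooled as "Unknown".
species <- c("Species1", "Species1", "Species2", "Unknown")

# Sum the counts of all ASVs that share a species label.
species_counts <- rowsum(counts, group = species)

# Convert to relative abundance (%) within each sample.
species_pct <- sweep(species_counts, 2, colSums(species_counts), "/") * 100
species_pct
```

Working in percentages puts the experimental samples on the same scale as a theoretical composition expressed in percent.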

Check assignments

get_taxa_unique(output.dat$ps_species, "Species")

Plot composition

p <- plotZymoDefault(output.dat)
p

The plot above shows how well or poorly the experimental mocks performed relative to the theoretical composition.
The data from Karstens L, et al., 2019 (mSystems) went through cells -> DNA extraction -> PCR amplification -> sequencing. Every step can introduce bias, as shown by the differences between the undiluted mock sample and the Zymo theoretical composition. Additionally, the diluted samples contain several unknown taxa, potential contaminants that are common in low-biomass samples.

Table 1 of that article gives the percent contaminants for each dilution: D0 = 0.1, D1 = 0.1, D2 = 1.8, D3 = 4.5, D4 = 12.0, D5 = 27.9, D6 = 64.5, D7 = 55.8, D8 = 80.1.
We can check the output of checkZymoBiomics for ASVs marked as 'Unknown', i.e. those that do not match any of the mock community taxa.

round(otu_table(output.dat$ps_species)["Unknown",],1)
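
A side-by-side comparison with the published values could be sketched as follows. The reported percentages are the Table 1 values quoted above; the helper `compare_to_reported` is hypothetical, and how you extract the measured 'Unknown' percentages from your own output is left as an assumption in the comments.

```r
# Percent contaminants per dilution, as quoted above from Karstens et al. (2019).
reported_contaminants <- c(D0 = 0.1, D1 = 0.1, D2 = 1.8, D3 = 4.5, D4 = 12.0,
                           D5 = 27.9, D6 = 64.5, D7 = 55.8, D8 = 80.1)

# Hypothetical helper: align a named vector of measured 'Unknown' percentages
# with the reported values and tabulate the difference per sample.
compare_to_reported <- function(measured, reported = reported_contaminants) {
  shared <- intersect(names(reported), names(measured))
  data.frame(Sample     = shared,
             Reported   = unname(reported[shared]),
             Measured   = unname(measured[shared]),
             Difference = unname(measured[shared] - reported[shared]))
}

# Assumption (not run here): with phyloseq loaded, a named vector could be obtained with
# measured <- as(otu_table(output.dat$ps_species), "matrix")["Unknown", ]
# For a self-contained demonstration, compare the reported values with themselves.
res <- compare_to_reported(reported_contaminants)
res
```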

Check how individual taxa were measured.

p <- plotZymoDefault(output.dat)

# using patchwork plot to extract first bar plot
p[[1]] + facet_wrap(~FeatureID) + 
  theme_minimal(base_size = 10) + 
  theme(legend.position = "none",
        strip.text = element_text(face="italic")) +
  ggplot2::scale_y_discrete(limits = rev(c("ZymoTheoretical","D8", "D6", "D7", 
                                       "D5", "D4", "D3", "D2", "D1", "D0")))

Looking at the individual strain abundances indicates undercounting of Staphylococcus aureus. It is also important to note the limitations of species-level assignment for short-read ASVs.

In the diluted mock samples, there are Unknown taxa, i.e. those that are not of ZymoBiomics origin.
Check their contribution.

sp.df <- phyloseq::psmelt(ps_species) %>%
  dplyr::filter(Species=="Unknown" & Sample !="ZymoTheoretical")
# we keep the order of dilution of samples 
sp.df$Sample <- factor(sp.df$Sample, levels = c("D8", "D6", "D7", "D5", 
                                                "D4", "D3", "D2", "D1", "D0"))

ggplot(sp.df, aes(Sample, Abundance)) + 
  geom_col() + 
  theme_minimal() +
  ggplot2::ylab("Non-Zymo Abundance (%)") +
  ggplot2::xlab("Samples")

The most diluted sample, D8, has about 80% non-ZymoBiomics taxa, so make sure you include negative controls, and see the article by Karstens L, et al. (2019). Controlling for contaminants in low-biomass 16S rRNA gene sequencing experiments. mSystems 4:e00290-19. All code from their analysis is openly available.
Note: The taxonomy of lactobacilli has been updated, and L. fermentum is now placed in the genus Limosilactobacillus; see Zheng J., Wittouck S., Salvetti E. et al. (2020). A taxonomic note on the genus Lactobacillus: Description of 23 novel genera, emended description of the genus Lactobacillus Beijerinck 1901, and union of Lactobacillaceae and Leuconostocaceae.
Thanks to Giovanna Felis for bringing this to my notice on Twitter.
The label L. fermentum is still used by ZymoBiomics, so it is kept as is for now.

A training set and a phyloseq object with the theoretical composition of the ZymoBIOMICS® Gut Microbiome Standard (Catalog No. D6331) are also available.

data(ZymoBiomicsGutTrainingSet)
ZymoBiomicsGutTrainingSet

data(ZymoBiomicsGutPseq)
ZymoBiomicsGutPseq

Other independently developed tools:
ZymoResearch miqScore16SPublic by Michael Weinstein
QIIME2 q2-quality-control suggested by Yanxian Li
OCMS OCMS_zymoBIOMICS by Nick Ilott

Let me know via GitHub issues if there are more tools that should be mentioned here.

devtools::session_info()

Disclaimer: While we use ZymoBiomics data, we, the developers of chkMocks, are not associated with the manufacturer, and this work should not be considered an endorsement of the said product.

microsud/chkMocks documentation built on July 1, 2023, 9:23 p.m.
