knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Look at your mocks...
Every microbiome sequencing experiment should include a positive control. However, it is not always clear how to use these mock controls to guide quality checks. A basic question to ask is: is the composition of the experimental mock standards similar to the theoretical expected composition?
We can visually compare the composition bar plots and check for correlation between the experimental and theoretical community composition.
chkMocks eases these basic comparisons.
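At its core, this comparison correlates two composition vectors. The base R sketch below illustrates the idea. The percentages for the eight bacterial genera are made up, so they are not Zymo's published values, and the use of Spearman correlation is an assumption. chkMocks may compute the correlation differently internally.

```r
# Hypothetical relative abundances (%) of eight mock community members;
# these numbers are illustrative only, not the Zymo theoretical values.
taxa <- c("Bacillus", "Enterococcus", "Escherichia", "Lactobacillus",
          "Listeria", "Pseudomonas", "Salmonella", "Staphylococcus")
theoretical  <- c(17, 10, 10, 18, 14, 4, 10, 17)
experimental <- c(20, 12, 14, 15, 13, 6, 12, 8)
names(theoretical) <- names(experimental) <- taxa

# Side-by-side table for a quick visual check
print(data.frame(theoretical, experimental))

# Rank-based (Spearman) correlation between experimental and theoretical
rho <- cor(experimental, theoretical, method = "spearman")
round(rho, 2)
```

A rank-based correlation is less sensitive to a single dominant taxon than Pearson correlation on raw percentages. Either choice is only a first check and does not replace looking at the bar plots.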
Install
library(devtools) # make sure you have installed devtools
install_github("microsud/chkMocks")
library(chkMocks)
library(dplyr)
library(phyloseq)
library(patchwork)
library(ggplot2)
Before starting the analysis, you need a phyloseq object of mock communities whose taxa_names are the ASV sequences. Here, the example data are from Karstens L, Asquith M, Davin S, Fair D, Gregory WT, Wolfe AJ, Braun J, McWeeney S. 2019. Controlling for contaminants in low-biomass 16S rRNA gene sequencing experiments. mSystems 4:e00290-19.
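A quick way to confirm this requirement is to check that every taxon name consists only of nucleotide characters. The helper names_are_seqs() below is hypothetical and is not part of chkMocks. It assumes plain A/C/G/T sequences, so names containing IUPAC ambiguity codes such as N would be reported as FALSE.

```r
# Hypothetical helper (not part of chkMocks): TRUE if every taxon name looks
# like a DNA sequence made only of A, C, G and T.
names_are_seqs <- function(x) {
  length(x) > 0 && all(grepl("^[ACGT]+$", toupper(x)))
}

names_are_seqs(c("TACGGAGGGTGCAAGCG", "TACGTAGGTGGCAAGCG"))  # TRUE
names_are_seqs(c("ASV1", "ASV2"))                            # FALSE

# With a phyloseq object `ps` (not run here):
# names_are_seqs(phyloseq::taxa_names(ps))
# If the sequences are stored in the refseq slot instead, they can be
# copied into the taxa names:
# phyloseq::taxa_names(ps) <- as.character(phyloseq::refseq(ps))
```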
ZymoExamplePseq
# check sample information
sample_data(ZymoExamplePseq)
A single function, checkZymoBiomics, does the following:
It takes an input phyloseq object of mock communities with taxa_names as ASV sequences and uses the ZymoTrainingSet to assign taxonomy. The ZymoTrainingSet contains only the full-length 16S rRNA gene sequences of the members of the ZymoBIOMICS™ Microbial Community Standard (Catalog No. D6300). ASVs that are unrelated to the ZymoTrainingSet are labelled as unclassified.
The output includes two phyloseq objects, a) at ASV level and b) agglomerated at species level, as well as a correlation table (corrTable). The species-level data are used to check for correlation with the theoretical composition.
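Agglomerating at species level means summing the counts of all ASVs that were assigned the same species. In phyloseq, tax_glom(ps, taxrank = "Species") performs this kind of merge. The base R sketch below uses hypothetical counts and labels to show the idea. It is not necessarily how checkZymoBiomics implements the step.

```r
# Hypothetical ASV count table (rows = ASVs, columns = samples)
counts <- matrix(c(50, 30,
                   20, 10,
                   40, 60,
                    5, 25),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("ASV", 1:4), c("D0", "D5")))
# Hypothetical species label for each ASV; unmatched ASVs grouped as "Unknown"
species <- c("Bacillus subtilis", "Bacillus subtilis",
             "Escherichia coli", "Unknown")

# Sum the counts of all ASVs that share a species label
by_species <- rowsum(counts, group = species)
by_species
```

Here ASV1 and ASV2 collapse into a single Bacillus subtilis row, which is why several ASVs of one strain do not inflate the number of taxa in the correlation.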
# assign taxonomy with the Zymo training set and compare to the theoretical composition
output.dat <- checkZymoBiomics(ZymoExamplePseq,
                               mock_db = ZymoTrainingSet,
                               multithread = 2,
                               threshold = 60,
                               verbose = FALSE)
cortable <- output.dat$corrTable
colnames(cortable) <- c("MockSampleID", "Correlation2ZymoTheoretical", "MockSampleID_2")
cortable
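With many mock samples, it can help to flag those whose correlation falls below a chosen cutoff. The sketch below uses a hypothetical table with the column names assigned above. The correlation values and the cutoff of 0.7 are illustrative only and are not recommendations from chkMocks.

```r
# Hypothetical correlation table using the column names assigned above
cortable_demo <- data.frame(
  MockSampleID = c("D0", "D4", "D8"),
  Correlation2ZymoTheoretical = c(0.92, 0.81, 0.35),
  stringsAsFactors = FALSE
)
# Illustrative cutoff; choose one that suits your own study
cutoff <- 0.7
flagged <- subset(cortable_demo, Correlation2ZymoTheoretical < cutoff)
flagged$MockSampleID
```

The same subset() call should work on the real cortable, since it only relies on the renamed correlation column.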
Get the agglomerated species level data.
ps_species <- output.dat$ps_species
ps_species
Check assignments
get_taxa_unique(output.dat$ps_species, "Species")
Plot composition
p <- plotZymoDefault(output.dat)
p
The plot above shows how well or poorly the experimental mocks agree with the theoretical composition.
The mock samples in Karstens L, et al., 2019 (mSystems) were processed from cells through DNA extraction, PCR amplification and sequencing. Every step can introduce bias, which is visible as differences between the undiluted mock sample and the Zymo theoretical composition. In addition, the diluted samples contain several unknown, potential contaminants, as is common in low-biomass samples.
Table 1 of that article reports the percentage of contaminants in each dilution: D0 = 0.1, D1 = 0.1, D2 = 1.8, D3 = 4.5, D4 = 12.0, D5 = 27.9, D6 = 64.5, D7 = 55.8, D8 = 80.1.
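These reported values fit in a small data frame that can later be compared with the Unknown fraction estimated by chkMocks. Sorting the samples by contamination, with ties broken by the higher dilution, happens to reproduce the sample order used in the plots of this vignette, because D6 is reported as more contaminated than D7.

```r
# Percent contaminants per dilution, as reported in Table 1 of Karstens et al. (2019)
contam <- data.frame(
  Sample = paste0("D", 0:8),
  Dilution = 0:8,
  PctContaminants = c(0.1, 0.1, 1.8, 4.5, 12.0, 27.9, 64.5, 55.8, 80.1),
  stringsAsFactors = FALSE
)
# Order samples from most to least contaminated; ties go to the higher dilution
ord <- contam$Sample[order(-contam$PctContaminants, -contam$Dilution)]
ord
#> [1] "D8" "D6" "D7" "D5" "D4" "D3" "D2" "D1" "D0"
```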
We can check the output of checkZymoBiomics for ASVs marked as 'Unknown', i.e. those that do not match any of the mock community taxa.
round(otu_table(output.dat$ps_species)["Unknown",],1)
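The Unknown row above is read directly from the species-level table. The plots in this vignette label these values as percentages. If you instead start from raw counts, the Unknown fraction of a sample is its share of that sample's column total. The sketch below shows this with a hypothetical count matrix.

```r
# Hypothetical species-level count table (rows = species, columns = samples)
sp_counts <- matrix(c(900, 200,
                      100, 300,
                        0, 500),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("Bacillus subtilis", "Escherichia coli", "Unknown"),
                                    c("D0", "D8")))
# Convert each sample (column) to percentages
sp_pct <- 100 * prop.table(sp_counts, margin = 2)
round(sp_pct["Unknown", ], 1)
#> D0 D8 
#>  0 50 
```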
Check how individual taxa were measured.
p <- plotZymoDefault(output.dat)
# p is a patchwork object; extract the first (bar) plot and facet it by taxon
p[[1]] +
  facet_wrap(~FeatureID) +
  theme_minimal(base_size = 10) +
  theme(legend.position = "none",
        strip.text = element_text(face = "italic")) +
  ggplot2::scale_y_discrete(limits = rev(c("ZymoTheoretical", "D8", "D6", "D7", "D5",
                                           "D4", "D3", "D2", "D1", "D0")))
The individual strain abundances indicate that Staphylococcus aureus is under-counted. Note also the limitations of species-level assignment for short-read ASVs.
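To put a number on under- or over-counting, you can compute the log2 ratio of observed to theoretical abundance for each taxon. This metric is not part of chkMocks; it is only one simple option. The sketch below uses hypothetical percentages chosen so that the ratios are easy to read.

```r
# Hypothetical theoretical and observed relative abundances (%)
expected <- c(Staphylococcus = 16, Bacillus = 17, Listeria = 14)
observed <- c(Staphylococcus = 4,  Bacillus = 17, Listeria = 28)
# log2(observed / expected): 0 = as expected, < 0 = under-counted, > 0 = over-counted
# (a taxon with zero observed abundance gives -Inf; add a pseudocount if needed)
log2_ratio <- log2(observed / expected)
log2_ratio
#> Staphylococcus       Bacillus       Listeria 
#>             -2              0              1 
```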
The diluted mock samples contain Unknown taxa, i.e. taxa that are not of ZymoBIOMICS origin. Let us check their contribution.
sp.df <- phyloseq::psmelt(ps_species) %>%
  dplyr::filter(Species == "Unknown" & Sample != "ZymoTheoretical")
# set the order of samples on the x-axis (same order as in the plot above)
sp.df$Sample <- factor(sp.df$Sample,
                       levels = c("D8", "D6", "D7", "D5", "D4", "D3", "D2", "D1", "D0"))
ggplot(sp.df, aes(Sample, Abundance)) +
  geom_col() +
  theme_minimal() +
  ggplot2::ylab("Non-Zymo Abundance (%)") +
  ggplot2::xlab("Samples")
The most diluted sample, D8, has about 80% non-ZymoBIOMICS taxa, so make sure you include negative controls and read the article by Karstens L, et al. Controlling for contaminants in low-biomass 16S rRNA gene sequencing experiments. mSystems 4:e00290-19. All code from their analysis is openly available.
Note: Following the reclassification of the lactobacilli, L. fermentum is now Limosilactobacillus fermentum (Zheng J., Wittouck S., Salvetti E. et al. (2020). A taxonomic note on the genus Lactobacillus: Description of 23 novel genera, emended description of the genus Lactobacillus Beijerinck 1901, and union of Lactobacillaceae and Leuconostocaceae).
Thanks to Giovanna Felis for bringing this to my notice on Twitter.
ZymoBIOMICS still uses the label L. fermentum, so we keep it as is for now.
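If you prefer the current name in your own output, you can relabel the species names after running chkMocks. The sketch below works on a plain character vector. The exact label format that chkMocks uses for this strain is an assumption here, so check the output of get_taxa_unique() first. The same substitution could then be applied to the Species column of your taxonomy table.

```r
# Hypothetical species labels, e.g. as returned by get_taxa_unique(ps_species, "Species")
labels <- c("Lactobacillus fermentum", "Bacillus subtilis", "Unknown")
# Replace only an exact match of the old name with the current genus
labels_new <- sub("^Lactobacillus fermentum$", "Limosilactobacillus fermentum", labels)
labels_new
#> [1] "Limosilactobacillus fermentum" "Bacillus subtilis"             "Unknown"
```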
A training set and a phyloseq object with the theoretical composition of the ZymoBIOMICS® Gut Microbiome Standard (Catalog No. D6331) are also available.
data(ZymoBiomicsGutTrainingSet)
ZymoBiomicsGutTrainingSet
data(ZymoBiomicsGutPseq)
ZymoBiomicsGutPseq
Other independently developed tools:
ZymoResearch miqScore16SPublic by Michael Weinstein
QIIME2 q2-quality-control suggested by Yanxian Li
OCMS OCMS_zymoBIOMICS by Nick Ilott
Let me know via GitHub issues if there are more tools that should be mentioned here.
devtools::session_info()
Disclaimer: While we use ZymoBIOMICS data, we, the developers of chkMocks, are not associated with the manufacturer, and this work should not be considered an endorsement of the product.