knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This package is based on the approach described in:
Steven, B., Hesse, C., Soghigian, J., Gallegos-Graves, L. V. & Dunbar, J. Simulated rRNA/DNA Ratios Show Potential To Misclassify Active Populations as Dormant. Appl Env Microbiol 83, e00696-17-11 (2017).
However, there are a few key differences in this package compared to Steven et al.'s approach:
Steven et al. reported that, for a community with the "medium" ribosomal amplification model, partial sampling of the community resulted in a large spread of rRNA:rDNA ratios > 1, with the spread decreasing as the sampling coverage increased.
Let's try to replicate this result. We'll run 100 independent simulations for each of 1x, 10x, 100x and 1000x the population richness, which will be 5000 in all simulations. The sampling depth will be the same for both rDNA and rRNA. This will take a few minutes to run.
library(rdrsimulate) library(purrr) library(knitr) library(dplyr) library(ggplot2) set.seed(1) simulate_partial_sampling <- function(i, depth) { # Generate a community of 5,000 OTUs comm <- generate_community() # Store the true ratios true_ratios <- comm$ratio # Simulate partial sampling to the specified depth sampled_comm <- sample_community(comm, nDNA = depth, nRNA = depth) # Return ratios sampled_comm <- sampled_comm %>% mutate(i = i) %>% mutate(depth = depth) %>% mutate(detected = rDNA_abund > 0) %>% mutate(true_ratio = true_ratios) %>% select(i, depth, ratio, detected, true_ratio) sampled_comm } # Simulate partial sampling 100 times for each depth results <- map2_dfr( rep(1:100, 4), rep(c(5000, 50000, 500000, 5000000), each = 100), ~ simulate_partial_sampling(.x, .y) ) # Plot the results results %>% ggplot(aes(x = ratio)) + geom_histogram(bins = 100) + facet_wrap( ~ depth) # Statistical summary for each depth results %>% filter(! is.na(ratio)) %>% group_by(depth) %>% summarise( mean(ratio), mean(true_ratio), median(ratio), median(true_ratio), max(ratio), max(true_ratio), sum(detected) / 5000 ) %>% kable
These results are very similar to those reported by Steven et al.: the maximum ratio peaks at 50,000 reads, then decreases with decreasing coverage. It's surprising how much of a difference in the spread of ratios is observed even with complete coverage - this demonstrates just how sensitive the accuracy of the rRNA transcript count is to sequencing depth.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.