In Gibbons-Lab/mbtools: Provides helper functions for microbiome preprocessing and analysis

Reference counting

One basic anaylisis we want to perform is to count the abundance of some references in our data. Here reference is a broad term and may refer to transcripts, genomes, or amplicon sequences contained in a reference database.

For this mbtools implements a general purpose expectation maximization (EM) counter. It is implemented in C++ to perform fast and usually reading the alignment file takes longer than the actual analysis.

library(mbtools)

Alignment

Let's start by aligning our short read data to a reference database of 10 microbial genomes.

fi <- system.file("extdata/shotgun", package = "mbtools") %>%
      find_read_files()
ref <- system.file("extdata/genomes/zymo_mock.fna.gz",
                   package = "mbtools")

alns <- align_short_reads(
    fi,
    threads = 3,
    reference = ref,
    use_existing = FALSE)

EM counting

Counting is yet again a workflow and requires an alignment artifact and a configuration.

config <- config_count(
    reference = system.file("extdata/genomes/zymo_mock.fna.gz",
                            package = "mbtools"),
    threads = 1,
    weights = TRUE
)

config

And we can proceed to counting:

cn <- count_references(alns, config)

This creates a count artifact which contains the counts.

cn$counts

Note that there is a NA reference. This is the number of reads likely coming from a sequence not contained in the reference.

cn$counts[is.na(reference)]

In each sample we have 10 microbes with the same abundance.

ggplot(cn$counts, aes(x=counts, y=reference)) +
    geom_point() + xlim(0, 1000) +
    facet_wrap(~ sample) + theme_minimal()

Looks ok, but there is a lot of noise since we have very low depth.

ggplot(cn$counts, aes(x = counts)) + geom_histogram(bins = 8)

We could also compare that with a naive counting method that just uses the best alignment score. In our case that will perform equally since we have little multi-mapping here.

config$method <- "naive"
cn2 <- count_references(alns, config)

counts <- cn$counts[cn2$counts, on = c("sample", "reference", "effective_length")]

ggplot(counts, aes(x = counts, y = i.counts)) +
    geom_point() + geom_smooth(method = "lm") +
    labs(x = "em", y = "naive") + theme_classic()

As we see they both give the same results for that simple case.

Gibbons-Lab/mbtools documentation built on Jan. 28, 2024, 11:08 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Gibbons-Lab/mbtools
Provides helper functions for microbiome preprocessing and analysis

In Gibbons-Lab/mbtools: Provides helper functions for microbiome preprocessing and analysis

Reference counting

Alignment

EM counting

R Package Documentation

Browse R Packages

We want your feedback!

Gibbons-Lab/mbtools Provides helper functions for microbiome preprocessing and analysis

In Gibbons-Lab/mbtools: Provides helper functions for microbiome preprocessing and analysis

Reference counting

Alignment

EM counting

R Package Documentation

Browse R Packages

We want your feedback!

Gibbons-Lab/mbtools
Provides helper functions for microbiome preprocessing and analysis