knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "tools/readme/")
devtools::load_all()

mirix

The goal of mirix is to calculate the Microbiome Response Index (MiRIx) for a given bacterial community. Often this will be used to predict the community's susceptibility to an antibiotic.

Installation

You can install the development version of mirix with devtools:

#install.packages("devtools")
devtools::install_github("PennChopMicrobiomeProgram/mirix")

Calculating antibiotic index values

Here, we'll load the mirix package, demonstrate some basic tasks, and give some pointers for its use. Although we could accomplish all our tasks in base R, the mirix package works very nicely with functions from the tidyverse. We'll import those functions now.

library(tidyverse)

Next, we'll load the mirix library.

library(mirix)

This package comes with a built-in data set, weiss2021_data, from a study of the gut microbiota in children with sepsis (PMID 33786436). The children in this study were exposed to a range of antibiotics, and fecal samples were collected across a series of time windows after diagnosis (study_window A-E). Bacterial communities in the fecal samples were profiled by sequencing the 16S rRNA gene, which serves as a fingerprint to identify bacteria in each sample. The study included samples from a set of healthy children in a similar age group for comparison. Here is a summary of the number of samples in each time window:

weiss2021_data %>%
  distinct(sample_id, study_group, study_window) %>%
  count(study_group, study_window) %>%
  knitr::kable()

for each sample collected, the data frame contains the relative abundance of all bacteria with a proportion of more than 0.001. Here is one of the sepsis samples, from subject 19019 in time window A. In this sample, the sequencing results indicated the following types of bacteria:

weiss2021_data %>%
  filter(sample_id %in% "Sepsis.19019.A") %>%
  select(lineage, proportion) %>%
  knitr::kable()

For each type of bacteria, the taxonomic assignment includes all the taxon names from the kingdom on down to the most specific taxon that could be determined from the sequence, in most cases the genus. Following the NCBI taxonomy browser, we call this result the lineage.

Our goal will be to use the lineage information to predict which bacteria in each sample would be susceptible or resistant to various antibiotics. For a given antibiotic, we construct an antibiotic-specific Mircrobiome Response Index by taking the log-ratio of the abundance for resistant organisms over that of susceptible organisms.

If the bacterial community is dominated by susceptible organisms, the index will be negative. Conversely, the index is positive if susceptible organisms constitute a minority. If an antibiotic has the predicted effect on a bacterial community, the index should increase after the antibiotic is introduced.

Let's compute the vancomycin response index for each sample in the study.

weiss2021_vanc <- weiss2021_data %>%
  group_by(sample_id, study_group, study_window) %>%
  summarise(vanc = mirix_vancomycin(proportion, lineage), .groups = "drop")

weiss2021_vanc %>%
  ggplot(aes(x=study_window, y = vanc)) +
  geom_boxplot() +
  facet_grid(~ study_group, scales = "free_x", space = "free_x") +
  labs(x = "Study window", y = "Vancomycin-specific index")

For healthy children, the medain value of the index is about 0.2, whereas it is roughly 0.85 across the samples from children with sepsis. Let's look further into how the index was calculated, and check out which bacteria were labeled as susceptible or resistant to vancomycin.

Susceptible and resistant bacteria

The mirix library offers some lower-level functions that show more details about how the index was calculated. For each antibiotic with an index function, there is an accompanying function to determine the susceptibility for each lineage. Let's list out the susceptibility for the lineages in the sample from subject 19019:

weiss2021_data %>%
  filter(sample_id %in% "Sepsis.19019.A") %>%
  mutate(susceptibility = antibiotic_susceptibility_vancomycin(lineage)) %>%
  select(lineage, susceptibility) %>%
  knitr::kable()

Looking down the list of taxa, we can see that three of the seven are from the Firmicutes phylum, in which the organisms are generally characterized by a Gram-positive cell wall structure. Vancomycin's mechanism of action is to disrupt construction of the cell wall in Gram-positive organisms, so we would generally predict that organisms in this phylum would be susceptible to the antibiotic. Likewise, the Actinobacteria are Gram-positive, so the final lineage is predicted to be susceptible. The three remaining lineages are Gram-negative organisms from the Proteobacteria, and are predicted to be resistant.

Now, if you've read the Wikipedia article on vancomycin, you might know that most species of Lactobacillus are naturally resistant to the drug, although they are Gram-positive bacteria in the Firmicutes phylum. Here, we don't have an assignment to the Lactobacillus genus, but instead to the Lactobacillaceae family, so the prediction stands as susceptible.

Databases for bacterial phenotypes and antibiotic susceptibility

The mirix package comes with two built-in databases. The taxon_phenotypes database contains information on Gram-stain and aerobic status for over 900 bacterial taxa. The taxa were selected to cover many taxa encountered in the human microbiome. It is here where we note that the Firmicutes are generally Gram-positive.

taxon_phenotypes %>%
  filter(taxon %in% "Firmicutes")

The second database taxon_susceptibility, contains information about bacterial susceptibility to specific antibiotics. It is here, for example, where we note that Lactobacillus species are typically resistant to vancomycin.

taxon_susceptibility %>%
  filter(taxon %in% "Lactobacillus")

If you need to add or modify information in these databases, you can copy the data frames to new variables, make the changes you'd like, and pass the new databases directly to the mirix_vancomycin() or antibiotic_susceptibility_vancomycin() functions.

Predicting taxon abundances for a given value of the index

In addition to calculating the antibiotics index for a given sample, we can predict what the abundances in a sample might look like at a given value of the index. Our approach is to re-balance the total abundances of resistant and susceptible bacteria in the sample, while preserving the relative abundances within the resistant taxa and within the susceptible taxa. Our method will also preserve the total abundance of each sample, in case the total abundance is not equal to 1. For taxa that are not annotated as either resistant or susceptible, we don't change the abundance at all.

We'll use a healthy control sample, Healthy.6, to demonstrate. First, we'll calculate the susceptibility to vancomycin for each lineage.

healthy6_data <- weiss2021_data %>%
  filter(sample_id %in% "Healthy.6") %>%
  mutate(taxon = word(lineage, -1)) %>%
  mutate(taxon = fct_reorder(taxon, proportion)) %>%
  mutate(susceptibility = antibiotic_susceptibility_vancomycin(lineage))

Above, we also made a shorter label for the taxa and sorted the values based on proportion in the sample, to aid in plotting. Here is a chart of the taxon abundances and their susceptibility.

healthy6_data %>%
  ggplot(aes(x = proportion, y = taxon, shape = susceptibility)) +
  geom_point() +
  scale_x_log10() +
  scale_shape_manual(values = c(1, 19), na.value = 3)

Most taxa in the sample are annotated as susceptible to vancomycin, including the most abundant taxon, Ruminococcaceae. One taxon, RF39, is not annotated. Only a few taxa are annotated as resistant to vancomycin, thus it's not surprising that the vancomycin index for the sample is negative.

healthy6_data %>%
  summarise(vanc = mirix_vancomycin(proportion, lineage))

How would we expect the proportions to change if the index increased to a positive value, say 0.5? We can use predict_abundance() to run the calculation.

healthy6_data %>%
  mutate(predicted = predict_abundance(0.5, proportion, susceptibility)) %>%
  rename(observed = proportion) %>%
  pivot_longer(
    c(observed, predicted), names_to = "method", values_to = "abundance") %>%
  ggplot(aes(x = abundance, y = taxon, color = method, shape = susceptibility)) +
  geom_point() +
  scale_x_log10() +
  scale_shape_manual(values = c(1, 19), na.value = 3) +
  scale_color_brewer(palette = "Paired")

Here, we can see that the abundances have increased for the resistant taxa, such as Bacteroides (near the top) and Enterobacteriaceae (at the bottom). For susceptible taxa, the abundances have decreased. For the taxon that's not annotated, RF39 (near the middle), the abundance has not changed at all.

To finish, let's re-calculate the vancomycin index for our predicted abundances, so we can verify that it has the expected value of 0.5.

healthy6_data %>%
  mutate(predicted = predict_abundance(0.5, proportion, susceptibility)) %>%
  summarise(vanc = mirix_vancomycin(predicted, lineage))


tuv292/abx_idx documentation built on Jan. 12, 2023, 12:48 a.m.