In krisrs1128/treelapse: Visualization for Tree-Structured Collections of Counts and Series

Introduction

library("knitr")
library("tidyverse")
library("phyloseq")
library("treelapse")

# minimal theme for ggplots
theme_set(theme_bw())
min_theme <- theme_update(
  panel.border = element_blank(),
  panel.grid = element_blank(),
  axis.ticks = element_blank(),
  legend.title = element_text(size = 8),
  legend.text = element_text(size = 6),
  axis.text = element_text(size = 6),
  axis.title = element_text(size = 8),
  strip.background = element_blank(),
  strip.text = element_text(size = 8),
  legend.key = element_blank()
)
opts_chunk$set("fig.width" = 6.5, "fig.height" = 3.5)

data(abt)

In this vignette, we study the impact of antibiotics on microbial ecology, using data from [@dethlefsen2011incomplete]. Even non-scientists know that antibiotics kill bacteria -- here we will try to describe the effects of these time courses on microbial ecology in a little more precision. For example, how quickly do the bacteria die off, does the population return to the initial states, and if so how long does that take? Also, are certain microbes more / less resistant to antibiotics, and do we see any kinds of competition or shifts in the the community's position on the adaptive landscape?

We will answer these questions by studying time series of microbial abundances while the their host goes through a few separate periods of antibiotic treatment. The series can be arranged along the taxonomic tree; we can represent each node by either its sum or average across descendants. Instead of identifying each node with the full time series, we can alternatively conisder the vector of averages during the different periods (pre, during, and post-antibiotic) and see how these averages change across the different periods and across different subtrees.

Data Background

We give some preliminary views of the data, just to get some intuition about it. If you are familiar with this (type of) study, you can safely skip this section.

First, we look at how many samples there are, how many microbes were found, and what types of features were collected about the samples.

abt
mapping <- sample_data(abt)
summary(mapping)

Evidently, there are about fifty timepoints collected across three individuals. Each timepoint is associated with a condition: before, between, or after different antibiotic time courses (Cp, for ciprofloxacin). Unfortunately, the treatments were not given at the same time for each person, see the figure below.

ggplot(mapping) +
  geom_tile(aes(x = time, y = ind, fill = condition)) +
  scale_fill_brewer(palette = "Set3")

If we had time be very careful, we might want to find a way to align these series. Instead, we will just analyze these three subjects separately.

Turning over to the RSV (microbial) abundances, we notice that the counts are highly skewed. We're finding that the vast majority of microbes have very small abundnaces.

hist(asinh(as(otu_table(abt), "matrix")), main = "Raw RSV Counts")

This is actually pretty standard in microbiome data; we'll use the filtering and transformation functions in phyloseq to down to the ~300 most abundant taxa and work with asinh transformed counts[^The asinh transformation is like taking logs, but is less agressive at smaller values. This is evident from the representation, $asinh\left(x\right) = \log\left(x + \sqrt{1 + x^{t}}\right)].

abt <- abt %>%
  filter_taxa(function(x) sd(x) > 7.5, prune = TRUE) %>%
  transform_sample_counts(asinh)
abt
hist(as(otu_table(abt), "matrix"), main = "Processed RSV Counts")

These are relatively generic checks we would do with any microbiome data set. Next, we consider some preparatory work specific to treelapse.

Data Preparation

Before we can produce any of the treelapse figures, we will need to generate the following data.

An edgelist specifying the taxonomy of microbes in the sample.
A values dataset giving the

tip_values <- t(get_taxa(abt))

taxa <- tax_table(abt) %>%
  as("matrix")
taxa <- gsub("_1", "", taxa)
taxa <- gsub("_2", "", taxa)
taxa <- gsub("uncultured", "", taxa)
taxa[taxa == ""] <- NA

incertae_ix <- which(taxa == "Incertae Sedis", arr.ind = TRUE)
for (parent in c("Erysipelotrichi_Erysipelotrichales", "Lachnospiraceae", "Ruminococcaceae")) {
  cur_parent_ix <- which(taxa == parent, arr.ind = TRUE)
  cur_ix <- incertae_ix[incertae_ix[, "row"] %in% cur_parent_ix[, "row"], ]
  taxa[cur_ix] <- paste0("Incerate Sedis_", parent)
}

edges <- taxa_edgelist(taxa)

subjects <- unique(mapping$ind)
values <- list()
for (i in seq_along(subjects)) {
  cur_ix  <- mapping$ind == subjects[i]
  for (fun in c("sum", "mean")) {
    cur_fun <- get(sprintf("tree_%s", fun))
    cur_values <- tree_fun_multi(edges, tip_values[cur_ix,, drop = FALSE], cur_fun)

    values <- c(
      values,
      list(
        data.table(
          "subject" = subjects[i],
          "type" = fun,
          "time" = mapping$time[cur_ix][cur_values$row],
          cur_values
        )
      )
    )
  }
}

values <- rbindlist(values)

cur_subject <- "D"
time_data <- values %>%
  filter(subject == cur_subject, type == "sum") %>%
  select(time, unit, value) %>%
  arrange(unit, time)

conditions <- mapping %>%
  filter(ind == cur_subject) %>%
  select(time, condition) %>%
  unique()

Visualization

Tree Sums

display_opts <- list(
  "margin" = list("ts_right" = 30, "ts_left" = 35, "tree_right" = 15, "tree_left" = 15),
  "size_min" = 1,
  "size_max" = 10
)
timebox_tree(
  time_data,
  edges,
  display_opts = display_opts
)

treebox(
  time_data,
  edges,
  display_opts = display_opts
)

Tree Averages

time_data <- values %>%
  filter(subject == cur_subject, type == "mean") %>%
  select(time, unit, value) %>%
  arrange(unit)

display_opts$size_min <- 0.5
timebox_tree(
  time_data,
  edges,
  display_opts = display_opts
)

treebox(
  time_data,
  edges,
  display_opts = display_opts
)

Per-Condition Sankey

condition_values <- values %>%
  left_join(conditions) %>%
  group_by(subject, unit, type, condition) %>%
  summarise(value = mean(value))

sankey_data <- condition_values %>%
   filter(subject == cur_subject, type == "sum") %>%
   as.data.frame() %>%
   select(condition, unit, value)
colnames(sankey_data)[1] <- "group"
sankey_data$group <- gsub(" ", "", sankey_data$group)

doi_sankey(
  sankey_data,
  edges,
  width = 1000,
  display_opts = list(
    "leaf_width" = 5
  )
)

Interpretation

References

krisrs1128/treelapse documentation built on Jan. 3, 2020, 6:29 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

krisrs1128/treelapse
Visualization for Tree-Structured Collections of Counts and Series

In krisrs1128/treelapse: Visualization for Tree-Structured Collections of Counts and Series

Introduction

Data Background

Data Preparation

Visualization

Tree Sums

Tree Averages

Per-Condition Sankey

Interpretation

References

R Package Documentation

Browse R Packages

We want your feedback!

krisrs1128/treelapse Visualization for Tree-Structured Collections of Counts and Series

In krisrs1128/treelapse: Visualization for Tree-Structured Collections of Counts and Series

Introduction

Data Background

Data Preparation

Visualization

Tree Sums

Tree Averages

Per-Condition Sankey

Interpretation

References

R Package Documentation

Browse R Packages

We want your feedback!

krisrs1128/treelapse
Visualization for Tree-Structured Collections of Counts and Series