library("knitr") library("tidyverse") library("phyloseq") library("treelapse") # minimal theme for ggplots theme_set(theme_bw()) min_theme <- theme_update( panel.border = element_blank(), panel.grid = element_blank(), axis.ticks = element_blank(), legend.title = element_text(size = 8), legend.text = element_text(size = 6), axis.text = element_text(size = 6), axis.title = element_text(size = 8), strip.background = element_blank(), strip.text = element_text(size = 8), legend.key = element_blank() ) opts_chunk$set("fig.width" = 6.5, "fig.height" = 3.5) data(abt)
In this vignette, we study the impact of antibiotics on microbial ecology, using data from [@dethlefsen2011incomplete]. Even non-scientists know that antibiotics kill bacteria -- here we will try to describe the effects of these time courses on microbial ecology in a little more precision. For example, how quickly do the bacteria die off, does the population return to the initial states, and if so how long does that take? Also, are certain microbes more / less resistant to antibiotics, and do we see any kinds of competition or shifts in the the community's position on the adaptive landscape?
We will answer these questions by studying time series of microbial abundances while the their host goes through a few separate periods of antibiotic treatment. The series can be arranged along the taxonomic tree; we can represent each node by either its sum or average across descendants. Instead of identifying each node with the full time series, we can alternatively conisder the vector of averages during the different periods (pre, during, and post-antibiotic) and see how these averages change across the different periods and across different subtrees.
We give some preliminary views of the data, just to get some intuition about it. If you are familiar with this (type of) study, you can safely skip this section.
First, we look at how many samples there are, how many microbes were found, and what types of features were collected about the samples.
abt mapping <- sample_data(abt) summary(mapping)
Evidently, there are about fifty timepoints collected across three individuals. Each timepoint is associated with a condition: before, between, or after different antibiotic time courses (Cp, for ciprofloxacin). Unfortunately, the treatments were not given at the same time for each person, see the figure below.
ggplot(mapping) + geom_tile(aes(x = time, y = ind, fill = condition)) + scale_fill_brewer(palette = "Set3")
If we had time be very careful, we might want to find a way to align these series. Instead, we will just analyze these three subjects separately.
Turning over to the RSV (microbial) abundances, we notice that the counts are highly skewed. We're finding that the vast majority of microbes have very small abundnaces.
hist(asinh(as(otu_table(abt), "matrix")), main = "Raw RSV Counts")
This is actually pretty standard in microbiome data; we'll use the filtering and
transformation functions in phyloseq
to down to the ~300 most abundant taxa
and work with asinh
transformed counts[^The asinh transformation is like
taking logs, but is less agressive at smaller values. This is evident from the
representation, $asinh\left(x\right) = \log\left(x + \sqrt{1 + x^{t}}\right)].
abt <- abt %>% filter_taxa(function(x) sd(x) > 7.5, prune = TRUE) %>% transform_sample_counts(asinh) abt hist(as(otu_table(abt), "matrix"), main = "Processed RSV Counts")
These are relatively generic checks we would do with any microbiome data set. Next, we consider some preparatory work specific to treelapse.
Before we can produce any of the treelapse
figures, we will need to generate
the following data.
values
dataset giving thetip_values <- t(get_taxa(abt))
taxa <- tax_table(abt) %>% as("matrix") taxa <- gsub("_1", "", taxa) taxa <- gsub("_2", "", taxa) taxa <- gsub("uncultured", "", taxa) taxa[taxa == ""] <- NA incertae_ix <- which(taxa == "Incertae Sedis", arr.ind = TRUE) for (parent in c("Erysipelotrichi_Erysipelotrichales", "Lachnospiraceae", "Ruminococcaceae")) { cur_parent_ix <- which(taxa == parent, arr.ind = TRUE) cur_ix <- incertae_ix[incertae_ix[, "row"] %in% cur_parent_ix[, "row"], ] taxa[cur_ix] <- paste0("Incerate Sedis_", parent) } edges <- taxa_edgelist(taxa)
subjects <- unique(mapping$ind) values <- list() for (i in seq_along(subjects)) { cur_ix <- mapping$ind == subjects[i] for (fun in c("sum", "mean")) { cur_fun <- get(sprintf("tree_%s", fun)) cur_values <- tree_fun_multi(edges, tip_values[cur_ix,, drop = FALSE], cur_fun) values <- c( values, list( data.table( "subject" = subjects[i], "type" = fun, "time" = mapping$time[cur_ix][cur_values$row], cur_values ) ) ) } } values <- rbindlist(values)
cur_subject <- "D" time_data <- values %>% filter(subject == cur_subject, type == "sum") %>% select(time, unit, value) %>% arrange(unit, time)
conditions <- mapping %>% filter(ind == cur_subject) %>% select(time, condition) %>% unique()
display_opts <- list( "margin" = list("ts_right" = 30, "ts_left" = 35, "tree_right" = 15, "tree_left" = 15), "size_min" = 1, "size_max" = 10 ) timebox_tree( time_data, edges, display_opts = display_opts )
treebox( time_data, edges, display_opts = display_opts )
time_data <- values %>% filter(subject == cur_subject, type == "mean") %>% select(time, unit, value) %>% arrange(unit)
display_opts$size_min <- 0.5 timebox_tree( time_data, edges, display_opts = display_opts )
treebox( time_data, edges, display_opts = display_opts )
condition_values <- values %>% left_join(conditions) %>% group_by(subject, unit, type, condition) %>% summarise(value = mean(value)) sankey_data <- condition_values %>% filter(subject == cur_subject, type == "sum") %>% as.data.frame() %>% select(condition, unit, value) colnames(sankey_data)[1] <- "group" sankey_data$group <- gsub(" ", "", sankey_data$group)
doi_sankey( sankey_data, edges, width = 1000, display_opts = list( "leaf_width" = 5 ) )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.