In jfq3/QsRutils: R Functions Useful for Community Ecology

Introduction

Several of the functions in QsRutils have to do with making tables of differential abundances of taxa among treatments. All but one of these functions begin with comp_ for comparisons. These should be executed in a certain order as given in the next table. The arguments for each function can be found in the documentation. The first two take experiment-level phyloseq objects with at least slots otu_table, sample_data , and tax_table.

Function | Purpose :---------------------- | ---------------------------------------------------------------- comp_prepare_phyloseq | Makes a copy of the phyloseq object with the OTU table transformed to percentages. For both objects, it then agglomerates taxa to a given rank, modifies the taxonomy tables to include only the given rank, and filters out OTUs below a given percentage of the total counts. The function returns a list of the two phyloseq objects. comp_prepare_otu_table | Applies a transformation to the OTU percentage table from comp_prepare_phyloseq and makes a vector of the treatment groups. The "grps" argument is the name of a factor in the sample_data table. The function returns a list of the OTU matrix as percentages, the OTU matrix as transformed data, and a vector of treatment groups. comp_means_sd | For each taxon, calculates the mean and standard deviation across all samples and returns the results as a data frame. comp_make_f_tests | For each taxon, performs one-way ANOVA for differences in relative abundance among treatments and returns the results as a data frame. The argument grps is a vector of treatment groups. comp_comparisons | For each taxon and each treatment, calculates means and standard deviations of relative abundances. Performs all pairwise t-tests and assigns letters indicating treatment differences. Returns the result as a data frame. The argument grps is a vector of treatment groups. comp_assemble | Assembles the results of the last three functions into a summary data frame.

A seventh function, make_comparisons, wraps all six of the other functions and returns a list of the the comparison table as a data frame, the OTU matrix as percentages, the OTU matrix as transformed data, and a vector of treatment groups. It is handy for comparing relative abundances of taxa for a given rank, but sometimes you may need to prepare the phyloseq object or even OTU tables in a different way. I give examples of three cases below.

Case 1 - Compare Realtive Abundances of Phyla

The is the simplest case, and is easily executed with the wrapper function. First, load the example data set. It is a phyloseq object with otu_table, sample_data, and tax_table.

suppressPackageStartupMessages(library(phyloseq))
suppressPackageStartupMessages(library(QsRutils))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(reshape2))
data("its.root")
its.root

Then use the function make_comparisons. The comparison table itself is the first item in the list returned.

comp.phyla <- make_comparisons(its.root, taxrank = "Phylum", grps = "Label", transformation = "sqrt_arc_sine", pc.filter = 0.01, p.adjust.method ="BH", pool.sd = TRUE)
comp <- comp.phyla$comparison.table
comp <- comp[order(comp$mean, decreasing = TRUE), ]
comp

You can check that the assumption of equal variances is met with the check_var function. It takes as arguments the OTU matrix of transformed data and the vector of treatment groups. It performs a Fligner-Killeen test for homogeneity of variances for each taxon and prints the results to the console.

check_var(comp.phyla$taxa.pc.transformed, comp.phyla$groups)

If the results indicate heterogeneity of variances, you can repeat make_comparisons with sd.pool = FALSE and/or try a different transformation. Three transformation functions are included in QsRutils: arc_sine, log_arc_sine, and sqrt_arc_sine, the last of which seems to work most often. Other standard functions, e.g. log, sqrt, etc. may be used, as could a custom function supplied by the user. transfomation = "none" is also accepted.

Case 2 - Compare Relative Abundances of Classes within a Phylum

In the example above, Ascomycota represents 77% of all of the counts. We might want to beak it down into classes. At first you might think we could simply subset its.root to the phylum Ascomycota and use the resulting phyloseq object:

asco <- subset_taxa(its.root, Phylum == "Ascomycota")
asco <- prune_taxa(taxa_sums(asco)>0, asco)
comp.asco <- make_comparisons(asco, taxrank = "Class", grps = "Label", transformation = "sqrt_arc_sine", pc.filter = 0.01, p.adjust.method ="BH", pool.sd = TRUE)
comp.asco$comparison.table

But these proportions are based on the number of Ascomycota sequences, not the total sequences in its.root. (They do not add up to exactly 100% of the Ascomycota sequences because we filtered out a few small classes.) If we want them to add up to something close to 77%, the mean percentage of Ascomycota in the first example above, we have to take a different approach. We have to prepare the phyoseq object "manually," and use the the other comp_ functions to generate the comparison table. The first part of the code below is copied from the comp_prepare_phyloseq function but modified to subset to Ascomycota and then agglomerate to class and remove ranks other than class. Because the transformation to percentages is performed before these steps, in the end result the percentages are based on the total number of sequences in the original phyloseq object. I made the following code somewhat generic by setting values for expt, taxrank, and pc.filter. Thus, for similar tasks, you can cut and paste the code, editing only the values in red as you wish.

expt <- its.root
expt.pc <- transform_sample_counts(expt, function(x) 100*(x/sum(x)))

# Subset to higher rank
expt.taxon <- subset_taxa(expt, Phylum == "Ascomycota")
expt.taxon.pc <- subset_taxa(expt.pc, Phylum == "Ascomycota")

# Agglomerate to desired rank.
taxrank <- "Class"
expt.taxon <- tax_glom(expt.taxon, taxrank)
expt.taxon.pc <- tax_glom(expt.taxon.pc, taxrank)

# Remove ranks other than taxrank
tax_table(expt.taxon) <- tax_table(expt.taxon)[ , taxrank]
tax_table(expt.taxon.pc) <- tax_table(expt.taxon.pc)[ , taxrank]

# Filter out taxa that are < 0.1% of the total sequences in expt.
pc.filter <- 0.001
n <- sum(taxa_sums(expt)) * pc.filter
expt.taxon <- prune_taxa(taxa_sums(expt.taxon)>=n, expt.taxon)
expt.taxon.pc <- prune_taxa(taxa_names(expt.taxon), expt.taxon.pc)

# Make comparisons
temp2 <- comp_prepare_otu_table(expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine")
temp3 <- comp_means_sd(temp2$otu.pc)
temp4 <- comp_make_f_tests(temp2$otu.pc.trans, grps = temp2$groups, var.equal = TRUE)
temp5 <- comp_comparisons(otu.pc = temp2$otu.pc, otu.pc.trans = temp2$otu.pc.trans, grps = temp2$groups, p.adjust.method = "BH",  pool.sd = TRUE)
comp <- comp_assemble(temp3, temp4, temp5)
comp <- comp[order(comp$mean, decreasing = TRUE), ]
comp

These percentages are based on the total counts in its.root and add up to approximately 77%, the proportion of Ascomycota sequences in the first case.

Case 3 - Working with a Pre-existing OTU Table

This is a rather trite example because the differences are so obvious, but suppose you had made a figure something like this:

data("plot_df")
ggplot(data = plot_df, aes(x = Treatment, y = Percent, fill = Family)) +
  stat_summary(fun.y = "mean", geom = "col", position = position_stack())

You went to some trouble getting it the way you wanted, and now a reviewer wants statistics on which families are different among treatments. plot_df contains percentages of families by treatment based on a fuller data set, but is in "long" format, suitable for ggplot. You can recover the OTU table and make a table comparing differential abundances using the procedures below. The function dcast in the package reshaape2 can convert "long" format into "wide" format, but if there are replications as we should have, it insists on aggregating the data in some way, e.g. by length (counts) or means. We can prevent this by adding another variable, such as a sequence number, with unique values for every row. Then we can use dcast to get the data in the wide format we want for our OTU table.

plot_df has the format:

head(plot_df)

To recover our data in wide format, we do:

plot_df$seq <- with(plot_df, ave(Percent, Treatment, Family, FUN = seq_along))
my.data <- dcast(seq + Treatment ~ Family, data = plot_df, value.var = "Percent")
head(my.data)

Here the ave function applies seq_along over all level combinations of Percent, Treatment, and Family. The result is that all values of seq + Treatment are unique, and thus dcast returns what we want without aggregating the data. We next save the Treatment column as a vector of our treatment groups and then remove the first two columns of my.data to get a matrix of the OTU table as percentages.

my.grps <- my.data$Treatment
my.pc <- my.data[ , -c(1,2)]

Next we transform the percents matrix using the sqrt_arc_sine function in this package.

my.pc.trans <- apply(my.pc, 2, sqrt_arc_sine)

Both my.pc and my.pc.trans have taxa in columns. For the remaining steps we need taxa to be in rows, so we have to transpose.

my.pc <- t(my.pc)
my.pc.trans <- t(my.pc.trans)

With these two matrices and the vector of groups, we are ready to make our comparison table.

temp3 <- comp_means_sd(my.pc)
temp4 <- comp_make_f_tests(my.pc.trans, grps = my.grps, var.equal = TRUE)
temp5 <- comp_comparisons(otu.pc = my.pc, otu.pc.trans = my.pc.trans, grps = my.grps, p.adjust.method = "BH",  pool.sd = TRUE)
comp <- comp_assemble(temp3, temp4, temp5)
comp <- comp[order(comp$mean, decreasing = TRUE), ]
comp