knitr::opts_chunk$set( fig.path = "tools/readme/" )
devtools::load_all()
The adonisplus
package provides some utilities for permutational multivariate
analysis of variance, also known as adonis or PERMANOVA.
You can install the development version from GitHub with:
# install.packages("devtools") devtools::install_github("PennChopMicrobiomeProgram/adonisplus")
library(tidyverse)
To show an example, we'll use a built-in data set from the FARMM study by Tanes et al. Participants consumed one of three diets for the duration of the study: omnivore, vegan, or exclusive enteral nutrition (EEN). On days 5-8, participants were given a combination of three antibiotics to clear out bacteria in the gut. Additionally, participants underwent a gut purge on day 8. The fecal microbiome and metabolome were sampled throughout the study. Here, we will work with the microbiome data, which was generated by shotgun metagenomic sequencing.
Our data objects are a data frame of sample info, farmm_samples
, and a
matrix of Bray-Curtis distances between samples, farmm_bc
. The Bray-Curtis
distances are derived from differences in the abundance of the taxa that
comprise the fecal microbiome. We have one distance for each pair of samples.
Let's look at the data frame of samples.
farmm_samples
The first column, sample_id
provides a unique ID for each sample, which is
also used to label the samples in the distance matrix.
Before we get into any complicated stuff, let's run a simple example without
any repeated measurements. Let's ask if the diet groups were different, using
the last time point obtained for each participant. We'll make a new data frame,
farmm_final
, containing just the samples we need.
farmm_final <- farmm_samples %>% group_by(subject_id) %>% filter(study_day == max(study_day)) %>% ungroup()
First, we'll run a principal coordinates analysis (PCoA) of the final time point
using pcoaplus()
. To use this function, we pipe in the data frame of samples
as the first argument. The second argument is the distance matrix, farmm_bc
.
We need a third argument to specify which column of the data frame matches the
IDs in the distance matrix. The functions in this package will automatically
re-arrange the distance matrix to match the data frame, so you can use the same
distance matrix over and over as you filter and re-arrange the samples. Here,
we specify this column as sample_id
.
farmm_final %>% pcoaplus(distmat = farmm_bc, sample_id_var = sample_id)
We see that pcoaplus()
has returned a new data frame, with two new columns,
Axis.1
and Axis.2
. We could use this to go straight into ggplot()
, but
our data frame has a custom plot method that adds in some nice details, like
the percent variation for each axis. When calling plot()
, we override the
default mapping to color the samples by diet. We also add a few elements to the
ggplot
object, to set the color scale and change the theme.
farmm_final %>% pcoaplus(distmat = farmm_bc, sample_id_var = sample_id) %>% plot(color = diet) + scale_color_brewer(palette = "Set1") + theme_bw()
It looks like the microbiome of the EEN diet group is different from that in
the omnivore and vegan groups. Let's test for differences between the diet
groups using adonisplus()
. As with pcoaplus()
, we pipe in the data frame of
study info as the first argument, then provide the distance matrix as the
second argument. Our third argument is a formula to specify the statistical
model. The formula must have distmat
on the left-hand side, which allows us
to re-use the same formula with different distance matrices. As before, we add
an additional argument to let the function know which column of the data frame
corresponds to the IDs in the distance matrix.
farmm_final %>% adonisplus( distmat = farmm_bc, formula = distmat ~ diet, sample_id_var = sample_id)
Our result looks like the output of adonis()
from the vegan
package, but
it's been tidied up into a data frame. If you have your own results from
adonis()
, you can tidy them up yourself using the tidy.adonis()
function
from this package.
It looks like the diet groups are different, but we don't know which pairs are
different. My guess would be that EEN is different than omnivores and vegans,
and that omnivores aand vegans are not different from each other. To run the
pairwise comparisons, we'll use adonispost()
. The arguments to this function
are the same as adonisplus()
, but we add an additional argument, which
, to
specify the variable on which we want to carry out pairwise comparisons.
farmm_final %>% adonispost( distmat = farmm_bc, formula = distmat ~ diet, sample_id_var = sample_id, which = diet)
The results are as expected. So, far we could have done all of this work using
the functions in vegan
and ape
, without much difficulty. However, the
adonisplus
package really shines when we have experimental designs with
repeated measures.
Let's move beyond a single time point, to see how the microbiome changes over time for each diet. To get an overview, we'll re-generate Figure 2A from the paper.
farmm_samples %>% pcoaplus(distmat = farmm_bc, sample_id_var = "sample_id") %>% plot(color = study_day) + facet_grid(~ diet) + scale_color_viridis_c(direction = -1) + theme_bw()
The paper reports that "EEN led to a significant change in the microbiota composition within 3 days of the dietary phase relative to the vegan and omnivore group." The next sentence says, "The vegan and omnivore groups were not significantly different from each other until day 7, which marks the introduction of PEG." Let's check that out for ourselves.
To investigate both claims, we'll limit the data set to the pre-antibiotics
period, and call it farmm_preabx
.
farmm_preabx <- farmm_samples %>% filter(antibiotics %in% "pre")
Now, we'll run adonisplus()
on a sample set with repeated measures. We need
to provide two additional arguments. First, we give the variable within which
the measures are repeated, subject_id
in our case. Secondly, we need to tell
adonisplus()
how to randomly re-assign each variable in the formula. Here, we
tell it to randomly re-assign diet between subjects, and randomly shuffle the
study days within each subject.
farmm_preabx %>% adonisplus( distmat = farmm_bc, formula = distmat ~ diet * study_day, sample_id_var = sample_id, rep_meas_var = subject_id, shuffle = c(diet = "between", study_day = "within"))
If you run this function yourself, you'll notice that it takes a lot longer
than it takes to run adonis()
. As we randomly re-assign diets and time
points, we re-run the adonis()
function once for each re-assignment or
permutation.^[In fact, the adonis()
function has some built-in capability to
carry out restricted permutations, but because the built-in methods permute the
samples and not the study groups, the between-subject permutations don't work
if you don't have exactly the same number of samples per subject. Consequently,
our data set would not work with the built-in permutation methods in
adonis()
.] The functions used by adonisplus()
to carry out the restricted
permutations are named shuffle_within_groups()
and
shuffle_between_groups()
, if you want to use them elsewhere.
Having found a difference between the diets overall, we wish to know which
diet pairs are different. As before, we use adonispost()
and tell it to run
pairwise comparisons of diet by setting which = diet
.
farmm_preabx %>% adonispost( distmat = farmm_bc, formula = distmat ~ diet * study_day, sample_id_var = sample_id, rep_meas_var = subject_id, shuffle = c(diet = "between", study_day = "within"), which = diet)
In the pairwise comparisons, we find that the microbiome of the omnivore and vegan groups was not different during the pre-antibiotics period, based on Bray-Curtis distance. Conversely, the microbiome of the EEN group was different from both the omnivores and vegans.
Notice that, for all the comparisons above, we've been able to use the same
distance matrix, farmm_bc
. Each time we called a function using the
distances, the distance matrix was automatically filtered and re-arranged to
match the samples in our data frame. How convenient!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.