In ctanes/miscStats: Bonus material for the adonis function

knitr::opts_chunk$set(
  fig.path = "tools/readme/"
)

devtools::load_all()

adonisplus

The adonisplus package provides some utilities for permutational multivariate analysis of variance, also known as adonis or PERMANOVA.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("PennChopMicrobiomeProgram/adonisplus")

Using adonisplus

library(tidyverse)

To show an example, we'll use a built-in data set from the FARMM study by Tanes et al. Participants consumed one of three diets for the duration of the study: omnivore, vegan, or exclusive enteral nutrition (EEN). On days 5-8, participants were given a combination of three antibiotics to clear out bacteria in the gut. Additionally, participants underwent a gut purge on day 8. The fecal microbiome and metabolome were sampled throughout the study. Here, we will work with the microbiome data, which was generated by shotgun metagenomic sequencing.

Our data objects are a data frame of sample info, farmm_samples, and a matrix of Bray-Curtis distances between samples, farmm_bc. The Bray-Curtis distances are derived from differences in the abundance of the taxa that comprise the fecal microbiome. We have one distance for each pair of samples. Let's look at the data frame of samples.

farmm_samples

The first column, sample_id provides a unique ID for each sample, which is also used to label the samples in the distance matrix.

Before we get into any complicated stuff, let's run a simple example without any repeated measurements. Let's ask if the diet groups were different, using the last time point obtained for each participant. We'll make a new data frame, farmm_final, containing just the samples we need.

farmm_final <- farmm_samples %>%
  group_by(subject_id) %>%
  filter(study_day == max(study_day)) %>%
  ungroup()

First, we'll run a principal coordinates analysis (PCoA) of the final time point using pcoaplus(). To use this function, we pipe in the data frame of samples as the first argument. The second argument is the distance matrix, farmm_bc. We need a third argument to specify which column of the data frame matches the IDs in the distance matrix. The functions in this package will automatically re-arrange the distance matrix to match the data frame, so you can use the same distance matrix over and over as you filter and re-arrange the samples. Here, we specify this column as sample_id.

farmm_final %>%
  pcoaplus(distmat = farmm_bc, sample_id_var = sample_id)

We see that pcoaplus() has returned a new data frame, with two new columns, Axis.1 and Axis.2. We could use this to go straight into ggplot(), but our data frame has a custom plot method that adds in some nice details, like the percent variation for each axis. When calling plot(), we override the default mapping to color the samples by diet. We also add a few elements to the ggplot object, to set the color scale and change the theme.

farmm_final %>%
  pcoaplus(distmat = farmm_bc, sample_id_var = sample_id) %>%
  plot(color = diet) +
  scale_color_brewer(palette = "Set1") +
  theme_bw()

It looks like the microbiome of the EEN diet group is different from that in the omnivore and vegan groups. Let's test for differences between the diet groups using adonisplus(). As with pcoaplus(), we pipe in the data frame of study info as the first argument, then provide the distance matrix as the second argument. Our third argument is a formula to specify the statistical model. The formula must have distmat on the left-hand side, which allows us to re-use the same formula with different distance matrices. As before, we add an additional argument to let the function know which column of the data frame corresponds to the IDs in the distance matrix.

farmm_final %>%
  adonisplus(
    distmat = farmm_bc, formula = distmat ~ diet,
    sample_id_var = sample_id)

Our result looks like the output of adonis() from the vegan package, but it's been tidied up into a data frame. If you have your own results from adonis(), you can tidy them up yourself using the tidy.adonis() function from this package.

It looks like the diet groups are different, but we don't know which pairs are different. My guess would be that EEN is different than omnivores and vegans, and that omnivores aand vegans are not different from each other. To run the pairwise comparisons, we'll use adonispost(). The arguments to this function are the same as adonisplus(), but we add an additional argument, which, to specify the variable on which we want to carry out pairwise comparisons.

farmm_final %>%
  adonispost(
    distmat = farmm_bc, formula = distmat ~ diet,
    sample_id_var = sample_id, which = diet)

The results are as expected. So, far we could have done all of this work using the functions in vegan and ape, without much difficulty. However, the adonisplus package really shines when we have experimental designs with repeated measures.

Let's move beyond a single time point, to see how the microbiome changes over time for each diet. To get an overview, we'll re-generate Figure 2A from the paper.

farmm_samples %>%
  pcoaplus(distmat = farmm_bc, sample_id_var = "sample_id") %>%
  plot(color = study_day) +
  facet_grid(~ diet) +
  scale_color_viridis_c(direction = -1) +
  theme_bw()

The paper reports that "EEN led to a significant change in the microbiota composition within 3 days of the dietary phase relative to the vegan and omnivore group." The next sentence says, "The vegan and omnivore groups were not significantly different from each other until day 7, which marks the introduction of PEG." Let's check that out for ourselves.

To investigate both claims, we'll limit the data set to the pre-antibiotics period, and call it farmm_preabx.

farmm_preabx <- farmm_samples %>%
  filter(antibiotics %in% "pre")

Now, we'll run adonisplus() on a sample set with repeated measures. We need to provide two additional arguments. First, we give the variable within which the measures are repeated, subject_id in our case. Secondly, we need to tell adonisplus() how to randomly re-assign each variable in the formula. Here, we tell it to randomly re-assign diet between subjects, and randomly shuffle the study days within each subject.

farmm_preabx %>%
  adonisplus(
    distmat = farmm_bc, formula = distmat ~ diet * study_day,
    sample_id_var = sample_id, rep_meas_var = subject_id,
    shuffle = c(diet = "between", study_day = "within"))

If you run this function yourself, you'll notice that it takes a lot longer than it takes to run adonis(). As we randomly re-assign diets and time points, we re-run the adonis() function once for each re-assignment or permutation.^[In fact, the adonis() function has some built-in capability to carry out restricted permutations, but because the built-in methods permute the samples and not the study groups, the between-subject permutations don't work if you don't have exactly the same number of samples per subject. Consequently, our data set would not work with the built-in permutation methods in adonis().] The functions used by adonisplus() to carry out the restricted permutations are named shuffle_within_groups() and shuffle_between_groups(), if you want to use them elsewhere.

Having found a difference between the diets overall, we wish to know which diet pairs are different. As before, we use adonispost() and tell it to run pairwise comparisons of diet by setting which = diet.

farmm_preabx %>%
  adonispost(
    distmat = farmm_bc, formula = distmat ~ diet * study_day,
    sample_id_var = sample_id, rep_meas_var = subject_id,
    shuffle = c(diet = "between", study_day = "within"),
    which = diet)

In the pairwise comparisons, we find that the microbiome of the omnivore and vegan groups was not different during the pre-antibiotics period, based on Bray-Curtis distance. Conversely, the microbiome of the EEN group was different from both the omnivores and vegans.

Notice that, for all the comparisons above, we've been able to use the same distance matrix, farmm_bc. Each time we called a function using the distances, the distance matrix was automatically filtered and re-arranged to match the samples in our data frame. How convenient!

ctanes/miscStats documentation built on Nov. 24, 2024, 1:38 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com