From the paper "Yeasts Dominate Soil Fungal Communities In Three Lowland Neotropical Rainforests"

The data can be downloaded at https://onlinelibrary.wiley.com/doi/abs/10.1111/1758-2229.12575

library(tidyverse)
library(here)

library(cowplot)
theme_set(theme_cowplot())

devtools::load_all()

doParallel::registerDoParallel(parallel::detectCores())

Downloading data is preferably automatic, but can be replaced with instructions on how to download the data yourself.

main_data_file <- here("local_data", "Dunthorn2017.txt")
if(!file.exists(main_data_file)) {
   download.file("https://sfamjournals.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1111%2F1758-2229.12575&file=emi412575-sup-0001-suppinfo1.txt", main_data_file)

}

The important part - read and format the data so that rows are observations (biological samples) and columns are OTUs. Also gather basic information about the sampples (I call this "mapping" but maybe not a great name).

otus_raw <- read.csv(main_data_file, stringsAsFactors = FALSE, sep = "\t")
otu_tab <- otus_raw %>% select(B005_B006:T199_T200) %>% as.matrix() %>% t()
otu_tab <- otu_tab[rowSums(otu_tab) > 2000 & rownames(otu_tab) != "B005_B006",]

#Mapping hardcoded, could not bother parsing their file
mapping <- tibble(Sample = rownames(otu_tab)) %>% mutate(Origin = if_else(grepl("^B", Sample), "Panama", if_else(grepl("^L", Sample), "Costa Rica", "Ecuador")))

A custom plot function that wraps plot_mds and fixes some aesthetics that are reused - completely unnecessary, just convenient.

plot_mds_dunthorn <- function(base_mds, mapping, aligned_samples = NULL, show_paths = 0.5,  ...) {
  plot_mds(base_mds, mapping, aligned_samples = aligned_samples, color_aes = Origin, shape_aes = Origin, show_paths = show_paths, sample_point_alpha = 0.2) 
}

Basic MDS redoing the figure from the paper

set.seed(20190514)
base_mds <- otu_tab %>% metaMDS(trymax = 400, maxit = 500, trace = 0)
#base_mds$points
plot_mds_dunthorn(base_mds, mapping)

Running the Dirichlet-multinomial bootstrap

#Here, group_column needs to be filled, it is needed for the connectivity metric (not directly using right now, but it is of interest). group_column should be the main factor in the data.
sens <- mds_sensitivity_check(otu_tab, mapping, group_column = Origin, trace = 0)
plot.mds_sensitivity(sens,color_aes = Origin, shape_aes = Origin, sample_point_alpha = 0.2)
sens$connectivity_stats
sens$consistency_stats

Additional analyses that are not needed now

sens_jackknife <- mds_sensitivity_check(20, otu_tab, mapping, group_column = Origin, trace = 0, trymax = 50, sampling_func = sample_posterior_jackknife_observations)
plot(sens_jackknife,color_aes = Origin, shape_aes = Origin, sample_point_alpha = 0.2)
mds_samples_rarefy <- sample_posterior_rarefy(20, otu_tab) %>% 
  metaMDS_per_sample(try = 200, maxit = 500) %>% purrr::map(vegan::procrustes, X = base_mds)
plot_mds_dunthorn(base_mds, mapping, aligned_samples = mds_samples_rarefy)


cas-bioinf/metagenboot documentation built on Feb. 25, 2021, 3:58 p.m.