Using metagenomeFeatures to Retrieve Feature Data.

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE
)
suppressPackageStartupMessages(library(metagenomeFeatures)) 
suppressPackageStartupMessages(library(phyloseq)) 

Vignette overview

r Biocpkg("metagenomeFeatures") and associated annotation packages[^1] can be used to obtain phylogentic trees and representative sequences for 16S rRNA marker gene sequence experimental datasets when the reads are clustered using closed reference clustering and a database with an MgDb annotation package. In this vignette we demostrate how to use metagenomeFeatures and Greengenes 16S rRNA database version 13.8 85\% OTUs to obtain a phylogenetic tree and representative sequences for Rousk et al. [-@rousk2010soil], a soil microbiome study. We then use the tree and sequence data obtained from the Greengenes MgDb to generate a phyloseq-class object for community analysis[^2].

Greengenes 13.8 85\% OTU MgDb

The gg 13.8 85% OTU (gg85) is provided as part of the r Biocpkg("metagenomeFeatures") package. gg85 is a MgDb-class object with the taxonomic heirarchy, sequence data, and phylogeny for the Greengenes database clustered at the 0.85 similarity threshold. After loading the r Biocpkg("metagenomeFeatures") package gg85 is loaded using get_gg13.8_85MgDb().[^3]

library(metagenomeFeatures)
gg85 <- get_gg13.8_85MgDb()

QIITA Dataset

QIITA is a public repository for sharing OTU tables and raw sequence data from 16S rRNA marker gene studies, however some of the datasets do not include representative sequences or phylogenetic trees. For this vignette we are using data from the previously mentioned soil microbiome study, https://qiita.ucsd.edu/study/description/94 [@rousk2010soil]. A BIOM and qiime mapping file for the study can be obtained from QIITA. A vector of Greengenes ids for the study cluster centers is included in this package for use in this vignette.

data(qiita_study_94_gg_ids)

Obtaining Representative Sequences and Phylogenetic Tree

A mgFeatures object is generated from gg85 (or any other MgDb-class object) using the annotateFeatures() function along with a set of database ids or keys. The resulting mgFeatures class object has the taxonomic heirarchy, phylogeny, and sequence data for the study OTUs.[^3]

soil_mgF <- annotateFeatures(gg85, qiita_study_94_gg_ids)

## Taxonomic heirarchy
soil_mgF

# Sequence data
mgF_seq(soil_mgF)

# Tree data 
mgF_tree(soil_mgF)

Using mgFeatures-class object with r Biocpkg("phyloseq")

Finally we use the phyogenetic tree from the mgFeatures-class object along with the OTU table for a beta diversity analysis. After loading the r Biocpkg("phyloseq") package we will load the abundance data from the QITTA biom file, using the r Biocpkg("phyloseq") command import_biom().
Next we will define the sample_data slot using a modified version of the mapping file 2301_mapping_file.txt obtained from QITTA, which only includes sample names and experimental variables pH, carbon/nitrogen ratio, total nitrogen, and total organic carbon. Future development of a common data structure for working with microbiome data that extends the SummarizeExperiment-class is underway. For this new class the mgFeatures-class will define the rowData slot.

data_dir <- system.file("extdata", package = "metagenomeFeatures")

## Load Biom
biom_file <-  file.path(data_dir, "229_otu_table.biom")
soil_ps <- phyloseq::import_biom(BIOMfilename = biom_file)

## Define sample data
sample_file <- file.path(data_dir, "229_sample_data.tsv")
sample_dat <- read.delim(sample_file)

## Rownames matching sample_names(), required for phyloseq sample_data slot
rownames(sample_dat) <- sample_dat$SampleID
sample_data(soil_ps) <- sample_dat

## Resulting phyloseq object
soil_ps

Rousk et al. [-@rousk2010soil] clustered the sequences at 97\% similarity therefore gg85 only contains a subset of the OTUs in the dataset.
To define the tree and sequence slot we need to subset soil_ps to only include OTUs in gg85. Samples with no OTUs in gg85 were also removed for our beta diversity analysis.

# Removing OTUs not in `gg85`
soil_tree <- mgF_tree(soil_mgF)
soil_ps_gg85 <- prune_taxa(taxa = soil_tree$tip.label, x = soil_ps)

# Removing samples with no OTUs in `gg85`
samples_to_keep <- sample_sums(soil_ps_gg85) != 0
soil_ps_gg85 <- prune_samples(samples = samples_to_keep, x = soil_ps_gg85)

Now soil_mgF can be used to to define our soil_ps object refseq and tree slots.

## Defining tree slot
phy_tree(physeq = soil_ps_gg85) <- soil_tree

## Defining seq slot
soil_ps_gg85@refseq <- mgF_seq(soil_mgF)

Now that the tree slot is defined we can perform microbial community analysis requiring a phylogenetic tree such as phylogenetic beta diversity. For the subset of features used here, pH is a primary driver for differences in beta diversity between the samples (Figure \@ref(fig:betaFig)). Nearly 30\% of the total variation in the data is explained by the first axis and sample pH is correlated with the first axis. This result is consistent with Rousk et al. [-@rousk2010soil] even though we only used a small subset of the study OTUs. The outlier sample (94.PH18) is an artifact of using a small subset of the original dataset, only one gg85 OTU is observed in the sample and that OTU was not observed in the other samples.

soil_ord <- ordinate(physeq = soil_ps_gg85, 
                     distance = "wunifrac", 
                     method = "PCoA")
plot_ordination(soil_ps_gg85, soil_ord, 
                color = "ph", 
                type="sample",
                label = "SampleID")

Session info {.unnumbered}

sessionInfo()

[^1]: Other databases are avilable as Bioconductor AnnotationData including; Greengenes Release 13.5: r Biocpkg("greengenesMgDb13.5"), Greengenes Release 13.8 99% OTUs,r Biocpkg("greengenes13.8_99MgDb") Ribosomal Database Project Release 11.5, r Biocpkg("ribosomaldatabaseproject11.5MgDb"), and Silva 128.1: r Biocpkg("silva128.1MgDb"). [^2]: The r Biocpkg("phyloseq") package defines the phyloseq-calss for working with 16S rRNA experimental data. [^3]: See the "metagenomeFeatures classes and methods." vignette, Vignettes("metagenomeFeatures classes and methods.") for more information on methods for working with MgDb-class and mgFeatures-class objects.

Reference {.unnumbered}



Try the metagenomeFeatures package in your browser

Any scripts or data that you put into this service are public.

metagenomeFeatures documentation built on Nov. 8, 2020, 5:18 p.m.