knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) suppressPackageStartupMessages(library(metagenomeFeatures))
The r Biocpkg("metagenomeFeatures")
package defines two classes, MgDb-class
for working with 16S rRNA databases, and mgFeatures-class
for marker-gene survey feature data.
Both classes are S4 object-oriented classes and contain slots for (1) the sequences themselves,
(2) taxonomic lineage,
and (3) a phylogenetic tree with the evolutionary relationship between features,
and (4) metadata.
This vignette demonstrates how to explore and subset MgDb-class
objects and create new mgFeatures-class
objects.
MgDb-class
ObjectThe MgDb-class
provides a consistent data structure for working with different 16S rRNA databases.
16S rRNA databases contain hundreds of thousands to millions of sequences, therefore an SQLite database is used to store the taxonomic and sequence data.
Using an SQLite database prevents the user from loading the full database into memory.
The database connection is managed using the r CRANpkg("RSQLite")
R package, and the taxonomic data are accessed using the r CRANpkg("dplyr")
and r CRANpkg("dbplyr")
packages.
The r Biocpkg("DECIPHER")
package is used to format the sequence data as an SQLite database and provides functions for working directly with the sequence data in the SQLite database.
The phylo-class
, from the r CRANpkg("APE")
R package, defines the tree slot.
The following examples utilize gg85
as the example MgDb-class
object included in the r Biocpkg("metagenomeFeatures")
package. MgDb-class
objects with full databases are in separate Bioconductor Annotation packages such as the r Biocpkg("greengenesMgDb13.5")
package.[^1]
After loading the r Biocpkg("metagenomeFeatures")
, the get_gg13.8_85MgDb()
is use to load the gg85
database. The taxonomy table is stored in a SQLite database and the taxonomy slot is an open connection to the database.
library(metagenomeFeatures) gg85 <- get_gg13.8_85MgDb()
MgDb-class
Accessor MethodsThe r Biocpkg("metagenomeFeatures")
package includes accessor methods for the four slots mgDb_meta()
, mgDb_tree()
, mgDb_taxa()
, mgDb_seq()
.
Additional accessor methods, taxa_*()
are provided for different taxonomy table elements.
These accessors are useful in determining appropriate parameters for the mgDb_select()
(See section \@ref(mgdb-select)).
The taxa_keytypes()
accessor returns a vector with taxonomy table column names and the taxa_columns()
return a vector of taxonomy relevant column names.
The taxa_keys()
method returns a data frame with all values in a database for specified keytype(s).
taxa_keytypes(gg85) taxa_columns(gg85) head(taxa_keys(gg85, keytype = c("Kingdom")))
MgDb-class
Select Methods {#mgdb-select}The MgDb-class
select method is used to retrieve database entries for a specified taxonomic group or id list and can return either taxonomic, sequences, phylogenetic tree, or a combinantion of the three.
The type
argument defines the type of information the database returns.
## Only returns taxa mgDb_select(gg85, type = "taxa", keys = c("Vibrionaceae", "Enterobacteriaceae"), keytype = "Family") ## Only returns seq mgDb_select(gg85, type = "seq", keys = c("Vibrionaceae", "Enterobacteriaceae"), keytype = "Family") ## Returns taxa, seq, and tree mgDb_select(gg85, type = "all", keys = c("Vibrionaceae", "Enterobacteriaceae"), keytype = "Family")
mgFeatures-class
ObjectmgFeatures-class
is used for storing and working with marker-gene survey feature data.
Similar to the MgDb-class
, the mgFeatures-class
has four slots for taxonomy, sequences, phylogenetic tree, and metadata.
As the number of features in a marker-gene survey dataset is significantly smaller than the number of sequences in a reference database,
mgFeatures
uses common Bioconductor data structures,
DataFrames
and DNAStringSets
to define the taxonomic and sequence slots.
Similar to MgDb-class
, a phylo
class object is used to define the tree slot.
Like the MgDb-class
there are accessor methods, mgF_*()
, for the fours slots.
New mgFeatures-class
objects can be defined directly or from a MgDb-class
object and user defined values for subsetting the database, including database sequence ids.[^2]
The two approaches are demonstrated below.
ve_select <- mgDb_select(gg85, type = "all", keys = c("Vibrionaceae", "Enterobacteriaceae"), keytype = "Family") mgFeatures(taxa = ve_select$taxa, tree = ve_select$tree, seq = ve_select$seq, metadata = list("gg85 subset Vibrionaceae and Enterobacteriaceae Family"))
annotateFeatures(gg85, query = ve_select$taxa$Keys)
sessionInfo()
[^1]: Other databases are avilable as Bioconductor AnnotationData including; Greengenes Release 13.5: r Biocpkg("greengenesMgDb13.5")
, Greengenes Release 13.8 99% OTUs,r Biocpkg("greengenes13.8_99MgDb")
Ribosomal Database Project Release 11.5, r Biocpkg("ribosomaldatabaseproject11.5MgDb")
, and Silva 128.1: r Biocpkg("silva128.1MgDb")
.
[^3]: See the "Using metagenomeFeatures to Retrieve Feature Data." vignette, Vignettes("Using metagenomeFeatures to Retrieve Feature Data.")
for using the mgFeatues-class
with marker-gene survey study data.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.