knitr::opts_chunk$set(tidy=FALSE,warning=FALSE,message=FALSE) Biocpkg <- function (pkg){ sprintf("[%s](http://bioconductor.org/packages/%s)", pkg, pkg) } CRANpkg <- function(pkg){ cran <- "https://CRAN.R-project.org/package" fmt <- "[%s](%s=%s)" sprintf(fmt, pkg, cran, pkg) }
library(ggplot2) library(DT) library(tidyverse) library(phyloseq) library(ggtree) library(treeio) library(tidytree) library(MicrobiotaProcess)
MicrobiotaProcess
is an R package for analysis, visualization and biomarker discovery of microbial datasets. It supports the import of microbiome census data, calculating alpha index and provides functions to visualize rarefaction curves. Moreover, it also supports visualizing the abundance of taxonomy of samples. And It also provides functions to perform the PCA
, PCoA
and hierarchical cluster analysis. In addition, MicrobiotaProcess
also provides a method for the biomarker discovery of metagenome or other datasets.
MicrobiotaProcess
profilingMicrobiotaProcess
has an feature to import phylogenetic sequencing data from r Biocpkg("dada2")
[@callahan2016dada2] and qiime2[@bolyen2019qiime2] taxonomic clustering pipelines. The resulting object after import is r Biocpkg("phyloseq")
object[@Paul2013phyloseq]
# import data from dada2 pipeline. seqtabfile <- system.file("extdata", "seqtab.nochim.rds", package="MicrobiotaProcess") seqtab <- readRDS(seqtabfile) taxafile <- system.file("extdata", "taxa_tab.rds",package="MicrobiotaProcess") seqtab <- readRDS(seqtabfile) taxa <- readRDS(taxafile) # the seqtab and taxa are output of dada2 sampleda <- system.file("extdata", "mouse.time.dada2.txt", package="MicrobiotaProcess") ps_dada2 <- import_dada2(seqtab=seqtab, taxatab=taxa, sampleda=sampleda) ps_dada2 # import data from qiime2 pipeline otuqzafile <- system.file("extdata", "table.qza", package="MicrobiotaProcess") taxaqzafile <- system.file("extdata", "taxa.qza", package="MicrobiotaProcess") mapfile <- system.file("extdata", "metadata_qza.txt", package="MicrobiotaProcess") ps_qiime2 <- import_qiime2(otuqza=otuqzafile, taxaqza=taxaqzafile, mapfilename=mapfile) ps_qiime2
Rarefaction, based on sampling technique, was used to compensate for the effect of sample size on the number of units observed in a sample[@siegel2004rarefaction]. MicrobiotaProcess
provided ggrarecurve
to plot the curves, based on rrarefy
of r CRANpkg("vegan")
[@Jari2019vegan].
# for reproducibly random number set.seed(1024) p_rare <- ggrarecurve(obj=ps_dada2, indexNames=c("Observe","Chao1","ACE"), chunks=300) + theme(legend.spacing.y=unit(0.02,"cm"), legend.text=element_text(size=6)) p_rare
MicrobiotaProcess
provides get_alphaindex
to calculate alpha index and the ggbox
to visualize the result
alphaobj <- get_alphaindex(ps_dada2) head(as.data.frame(alphaobj)) p_alpha <- ggbox(alphaobj, geom="violin", factorNames="time") + scale_fill_manual(values=c("#2874C5", "#EABF00"))+ theme(strip.background = element_rect(colour=NA, fill="grey")) p_alpha
MicrobiotaProcess
presents the ggbartax
for the visualization of composition of microbial communities.
# relative abundance otubar <- ggbartax(obj=ps_dada2) + xlab(NULL) + ylab("relative abundance (%)") otubar
If you want to get the abundance of specific levels of class, You can use get_taxadf
then use ggbartax
to visualize.
phytax <- get_taxadf(obj=ps_dada2, taxlevel=2) phybar <- ggbartax(obj=phytax) + xlab(NULL) + ylab("relative abundance (%)") phybar
Moreover, the abundance (count) of taxonomy also can be visualized by setting count to TRUE, and the facet of plot can be showed by setting the facetNames.
phybar2 <- ggbartax(obj=phytax, facetNames="time", count=TRUE) + xlab(NULL) + ylab("abundance") phybar2 classtax <- get_taxadf(obj=ps_dada2, taxlevel=3) classbar <- ggbartax(obj=classtax, facetNames="time", count=FALSE) + xlab(NULL) + ylab("relative abundance (%)") classbar
PCA
(Principal component analysis) and PCoA
(Principal Coordinate Analysis) are general statistical procedures to compare groups of samples. And PCoA
can based on the phylogenetic or count-based distance metrics, such as Bray-Curtis
, Jaccard
, Unweighted-UniFrac
and weighted-UniFrac
. MicrobiotaProcess
presents the get_pca
, get_pcoa
and ggordpoint
for the analysis.
pcares <- get_pca(obj=ps_dada2, method="hellinger") # Visulizing the result pcaplot <- ggordpoint(obj=pcares, biplot=TRUE, speciesannot=TRUE, factorNames=c("time"), ellipse=TRUE) + scale_colour_manual(values=c("#2874C5", "#EABF00")) + scale_fill_manual(values=c("#2874C5", "#EABF00")) pcaplot pcoares <- get_pcoa(obj=ps_dada2, distmethod="euclidean", method="hellinger") # Visualizing the result pcoaplot <- ggordpoint(obj=pcoares, biplot=TRUE, speciesannot=TRUE, factorNames=c("time"), ellipse=TRUE) + scale_colour_manual(values=c("#2874C5", "#EABF00")) + scale_fill_manual(values=c("#2874C5", "#EABF00")) pcoaplot
beta diversity
metrics can assess the differences between microbial communities. It can be visualized with PCA
or PCoA
, this can also be visualized with hierarchical clustering. MicrobiotaProcess
also implements the analysis based on ggtree[@yu2017ggtree].
hcsample <- get_clust(obj=ps_dada2, distmethod="euclidean", method="hellinger", hclustmethod="average") # rectangular layout clustplot1 <- ggclust(obj=hcsample, layout = "rectangular", pointsize=1, fontsize=0, factorNames=c("time")) + scale_color_manual(values=c("#2874C5", "#EABF00")) + theme_tree2(legend.position="right", plot.title = element_text(face="bold", lineheight=25,hjust=0.5)) clustplot1 # circular layout clustplot2 <- ggclust(obj=hcsample, layout = "circular", pointsize=1, fontsize=2, factorNames=c("time")) + scale_color_manual(values=c("#2874C5", "#EABF00")) + theme(legend.position="right") clustplot2
If you have questions/issues, please visit github issue tracker.
Here is the output of sessionInfo() on the system on which this document was compiled:
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.