In microbiota/amplicon: Statistics and visualization for microbiome data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

Installation

The amplicon R package facilitates statistics and visualization of amplicon profiling data, in particular 16S and ITS alpha / beta taxonomy, taxonomic profiling.

# library(BiocManager)
# if (!requireNamespace("BiocManager", quietly=TRUE)){
#     install.packages("BiocManager")}
# if (!requireNamespace("BiocManager", quietly=TRUE)){
#     BiocManager::install("amplicon")}

Then load the package in R

library(amplicon)

Data

The amplicon package is similar with phyloseq data format. Mainly includes an OTU table (feature abundances), sample metadata (group, date, site, ...), taxonomy table (annotation of OTUs in 7 levels taxonomy: Kingdom, Phylum, Class, Order, Family, Genus, Species).

Showing the example data format

metadata

A table include sample information. Row name must be sample ID, and as following BarcodeSequence, group ID, date, location / site, et. al. .

data(metadata)
head(metadata, n = 3)

OTU table

A table include reads counts of each OTU in each sample. Row name is OTU ID, and Column name is sample ID.

data(otutab)
dim(otutab)
otutab[1:3, 1:3]

taxonomy

A table include 7 levels of taxonomy of each OTU.

data(taxonomy)
head(taxonomy, n = 3)

Alpha Diversity

Setting the basic parameter

# colnames of group ID in metadata
# 设置实验分组列名
group = "Group"
# Output figure width and height  
# Letter纸图片尺寸为单栏89 mm，双栏183 mm，页面最宽为247 mm
# 推荐比例16：10，即半版89 mm x 56 mm; 183 mm x 114 mm
width = 89
height = 59

Alpha diversity boxplot

Plotting each figure mainly include three step:

Reading data and viewing format;
Plotting and visualization;
Saving figure.

# Data reading
# metadata = read.table("metadata.txt", header=T, row.names=1, sep="\t", comment.char="")
# head(metadata, n = 3)
# alpha_div = read.table("alpha_div.txt", header=T, row.names=1, sep="\t", comment.char="")
# head(alpha_div, n = 3)

# Alpha barplot
(p = alpha_barplot(alpha_div, metadata, index = "richness", groupID = "Group"))

# capitalize
library(Hmisc)
colnames(alpha_div) = capitalize(colnames(alpha_div))
colnames(alpha_div)

# Plotting alpha diversity richness boxPlotting and statistics
(p = alpha_boxplot(alpha_div, index = colnames(alpha_div)[9], metadata, groupID = group))

# Saving figure
# 保存图片，大家可以修改图片名称和位置，长宽单位为毫米
ggsave("alpha_boxplot.pdf", p, width = width, height = height, units = "mm")
p1 = p

# 尝试探索不同的多样性各类
colnames(alpha_div)
alpha_boxplot(alpha_div, index = colnames(alpha_div)[12], metadata, groupID = group)

# 尝试不同的分组方式
colnames(metadata)
alpha_boxplot(alpha_div, index = colnames(alpha_div)[9], metadata, groupID = "Site")

Alpha rarefraction curve

# Data reading
alpha_rare = read.table("alpha_rare.txt", header=T, row.names=1, sep="\t", comment.char="")
alpha_rare[1:3, 1:3]

# Plotting alpha rarefraction curve in group mean with standard error
(p = alpha_rare_curve(alpha_rare, metadata, groupID = group))

# Saving figure
ggsave("alpha_rarefraction_curve.pdf", p, width = width, height = height, units = "mm")
p2 = p

Beta diversity

principal coordinat analysis (PCoA)

# Data reading
beta_bray_curtis = read.table("beta_bray_curtis.txt", header=T, row.names=1, sep="\t", comment.char="")
beta_bray_curtis[1:3, 1:3]

# Plotting alpha rarefraction curve in group mean with standard error
(p = beta_pcoa(beta_bray_curtis, metadata, groupID = group))

# Saving figure
ggsave("beta_pcoa_bray.pdf", p, width = width, height = height, units = "mm")
p3 = p

Constrained PCoA

# Data reading
# otutab = read.table("otutab.txt", header=T, row.names=1, sep="\t", comment.char="")
# otutab[1:3, 1:3]

# Plotting alpha rarefraction curve in group mean with standard error
(p = beta_cpcoa(otutab, metadata, dis="bray", groupID = group))

# Saving figure
ggsave("beta_cpcoa_bray.pdf", p, width = width, height = height, units = "mm")
p4 = p

BetaDiv

# Rscript ${db}/script/otutab_rare.R --input otutab.txt \
#       --depth 0 --seed 1 \
#       --normalize otutab_rare.txt \
#       --output alpha/vegan.txt

# 可用的距离类型
(dist_methods = unlist(phyloseq::distanceMethodList))
dist = "bray" # 选择 bray 距离
# 排序方法: DCA, CCA, RDA, NMDS, MDS, PCoA, PCA, LDA
method = "LDA"

# 输入OTU表、样本元数据、进化树；指定距离编号，分组列和排序方法，以及P阈值和统计方法
result = BetaDiv(otu = otutab_rare, map = metadata, tree = tree,
                 group = "Group", dist = dist, method = method,
                 Micromet = "adonis", pvalue.cutoff = 0.05)


# 提取返回结果
# 提取图片
(p1 = result[[1]])
# 提取作图数据
data1 = result[[2]]
# 样品添加标签的图片
# p2 = result[[3]]
# 提取两两比对结果
data2 = result[[4]]
# 提取整体比较结果，与图中一致
# data3 = result[[5]]


# 保存结果
prefix = paste(dist, method, sep = "_")
ggsave(paste(prefix,"plot.pdf",sep = "_"), p1, 
       width = width*2, height = height*2, units = "mm")
write.table(data1, paste(prefix,"data.txt",sep = "_"), quote = F, sep = "\t")
write.table(data2, paste(prefix,"stat.txt",sep = "_"), quote = F, sep = "\t")

Combo plots to published-ready figure

library(cowplot)
(p0 = plot_grid(p1, p2, p3, p4, labels = c("A", "B", "C", "D")))
ggsave("diversity.pdf", p0, width = width * 2, height = height * 2, units = "mm")