knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
by Pedro Martinez Arbizu
"This tutorial should be present here"
You have analyzed your metabarcoding results with dada2 (or other pipeline) and produced a OTU/ASV table. This means you have ASVs (or OTUs) as lines and localities as columns.
install library from Github
Windows users need Rtools installed to use devtools
library(devtools) install_github("pmartinezarbizu/dada2pp")
load library
library(dada2pp)
Load the example dataset deepMeio
.
This dataset contains the result of metabarcoding of deep-sea meiofauna.
The amplified fragment was V1V2 from 18s. The dada2 AVS were blasted against GenBank and
some columns with statistic results were added (columns 1:8).
The number of reads on 19 stations is recorded on columns 9:27. Each line represents one ASV.
data(deepMeio)
Inspect the dataset.
summary(deepMeio)
Before you move on, it would be useful to declare some information on gene
(fragment studied),
pip
(pipeline used) and pr
(project name). This will help to consistent names for files and graphs.
gene <- 'V1V2' pip <- 'dada2' pr <- 'DeepMeio'
Produce a graph of fragment lengths
mode <- as.integer(names(which(table(deepMeio$length)==max(table(deepMeio$length))))) counts <- max(table(deepMeio$length)) plot(table(deepMeio$length),ylab='number of ASVs',xlab='read length') text(mode,counts,labels=paste (mode, 'bp'),pos = 4)
Apply filters if needed: - 1.) remove ASVs shorter than (for instance) 300bp
deepMeio <- deepMeio[deepMeio$length>=300,]
Other filter could be to remove ASV with query coverage qcovs
is lower than (for instance) 90%.
Try this:
plot(table(deepMeio$qcovs)) deepMeio <- deepMeio[deepMeio$qcovs>=90,]
2.) remove samples with very low number of reads if needed.
First inspect number of reads per sample:
apply(deepMeio[,9:27],FUN=sum,MARGIN=2)
As there is no sample with extremely low number of reads we keep all.
Otherwise, to remove (for instance) sample loc1_2:
deepMeio <- deepMeio[,-which(names(deepMeio) == 'loc1_2')]
After applying all filters, check if any higher taxon group was left with 0 ASVs:
table(deepMeio$Group)
This is not the case here. Otherwise remove the groups with 0 ASVs.
To remove for instance group mesozoans
(if it would have 0 ASVs):
deepMeio <- deepMeio[-which(deepMeio$Group == 'mesozoans'),]
And then drop the level mesozoans
from Group
:
deepMeio$Group <- droplevels(deepMeio$Group)
Column Group
contains the higher taxon assignment for the AVS. How many are different groups are there?
unique(deepMeio$Group)
At first glance we discover 3 inconsistencies:
crustaceans
but also a separate group isopods
which are obviously also crustaceans.animals
eukaryotes
Assign the isopods to the group crustaceans
:
deepMeio[deepMeio$Group == 'isopods', 'Group'] <- 'crustaceans'
Inspect which taxa are in the group animals
:
unique(deepMeio[deepMeio$Group == 'animals', 'Species'])
Surprisingly Haplogonaria, Ascoparia, Nemertoderma and Proporus are all flatworms
.
Move these ASVs to flatworms
:
deepMeio[deepMeio$Species %in% c( "Haplogonaria_sp._'schillingi'_UJ2011", "Ascoparia_sp._UJ2011", "Nemertoderma_sp._SMNHUJ13330", "Proporus_brochii"), 'Group'] <- 'flatworms'
check results with these two commands:
unique(deepMeio[deepMeio$Group == 'animals', 'Species']) unique(deepMeio[deepMeio$Group == 'flatworms', 'Species'])
Inspect which taxa are in the group eukaryotes
:
unique(deepMeio[deepMeio$Group == 'eukaryotes', 'Species'])
As none of the taxa is metazoan meiofauna we leave the group eukaryotes
as it is.
Our study focus on free-living metazoan meiofauna. From the remaining groups there are some of them
which could be defined as contamination, others are benthic deep-sea organisms but not our
target metazoan meiofauna and only some groups are what we are looking for.
We define 3 groups contamination
, non_target
and target
:
contamination <- c('animals', 'eukaryotes', 'dinoflagellates', 'ascomycetes','golden_algae') non_target <- c('cercozoans', 'hemichordates', 'hydrozoans', 'bryozoans', 'protozoa', 'mesozoans', 'sea_cucumbers', 'tunicates', 'tusk_shells', 'solenogasters') target <- c('segmented_worms', 'crustaceans', 'ribbon_worms', 'nematodes', 'bivalves', 'flatworms', 'gastrotrichs', 'gastropods', 'loriciferans', 'spoonworms')
Use the function stats.tt
to inspect your results and reduce your dataset to target taxa.
- x: your data.frame
- reads: the subset of columns containing the reads
- by: the column containing the grouping factor
- cont: the contamination levels
- non_target: the non target levels
Everything that is not contamination or non_target is assumed to be target:
stat <- stats.tt(x=deepMeio,reads=deepMeio[,9:27],by=deepMeio$Group,cont=contamination,non_target=non_target) stat$stats
We can see form the output of stat$stats
that the target group has 10 taxa and retains
most of the reads (4.4 Million) and ASVs (n_otus = 836).
Inspect the stat object:
stat$cont stat$non_target stat$target
Write stats table to a file:
write.table(stat$stats,file=paste('target.stats',gene,pip,pr,'.csv',sep='_'),sep=';',row.names=TRUE,col.names=NA)
Extract your target ASVs to new dataset:
meio <- stat$target.table
remove non used taxa from Group
meio$Group <- droplevels(meio$Group)
Before we proceed with producing graphs it would be good to have consistent colors for the taxon groups in all the graphs.
The function pal2table()
creates a matching table of taxa to colors and symbols, so that you can
assign same colors and symbols to your taxa consistently in all your graphs.
Check ?pal2table()
for more options and control.
meiocol <- pal2table(unique(meio$Group),pal='c25')
inspect the created object:
meiocol
View the assigned colors and symbols:
plot(meiocol)
To use your color palette there is the function match2table
. See ?match2table()
for details.
t1 <- table(meio$Group) or <- order(t1,decreasing=TRUE) assign.col <- match2table(names(t1[or]),meiocol,'col') par(oma = c(4, 0, 0, 0),xpd=NA) barplot(t1[or], col=assign.col,las=2)
The function barp.table
will create the information needed to produce barplots.
Inspect the results in ..$table
. See ?barp.table()
for more details.
bp <- barp.table(meio[,9:27],meio$Group,sum) bp$table
Create the barplots (see ?plot.barp_table()
for details):
plot(bp, 'dada2sum', pal2table=meiocol, ncol.leg=5,oma1=7)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.