knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Working with dada2pp

by Pedro Martinez Arbizu

"This tutorial should be present here"

Starting point

You have analyzed your metabarcoding results with dada2 (or other pipeline) and produced a OTU/ASV table. This means you have ASVs (or OTUs) as lines and localities as columns.

Getting ready

install library from Github

Windows users need Rtools installed to use devtools

library(devtools)
install_github("pmartinezarbizu/dada2pp")

load library

library(dada2pp)

Load the example dataset deepMeio. This dataset contains the result of metabarcoding of deep-sea meiofauna. The amplified fragment was V1V2 from 18s. The dada2 AVS were blasted against GenBank and some columns with statistic results were added (columns 1:8). The number of reads on 19 stations is recorded on columns 9:27. Each line represents one ASV.

data(deepMeio)

Inspect the dataset.

summary(deepMeio)

Before you move on, it would be useful to declare some information on gene (fragment studied), pip (pipeline used) and pr (project name). This will help to consistent names for files and graphs.

gene <- 'V1V2'
pip <- 'dada2'
pr <- 'DeepMeio' 

Filter some ASVs

Produce a graph of fragment lengths

mode <- as.integer(names(which(table(deepMeio$length)==max(table(deepMeio$length)))))
counts <- max(table(deepMeio$length))
plot(table(deepMeio$length),ylab='number of  ASVs',xlab='read length')
text(mode,counts,labels=paste (mode, 'bp'),pos = 4)

Apply filters if needed: - 1.) remove ASVs shorter than (for instance) 300bp

deepMeio <- deepMeio[deepMeio$length>=300,]

Other filter could be to remove ASV with query coverage qcovs is lower than (for instance) 90%. Try this:

plot(table(deepMeio$qcovs))
deepMeio <- deepMeio[deepMeio$qcovs>=90,]
apply(deepMeio[,9:27],FUN=sum,MARGIN=2)

As there is no sample with extremely low number of reads we keep all.

Otherwise, to remove (for instance) sample loc1_2:

deepMeio <- deepMeio[,-which(names(deepMeio) == 'loc1_2')]

After applying all filters, check if any higher taxon group was left with 0 ASVs:

table(deepMeio$Group)

This is not the case here. Otherwise remove the groups with 0 ASVs. To remove for instance group mesozoans (if it would have 0 ASVs):

deepMeio <- deepMeio[-which(deepMeio$Group == 'mesozoans'),]

And then drop the level mesozoans from Group:

deepMeio$Group <- droplevels(deepMeio$Group)

Clean up Genbank inconsistent classification

Column Group contains the higher taxon assignment for the AVS. How many are different groups are there?

unique(deepMeio$Group)

At first glance we discover 3 inconsistencies:

Assign the isopods to the group crustaceans:

deepMeio[deepMeio$Group == 'isopods', 'Group'] <- 'crustaceans'

Inspect which taxa are in the group animals:

unique(deepMeio[deepMeio$Group == 'animals', 'Species'])

Surprisingly Haplogonaria, Ascoparia, Nemertoderma and Proporus are all flatworms.

Move these ASVs to flatworms:

deepMeio[deepMeio$Species %in% c( "Haplogonaria_sp._'schillingi'_UJ2011",
 "Ascoparia_sp._UJ2011", "Nemertoderma_sp._SMNHUJ13330", "Proporus_brochii"), 'Group'] <- 'flatworms'

check results with these two commands:

unique(deepMeio[deepMeio$Group == 'animals', 'Species'])
unique(deepMeio[deepMeio$Group == 'flatworms', 'Species'])

Inspect which taxa are in the group eukaryotes:

unique(deepMeio[deepMeio$Group == 'eukaryotes', 'Species'])

As none of the taxa is metazoan meiofauna we leave the group eukaryotes as it is.

Define your target group

Our study focus on free-living metazoan meiofauna. From the remaining groups there are some of them which could be defined as contamination, others are benthic deep-sea organisms but not our target metazoan meiofauna and only some groups are what we are looking for. We define 3 groups contamination, non_target and target:

contamination <- c('animals', 'eukaryotes', 'dinoflagellates', 'ascomycetes','golden_algae') 

non_target <- c('cercozoans', 'hemichordates', 'hydrozoans', 'bryozoans', 'protozoa', 'mesozoans',
'sea_cucumbers', 'tunicates', 'tusk_shells', 'solenogasters')

target <- c('segmented_worms', 'crustaceans', 'ribbon_worms', 'nematodes', 'bivalves',
'flatworms', 'gastrotrichs', 'gastropods', 'loriciferans', 'spoonworms')

Use the function stats.tt to inspect your results and reduce your dataset to target taxa. - x: your data.frame - reads: the subset of columns containing the reads - by: the column containing the grouping factor - cont: the contamination levels - non_target: the non target levels

Everything that is not contamination or non_target is assumed to be target:

stat <- stats.tt(x=deepMeio,reads=deepMeio[,9:27],by=deepMeio$Group,cont=contamination,non_target=non_target)

stat$stats

We can see form the output of stat$stats that the target group has 10 taxa and retains most of the reads (4.4 Million) and ASVs (n_otus = 836).

Inspect the stat object:

stat$cont
stat$non_target
stat$target

Write stats table to a file:

write.table(stat$stats,file=paste('target.stats',gene,pip,pr,'.csv',sep='_'),sep=';',row.names=TRUE,col.names=NA)

Extract your target ASVs to new dataset:

meio <- stat$target.table

remove non used taxa from Group

meio$Group <- droplevels(meio$Group)

Create a color palette

Before we proceed with producing graphs it would be good to have consistent colors for the taxon groups in all the graphs. The function pal2table() creates a matching table of taxa to colors and symbols, so that you can assign same colors and symbols to your taxa consistently in all your graphs. Check ?pal2table() for more options and control.

meiocol <- pal2table(unique(meio$Group),pal='c25')

inspect the created object:

meiocol

View the assigned colors and symbols:

plot(meiocol)

Use your color palette

To use your color palette there is the function match2table. See ?match2table() for details.

t1 <- table(meio$Group)
or <- order(t1,decreasing=TRUE)
assign.col <- match2table(names(t1[or]),meiocol,'col')
par(oma = c(4, 0, 0, 0),xpd=NA)
barplot(t1[or], col=assign.col,las=2)

Create graph with number of reads and ASV per sample

The function barp.table will create the information needed to produce barplots. Inspect the results in ..$table. See ?barp.table() for more details.

bp <- barp.table(meio[,9:27],meio$Group,sum) 

bp$table

Create the barplots (see ?plot.barp_table() for details):

plot(bp, 'dada2sum', pal2table=meiocol, ncol.leg=5,oma1=7)


pmartinezarbizu/dada2pp documentation built on Feb. 7, 2024, 7:01 a.m.