knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE )
There are a great number of pipelines and methods for the analysis of metabarcoding data, and no single method is appropriate in every situation. Methods and ideas are constantly being developed, so keep an eye on the horizon and focus on fundamental, transferable skills. This tutorial is aimed at beginners and demonstrates the workflow.
This pipeline is designed to work with paired-end Illumina amplicon data generated using a 2-step PCR method, as in Bista et al. (2017). We expect sufficient (>30 bp) overlap between the paired-end reads.
For this tutorial you will need the metabarTOAD and lulu R packages and the usearch, vsearch, and cutadapt executables.
These are already installed on the system for this tutorial. On your own system you may wish to have the executables in the folder with your analysis, or you can have them in the PATH. Not sure about the PATH? Go here and have a read when you have some time.
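If you are not sure whether an executable can be found on your PATH, you can check from within R using base R's Sys.which(). A minimal sketch (the executable names here are just examples; adjust them to the tools you are using):
Sys.which("usearch")   # returns the full path if the tool is on the PATH, "" otherwise
Sys.which("vsearch")
Sys.which("cutadapt")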
Make sure you start off with a clean folder in a new working directory. In this case we have some files and folders to help you get started, so don't worry if you see these. Have a look with dir(). Let's start by loading the required package(s).
require(metabarTOAD)
Now we use the first command to set up the required folder structure.
Folders()
Great, let's check the folders. We can use dir()
as before. We should see 8 folders with different names prefixed with a number.
Now that we have a folder structure, we would normally move our raw Illumina data into the 1.rawreads folder, but it should already be there for you. You might wish to copy your raw files into the 0.backup folder as well, in case you want to start the analysis again from scratch.
The pipeline can be run for multiple primer sets by providing sample and primer data in the required format. Download the examples here. This tutorial assumes we are working with a single set of primers, but primer and sample data must still be provided.
First let's check our files to make sure everything is as we expect.
list.files("1.rawreads/")
We should have pairs of compressed sample files ready for analysis.
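As a quick sanity check you can count the forward and reverse files from R. This is just a sketch and assumes the usual Illumina naming convention with "R1" and "R2" in the filenames; adjust the patterns if your files are named differently.
forward <- list.files("1.rawreads/", pattern = "R1")   # forward read files
reverse <- list.files("1.rawreads/", pattern = "R2")   # reverse read files
length(forward) == length(reverse)   # TRUE if every sample has a matching pair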
Now that everything is in place, we can start to analyse the data. First let's unzip all the data, and then check to make sure everything has been unzipped.
Unzip()
list.files("1.rawreads/")
We can rename our files at this stage if we wish. First we pull the names from our metadata file.
index <- read.csv("metadata.csv")
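Before renaming anything it is worth a quick look at the metadata to confirm it contains the columns used below (RunID holding the current names and RealID holding the desired names):
head(index)   # first few rows of the metadata sheet
all(c("RunID", "RealID") %in% colnames(index))   # should be TRUE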
We now put the desired and current names into the function RenameFiles
to change the names of our samples.
RenameFiles(SeqIDs = as.character(index$RunID), DesiredIDs = as.character(index$RealID))
list.files("1.rawreads/")
If we prefer, we can perform the same operation using the metadata sheet that has been copied into the base of the working directory, with the function call below. If you ran the function above then you have already renamed your files, so there is no need to run the line below!
RenameFiles(usemetadatafile = TRUE)
Now we have everything named we can count the number of reads per sample. First we create a list of files to be counted.
files <- list.files("1.rawreads",pattern=".fastq",full.names = TRUE)
Then we use FastqCount
and sapply
to count the reads for each sample.
rawreadcount <- sapply(files, FastqCount)
rawreadcount
We can calculate the mean and standard deviation of reads across the raw samples.
mean(rawreadcount)
sd(rawreadcount)
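If you would like a quick visual overview rather than just summary statistics, a simple barplot of the raw read counts works well; the bar labels are the file paths returned by list.files(), so they can be long.
barplot(rawreadcount, las = 2, cex.names = 0.5, ylab = "Raw read count")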
Next we merge the forward and reverse reads as below. In this case we have already downloaded and installed usearch; you can see that we pass the location of the usearch executable to the function.
MergeReads(usearchdest = "software/usearch")
We can count the number of merged reads like this.
mergedreadcount <- sapply(list.files("2.mergedreads", pattern = ".fastq", full.names = TRUE), FastqCount)
mergedreadcount
Each pair of forward and reverse reads has now been merged into a single sequence. Next we need to strip the primers from each end of the reads. We do this as below. You can see the primer sequences and the minimum and maximum expected amplicon length; with your own data you will want to change these values. We can then count the number of reads remaining after stripping the primers.
PrimerStrip(PrimerF = "NNNNNNGGWACWGGWTGAACWGTWTAYCCYCC", PrimerR = "TAIACYTCIGGRTGICCRAARAAYCA", MinLen = 303, MaxLen = 323, cutadaptdest = "cutadapt", ncores = 1)
strippedreadcount <- sapply(list.files("3.strippedreads", pattern = ".fastq", full.names = TRUE), FastqCount)
strippedreadcount
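To keep track of read loss through the pipeline you can combine the counts into a single table. This is only a sketch: it assumes the forward raw files contain "R1" in their names and that the files in each folder sort into the same sample order, so check the row labels before trusting it.
# raw counts are per file (forward and reverse), merged and stripped counts are per sample,
# so keep only the forward (R1) raw files for a like-for-like comparison
raw.forward <- rawreadcount[grepl("R1", names(rawreadcount))]
readtracking <- data.frame(raw = raw.forward, merged = mergedreadcount, stripped = strippedreadcount)
readtracking$percent.retained <- round(100 * readtracking$stripped / readtracking$raw, 1)
readtracking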
Next we pool all our sequences together and discard singleton sequences (sequences only found once in the data).
PoolNFilterReads(vsearchdest="software/vsearch")
Now that everything is prepped and ready to go, we can generate 97% OTUs using the command below.
OTUCluster(usearchdest = "software/usearch")
Once this function has run, we should be able to load up our OTU-by-sample table as below. We can use dim
to check the size of our dataframe.
initial.analysis <- read.csv("6.mappings/OTUtabs/AllSamples.raw.0.97.csv")
dim(initial.analysis)
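It can also help to glance at the first few rows to confirm the table is laid out as expected, with OTUs as rows and samples as columns:
head(initial.analysis)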
Now, just to check everything looks OK, let's quickly look at the number of observed OTUs in our samples.
pdf("test.pdf",width=9,height=8) barplot(colSums(initial.analysis != 0),las=2,ylab="Number of OTUs") dev.off()
If everything has gone right then you should see something like this.
Download the pdf using Filezilla and check.
If you have time, try producing denoised OTUs (sometimes called MOTUs, ASVs, or ESVs) using the UNOISE3 algorithm (paper here).
DenoiseUNOISE3(usearchdest = "software/usearch")
When using either of the two functions above, the OTUs will be written to the 5.OTUs folder and the OTU-by-sample table to 6.mappings/OTUtabs. These files can be used in downstream ecological analyses.
Very often in metabarcoding datasets we see a large number of OTUs with close percentage identity. These may correspond to intraspecific diversity, unfiltered PCR errors, or other methodological foibles. One way to deal with these is the LULU algorithm (paper here). A great tutorial is provided on the GitHub page, but you can just use the wrapper function below if you prefer.
library("lulu") ApplyLulu(seqs="5.OTUs/AllSamples.0.97.OTUs.fasta", table="6.mappings/OTUtabs/AllSamples.raw.0.97.csv", output="8.LULU/post.lulu.curation.csv", vsearchdest="software/vsearch")
We can check the effect of LULU by examining the difference between the number of OTUs in our pre- and post-LULU tables.
post.lulu <- read.csv("8.LULU/post.lulu.curation.csv")
dim(initial.analysis)
dim(post.lulu)
How many OTUs did we collapse using LULU?
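Since the barplot above treated each row of the table as an OTU, a rough answer is simply the difference in row counts between the two tables; worth double-checking against your own table layout.
nrow(initial.analysis) - nrow(post.lulu)   # number of OTUs collapsed by LULU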