demultiplexPrimer: Trim specific primer sequences
In aliafdz/QApckg: Quality assessment for Miseq data derived from viral sequencing

demultiplexPrimer

R Documentation

Trim specific primer sequences

Description

Demultiplexes reads by identifying template-specific primer sequences within windows of expected positions in the sequenced reads. Note that MID and template-specific primer sequences are trimmed from the reads once the primers have been identified, but amplicon length is not predetermined.

Usage

demultiplexPrimer(
  splitfiles,
  samples,
  primers,
  prmm = 3,
  min.len = 180,
  target.st = 1,
  target.end = 100
)

Arguments

`splitfiles`	Vector of paths to the files already demultiplexed by MID, with the fna extension.
`samples`	Data frame with relevant information to identify the samples of the sequencing experiment, including `Patient.ID, MID, Primer.ID, Region, RefSeq.ID`, and `Pool.Nm` columns.
`primers`	Data frame with information about the template-specific primers used in the experiment, including `Ampl.Nm, Region, Primer.FW, Primer.RV, FW.pos, RV.pos, FW.tpos, RV.tpos, Aa.ipos`, and `Aa.lpos` columns.
`prmm`	Number of mismatches allowed between the primers and read sequences.
`min.len`	Minimum length desired for haplotypes. Any sequence below this length will be discarded.
`target.st, target.end`	Start and end positions of the window within which template-specific primer sequences will be searched.
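Because `samples` and `primers` must carry the columns listed above, it can help to check them before calling the function. The following is a minimal sketch in base R; `missing_columns` is a hypothetical helper (not part of QApckg), and the toy data frame holds placeholder values only.

```r
# Column names as listed in the Arguments section.
samples_cols <- c("Patient.ID", "MID", "Primer.ID", "Region", "RefSeq.ID", "Pool.Nm")
primers_cols <- c("Ampl.Nm", "Region", "Primer.FW", "Primer.RV", "FW.pos",
                  "RV.pos", "FW.tpos", "RV.tpos", "Aa.ipos", "Aa.lpos")

# Hypothetical helper (not part of QApckg): returns the required column
# names that are absent from a data frame.
missing_columns <- function(df, required) {
  setdiff(required, colnames(df))
}

# Toy data frame with placeholder values; Pool.Nm is left out on purpose.
toy_samples <- data.frame(Patient.ID = "P01", MID = "MID1", Primer.ID = "1",
                          Region = "R1", RefSeq.ID = "ref1",
                          stringsAsFactors = FALSE)
missing_in_toy <- missing_columns(toy_samples, samples_cols)
print(missing_in_toy)
```

The same helper can be applied to a primers data frame with `primers_cols`; an empty result means every expected column is present.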

Details

After reads are demultiplexed by MID with the demultiplexMID function, template-specific primer sequences are identified in both strands. First, forward strands are recognized by searching for the FW primer sequence at the 5' end and the reverse complement of the RV primer sequence at the 3' end. Then, reverse strands are recognized by searching for the RV primer sequence at the 5' end and the FW primer sequence at the 3' end, after obtaining the reverse complement of all reads identified as reverse strands. Both strands are thus obtained in a way that facilitates their intersection.
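The idea of locating a primer inside a position window while tolerating a few mismatches can be sketched in base R. This is an illustration only, not the package's implementation: `rev_comp`, `find_primer` and `classify_strand` are hypothetical helpers, the sequences are made up, and the sketch assumes that the whole primer must lie inside the window and that only substitutions count as mismatches. For brevity it checks the 5' end only and omits the 3' end search and the trimming step.

```r
# Reverse complement of a DNA string (A, C, G, T, N only).
rev_comp <- function(x) {
  paste(rev(strsplit(chartr("ACGTN", "TGCAN", x), "")[[1]]), collapse = "")
}

# Hypothetical helper: first 1-based start position within [st, end] where
# `primer` matches `read` with at most `max_mm` substitutions, or NA if none.
find_primer <- function(primer, read, st, end, max_mm) {
  p <- strsplit(primer, "")[[1]]
  r <- strsplit(read, "")[[1]]
  last_start <- min(end, length(r)) - length(p) + 1
  if (last_start < st) return(NA_integer_)
  for (i in st:last_start) {
    if (sum(r[i:(i + length(p) - 1)] != p) <= max_mm) return(i)
  }
  NA_integer_
}

# Hypothetical helper: label a read by the primer found in its 5' window.
classify_strand <- function(read, fw, rv, st, end, max_mm) {
  if (!is.na(find_primer(fw, read, st, end, max_mm))) return("forward")
  if (!is.na(find_primer(rv, read, st, end, max_mm))) return("reverse")
  NA_character_
}

# Toy primers and amplicon (made-up sequences, not taken from the package).
fw <- "ACGTAC"
rv <- "TTGGCA"
insert <- "GGGGCCCCAAAATTTT"
fw_read <- paste0("AT", fw, insert, rev_comp(rv))  # forward-strand read
rv_read <- rev_comp(fw_read)                       # same molecule, reverse strand

strand_fw <- classify_strand(fw_read, fw, rv, st = 1, end = 10, max_mm = 1)
strand_rv <- classify_strand(rv_read, fw, rv, st = 1, end = 10, max_mm = 1)
print(c(strand_fw, strand_rv))

# Reverse-complementing the reverse-strand read puts it in the same
# orientation as the forward read.
print(identical(rev_comp(rv_read), fw_read))
```

In this sketch `st`, `end` and `max_mm` play the roles of `target.st`, `target.end` and `prmm`. For real data, a dedicated sequence-matching library would normally replace the explicit loop.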

Value

A list containing the following:

`fileTable`	A table with relevant data of each FASTA file generated in execution, including their associated strand, mean read length, total reads and total haplotypes obtained.
`poolTable`	A table with the number of total trimmed reads and the yield of the process by pool.

After execution, a FASTA file for each combination of strand, MID and pool is saved in a newly created trim folder. Additionally, the following report files are generated in a reports folder:

AmpliconLengthsRprt.txt: Includes the amplicon lengths of both strands for each sample (with their corresponding MID identifier).
AmpliconLengthsPlot.pdf: Includes a barplot for each sample representing the amplicon lengths of both strands.
SplitByPrimersOnFlash.txt: Includes a table of reads identified by primer, total reads identified by patient and the yield by pool.
SplitByPrimersOnFlash.pdf, SplitByPrimersOnFlash-hz.pdf: Include plots representing primer matches by patient (in number of reads) and the coverage of forward/reverse matches by pool.
SplittedReadsFileTable.txt: A file containing the same information as fileTable.

Author(s)

Alicia Aranda

Examples

# Set parameters
prmm <- 3
min.len <- 180
# The expected window for template specific primer sequences will depend on the presence of
# adapters, MID sequences and/or M13 primer.
target.st <- 1
target.end <- 100
splitDir <- "./splits"
# Save the file names with complete path
splitfiles <- list.files(splitDir, recursive = TRUE, full.names = TRUE, include.dirs = TRUE)
# Get data (tab-separated files)
samples <- read.table("./data/samples.csv", sep = "\t", header = TRUE,
                      colClasses = "character", stringsAsFactors = FALSE)
primers <- read.table("./data/primers.csv", sep = "\t", header = TRUE,
                      stringsAsFactors = FALSE)
# Demultiplex by primer; optional arguments are passed by name for clarity
pm.res <- demultiplexPrimer(splitfiles, samples, primers, prmm = prmm, min.len = min.len,
                            target.st = target.st, target.end = target.end)

aliafdz/QApckg documentation built on June 2, 2022, 10:29 a.m.