demultiplexMID: Split reads by MID sequence
In aliafdz/QApckg: Quality assessment for Miseq data derived from viral sequencing

Description

Demultiplexes reads by searching for MID (multiplex identifier) sequences within a window of expected positions in each read. MIDs are 10-nt oligonucleotide tags that identify the samples from different patients or origins within a sequencing pool.

Note that MID sequences are not trimmed from the reads. They are only used to assign each read to its sample.
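The matching rule described above can be sketched in base R. This is a hypothetical illustration, not the QApckg implementation. It assumes that mismatches are counted as substitutions only (Hamming distance, no indels) and that `mid.start`/`mid.end` bound a 1-based window of the read inside which the whole MID must fit. The helpers `match_mid()` and `assign_mid()` are invented for this example, and the read itself is never modified.

```r
# Hypothetical helper, NOT part of QApckg: returns the first start position s
# in [mid.start, mid.end - nchar(mid) + 1] where `mid` matches `read` with at
# most `maxdif` substitutions, or NA if there is no such position.
match_mid <- function(read, mid, maxdif = 1, mid.start = 1, mid.end = 40) {
  k <- nchar(mid)
  # Clip the search window to the read length
  last <- min(mid.end, nchar(read)) - k + 1
  if (last < mid.start) return(NA_integer_)
  mid_chars <- strsplit(mid, "")[[1]]
  for (s in mid.start:last) {
    # Candidate substring with the same length as the MID
    cand <- strsplit(substr(read, s, s + k - 1), "")[[1]]
    if (sum(cand != mid_chars) <= maxdif) return(s)
  }
  NA_integer_
}

# Hypothetical helper: assigns a read to the single MID that matches it.
# `mids` is a named character vector (names = MID identifiers); extra
# arguments (maxdif, mid.start, mid.end) are passed on to match_mid().
assign_mid <- function(read, mids, ...) {
  hits <- vapply(mids, function(m) match_mid(read, m, ...), integer(1))
  ok <- names(mids)[!is.na(hits)]
  # No hit or an ambiguous hit (several MIDs) leaves the read unassigned
  if (length(ok) == 1) ok else NA_character_
}
```

For example, `assign_mid("TTACGAGTGCGTGGGG", c(MID1 = "ACGAGTGCGT", MID2 = "ACGCTCGACA"))` returns `"MID1"`, because MID1 occurs exactly at position 3 while MID2 differs from every 10-nt window by more than one base.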

Usage

demultiplexMID(
  flashffiles,
  samples,
  mids,
  maxdif = 1,
  mid.start = 1,
  mid.end = 40
)

Arguments

`flashffiles`	Character vector with the paths of the FLASH-filtered files, in FASTQ format.
`samples`	Data frame describing the samples of the sequencing experiment, with columns `Patient.ID`, `MID`, `Primer.ID`, `Region`, `RefSeq.ID`, and `Pool.Nm`.
`mids`	Data frame containing the MID sequences and their identifiers.
`maxdif`	Maximum number of mismatches allowed between a MID and the read sequence.
`mid.start`	Start position of the window in which the MID is expected in each read.
`mid.end`	End position of the window in which the MID is expected in each read.
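The expected layout of `samples` can be illustrated by building the input tables directly in R instead of reading them from files. All values below are invented, whether the `MID` column holds identifiers or sequences is an assumption, and the column names used for `mids` (`MID.ID`, `MID.Seq`) are hypothetical because they are not documented here. The final checks only confirm that `samples` has every column listed above and that the MIDs are 10 nt long.

```r
# Illustrative input tables; all values are invented for this example
samples <- data.frame(
  Patient.ID = c("P01", "P02"),
  MID        = c("MID1", "MID2"),
  Primer.ID  = c("A1", "A1"),
  Region     = c("Reg1", "Reg1"),
  RefSeq.ID  = c("Ref1", "Ref1"),
  Pool.Nm    = c("1", "1"),
  stringsAsFactors = FALSE
)
# Column names of `mids` are ASSUMED (MID identifier + 10-nt sequence)
mids <- data.frame(
  MID.ID  = c("MID1", "MID2"),
  MID.Seq = c("ACGAGTGCGT", "ACGCTCGACA"),
  stringsAsFactors = FALSE
)
# Check that `samples` has all the columns required by demultiplexMID()
required <- c("Patient.ID", "MID", "Primer.ID", "Region", "RefSeq.ID", "Pool.Nm")
missing_cols <- setdiff(required, names(samples))
if (length(missing_cols) > 0) {
  stop("samples is missing columns: ", paste(missing_cols, collapse = ", "))
}
# MIDs are expected to be 10 nt long
stopifnot(all(nchar(mids$MID.Seq) == 10))
```

All columns are stored as character, mirroring the `colClasses="character"` used when reading `samples` in the example below.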

Value

A list with the following elements:

`nreads`	A table with the number of reads identified for each MID.
`by.pools`	A table with the coverage of reads demultiplexed by pool.

After execution, the reads assigned to each combination of MID and pool are saved as a FASTA file in a `splits` folder, which is created in the working directory. Additionally, two report files are generated in a `reports` folder:

`SplidByMIDs.barplots.pdf`	A barplot of the `nreads` values followed by a barplot of the `by.pools` values.
`SplidByMIDs.Rprt.txt`	The same tables returned by the function.
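Writing one FASTA file per MID and pool combination can be sketched as follows. This is a hypothetical helper, not the package's code: the name `write_splits()` and the `<MID>_pool<Pool>.fasta` file naming are assumptions, and reads left without a MID (`NA`) are simply skipped.

```r
# Hypothetical sketch, NOT the QApckg implementation.
# ids, seqs, mid, pool: parallel vectors with one element per read.
write_splits <- function(ids, seqs, mid, pool, outdir = "splits") {
  dir.create(outdir, showWarnings = FALSE)
  keep <- !is.na(mid)  # unassigned reads are not written
  # Group read indices by MID and pool, e.g. "MID1_pool1"
  groups <- split(seq_along(ids)[keep],
                  paste(mid[keep], pool[keep], sep = "_pool"))
  for (g in names(groups)) {
    idx <- groups[[g]]
    # Interleave ">id" header lines with their sequences (FASTA records)
    fasta <- as.vector(rbind(paste0(">", ids[idx]), seqs[idx]))
    writeLines(fasta, file.path(outdir, paste0(g, ".fasta")))
  }
  invisible(names(groups))
}
```

Because the MID is not trimmed, each sequence is written exactly as it was read.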

Author(s)

Alicia Aranda

Examples

library(QApckg)
flashFiltDir <- "./flashFilt"
# Save the FLASH-filtered FASTQ file names with their complete paths
flashffiles <- list.files(flashFiltDir, pattern = "\\.fastq$", recursive = TRUE,
                          full.names = TRUE)
# Get data (the input files are tab-separated despite the .csv extension)
samples <- read.table("./data/samples.csv", sep = "\t", header = TRUE,
                      colClasses = "character", stringsAsFactors = FALSE)
mids <- read.table("./data/mids.csv", sep = "\t", header = TRUE,
                   stringsAsFactors = FALSE)
# Set parameters
maxdif <- 1
mid.start <- 1
mid.end <- 40
dem.res <- demultiplexMID(flashffiles, samples, mids, maxdif = maxdif,
                          mid.start = mid.start, mid.end = mid.end)

aliafdz/QApckg documentation built on June 2, 2022, 10:29 a.m.