demultiplexMID: Split reads by MID sequence

View source: R/demultiplexMID.R

demultiplexMID    R Documentation

Split reads by MID sequence

Description

Demultiplex reads by identifying MID sequences within a window of expected positions in the sequenced reads. MIDs are 10-base oligonucleotides that identify samples from different patients or origins.

Note that MID sequences are not trimmed from the reads; they are only identified in order to associate each read with its sample.
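The windowed matching described above can be illustrated with a minimal base R sketch. It is not the package's implementation: `has_mid` is a hypothetical helper, the MID and read sequences are made up, and it assumes that mismatches means substitutions only (no insertions or deletions).

```r
# Hypothetical helper (not part of QApckg): TRUE if `mid` occurs in `read`
# within positions mid.start..mid.end with at most `maxdif` substitutions.
has_mid <- function(read, mid, maxdif = 1, mid.start = 1, mid.end = 40) {
  # Restrict the search to the expected window; the read itself is not modified
  window <- substr(read, mid.start, min(mid.end, nchar(read)))
  k <- nchar(mid)
  n <- nchar(window)
  if (n < k) return(FALSE)
  mid_chars <- strsplit(mid, "")[[1]]
  win_chars <- strsplit(window, "")[[1]]
  # Slide the MID along the window and count mismatching bases at each offset
  for (i in seq_len(n - k + 1)) {
    mismatches <- sum(win_chars[i:(i + k - 1)] != mid_chars)
    if (mismatches <= maxdif) return(TRUE)
  }
  FALSE
}

mid <- "ACGAGTGCGT"                                   # illustrative 10-base MID
read_exact   <- paste0("TTTT", mid, "GGGGCCCCAAAA")   # MID inside the window
read_one_mm  <- "TTTTACGAGTGCGAGGGGCCCCAAAA"          # last MID base changed
read_outside <- paste0(strrep("T", 50), mid)          # MID beyond position 40

has_mid(read_exact, mid)
has_mid(read_one_mm, mid, maxdif = 1)
has_mid(read_one_mm, mid, maxdif = 0)
has_mid(read_outside, mid)
```

In this sketch the read is returned untouched, which mirrors the note that MIDs are identified but not trimmed.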

Usage

demultiplexMID(
  flashffiles,
  samples,
  mids,
  maxdif = 1,
  mid.start = 1,
  mid.end = 40
)

Arguments

flashffiles

Character vector with the paths of the FLASH-filtered files, which have a fastq extension.

samples

Data frame with relevant information to identify the samples of the sequencing experiment, including Patient.ID, MID, Primer.ID, Region, RefSeq.ID, and Pool.Nm columns.

mids

Data frame containing the MID sequences and their identifiers.

maxdif

Number of mismatches allowed between MID and read sequences.

mid.start

Expected start position for MID in sequence.

mid.end

Expected end position for MID in sequence.
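As a quick sanity check before calling the function, the samples data frame can be tested for the columns listed above. This is a sketch only: the two rows are invented placeholder values, and `required_cols` and `missing_cols` are hypothetical names that do not belong to the package.

```r
# Columns that the samples argument is documented to include
required_cols <- c("Patient.ID", "MID", "Primer.ID",
                   "Region", "RefSeq.ID", "Pool.Nm")

# Placeholder data for illustration only; real values come from the experiment
samples <- data.frame(
  Patient.ID = c("P01", "P02"),
  MID        = c("1", "2"),
  Primer.ID  = c("A", "A"),
  Region     = c("R1", "R1"),
  RefSeq.ID  = c("ref1", "ref1"),
  Pool.Nm    = c("pool1", "pool1"),
  stringsAsFactors = FALSE
)

# Stop early with a clear message if any documented column is absent
missing_cols <- setdiff(required_cols, names(samples))
if (length(missing_cols) > 0) {
  stop("samples is missing columns: ", paste(missing_cols, collapse = ", "))
}
```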

Value

A list containing the following:

nreads

A table with the number of reads identified for each MID.

by.pools

A table with the coverage of reads demultiplexed by pool.

After execution, a FASTA file containing the associated reads is saved for each combination of MID and pool in a splits folder, which is created in the working directory. Additionally, two report files are generated in a reports folder:

  1. SplidByMIDs.barplots.pdf: Contains a barplot of the nreads values and a second plot of the by.pools values.

  2. SplidByMIDs.Rprt.txt: Includes the same data tables returned by the function.

Author(s)

Alicia Aranda

See Also

FiltbyQ30

Examples

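# Folder containing the FLASH-filtered fastq files
flashFiltDir <- "./flashFilt"
# Save the file names with complete path
flashffiles <- list.files(flashFiltDir, recursive = TRUE, full.names = TRUE,
                          include.dirs = TRUE)
# Read the sample description and the MID table (tab-separated files)
samples <- read.table("./data/samples.csv", sep = "\t", header = TRUE,
                      colClasses = "character", stringsAsFactors = FALSE)
mids <- read.table("./data/mids.csv", sep = "\t", header = TRUE,
                   stringsAsFactors = FALSE)
# Set parameters: allowed mismatches and expected MID window
maxdif <- 1
mid.start <- 1
mid.end <- 40
# Demultiplex reads; returns a list with nreads and by.pools
dem.res <- demultiplexMID(flashffiles, samples, mids, maxdif, mid.start, mid.end)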

aliafdz/QApckg documentation built on June 2, 2022, 10:29 a.m.