PoolQCbyPos: Evaluate QC by position

Source: R/PoolQCbyPos.R

Description

This function evaluates FASTQ files before and after running FLASH, the program used to extend paired-end reads, and saves Quality Control (QC) by position plots as PDF files.

It can also be applied to the FLASH FASTQ files after they have been filtered by Phred score.

Usage

PoolQCbyPos(flashfiles, samples, primers, runfiles, ncores = 1)

Arguments

flashfiles

Character vector with the paths of the FLASH-processed (or Phred-filtered) files, with .fastq extension.

samples

Data frame with relevant information to identify the samples of the sequencing experiment, including Patient.ID, MID, Primer.ID, Region, RefSeq.ID, and Pool.Nm columns.

primers

Data frame with information about the primers used in the experiment, including Ampl.Nm, Region, Primer.FW, Primer.RV, FW.pos, RV.pos, FW.tpos, RV.tpos, Aa.ipos, and Aa.lpos columns.

runfiles

Character vector with the paths of the Illumina MiSeq raw data files, usually with .fastq.gz extension. When the function is applied to filtered FASTQ files, this argument must be NA or omitted.

ncores

Number of cores used for parallel processing with mclapply.hack (default: 1).
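The arguments above impose a few structural requirements on the inputs. As a minimal sketch, the required columns of samples and primers can be checked before calling PoolQCbyPos, and the "NA or omitted" convention for runfiles can be expressed as a predicate. The helpers missing_columns() and is_filtered_run() are hypothetical (not part of the package) and the sample values are placeholders.

```r
# Hypothetical helper (not part of the package): names of required
# columns that are absent from a data frame.
missing_columns <- function(df, required) {
  setdiff(required, colnames(df))
}

# Required columns, as listed under Arguments
sample_cols <- c("Patient.ID", "MID", "Primer.ID", "Region",
                 "RefSeq.ID", "Pool.Nm")
primer_cols <- c("Ampl.Nm", "Region", "Primer.FW", "Primer.RV",
                 "FW.pos", "RV.pos", "FW.tpos", "RV.tpos",
                 "Aa.ipos", "Aa.lpos")

# Hypothetical helper: TRUE when runfiles is NA or omitted, i.e. when the
# input is Phred-filtered data rather than FLASH output plus raw run files.
is_filtered_run <- function(runfiles = NA) {
  length(runfiles) == 1 && is.na(runfiles)
}

# Placeholder sample table (one row, illustrative values only)
samples <- data.frame(Patient.ID = "P1", MID = "MID1", Primer.ID = "A1",
                      Region = "R1", RefSeq.ID = "Ref1", Pool.Nm = "Pool1",
                      stringsAsFactors = FALSE)
missing_columns(samples, sample_cols)  # character(0): nothing missing
is_filtered_run()                      # TRUE: runfiles omitted
```

The same missing_columns() call with primer_cols can be applied to the primers data frame.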

Value

After execution, one PDF file per pool in the experiment is saved in a reports folder (the function creates this folder if it does not already exist), and a message confirming that the files have been generated is printed to the console.

If the function is applied after running FLASH, the PDF file(s) are named PoolQCbyPos.PoolName.pdf, where PoolName is taken from the samples data frame. Each file contains QC plots for both the raw data and the extended FASTQ files, as well as the read length distribution of the evaluated pool.

If instead the function is applied after Phred score filtering, the PDF file(s) are named PoolFiltQCbyPos.PoolName.pdf and contain a QC plot for the filtered data and a plot of the read length distribution.
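The naming rule above can be written down as a small function, for example to check afterwards which reports were produced. This is a sketch under stated assumptions: expected_reports() is hypothetical, it assumes the pool name comes from the Pool.Nm column of samples, and it assumes reports go to ./reports (the repDir used in the Examples).

```r
# Hypothetical helper (not part of the package): expected report paths,
# one per distinct pool. Assumes pool names are in samples$Pool.Nm.
expected_reports <- function(samples, filtered = FALSE, repDir = "./reports") {
  # "PoolFiltQCbyPos" after Phred filtering, "PoolQCbyPos" after FLASH
  prefix <- if (filtered) "PoolFiltQCbyPos" else "PoolQCbyPos"
  pools <- unique(as.character(samples$Pool.Nm))
  file.path(repDir, paste(prefix, pools, "pdf", sep = "."))
}

# Placeholder pools: two samples share Pool1, one is in Pool2
pools <- data.frame(Pool.Nm = c("Pool1", "Pool1", "Pool2"))
expected_reports(pools)
# "./reports/PoolQCbyPos.Pool1.pdf" "./reports/PoolQCbyPos.Pool2.pdf"
```

After a run, file.exists(expected_reports(samples)) would then show whether every expected report was written.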

Author(s)

Alicia Aranda

See Also

R1R2toFLASH, FiltbyQ30, QCscores, QCplot

Examples

library(QApckg)
runDir <- "./run"
flashDir <- "./flash"
repDir <- "./reports"
# Save the file names with complete path
runfiles <- list.files(runDir, recursive = TRUE, full.names = TRUE, include.dirs = TRUE)
flashfiles <- list.files(flashDir, recursive = TRUE, full.names = TRUE, include.dirs = TRUE)
# Get data (both tables are read as tab-separated text)
samples <- read.table("./data/samples.csv", sep = "\t", header = TRUE,
                      colClasses = "character", stringsAsFactors = FALSE)
primers <- read.table("./data/primers.csv", sep = "\t", header = TRUE,
                      stringsAsFactors = FALSE)
# Writes one PoolQCbyPos.PoolName.pdf per pool into the reports folder
PoolQCbyPos(flashfiles, samples, primers, runfiles)
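The Examples list every entry under runDir and flashDir, including subdirectories (include.dirs = TRUE). If those folders also contain subdirectories or non-FASTQ files, restricting the listing with a pattern keeps flashfiles to .fastq files only, as the Arguments section requires. A self-contained sketch on a throwaway temporary directory (the file names are made up):

```r
# Build a throwaway folder: two FASTQ files (one in a subfolder) and a stray text file
demo_dir <- file.path(tempdir(), "flash_demo")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("S1.fastq", "notes.txt")),
                      file.path(demo_dir, "sub", "S2.fastq")))

# Keep only names ending in .fastq; directories are excluded because
# include.dirs defaults to FALSE
fastq_files <- list.files(demo_dir, pattern = "\\.fastq$", recursive = TRUE,
                          full.names = TRUE)
basename(fastq_files)  # the two .fastq names; notes.txt is excluded
```

For runfiles, a pattern such as "\\.fastq(\\.gz)?$" would similarly keep both compressed and uncompressed FASTQ files.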

aliafdz/QApckg documentation built on June 2, 2022, 10:29 a.m.