conshaplotypes: Generate consensus haplotypes
In aliafdz/QApckg: Quality assessment for Miseq data derived from viral sequencing

ConsHaplotypes

R Documentation

Generate consensus haplotypes

Description

Computes the intersection of forward and reverse strand haplotypes and generates some report files.

Usage

ConsHaplotypes(trimfiles, pm.res, thr = 0.2, min.seq.len = 150, max.difs = 250)

Arguments

`trimfiles`	Character vector with the paths of the files demultiplexed by specific primer, with `.fna` extension.
`pm.res`	The list returned by `demultiplexPrimer`, including `fileTable` and `poolTable` data frames.
`thr`	Minimum abundance threshold used to filter haplotypes before multiple alignment.
`min.seq.len`	Minimum length threshold used to filter haplotypes before intersection.
`max.difs`	Maximum number of mismatches allowed in resulting consensus haplotypes with respect to the dominant one.
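
The abundance and length filters controlled by `thr` and `min.seq.len` can be sketched in base R as follows. This is only an illustration, not the package's code: the toy haplotype table, its column names (`seq`, `nr`), and the reading of `thr` as a percentage of total reads are all assumptions made for the example.

```r
# Hypothetical haplotype table: one row per distinct sequence, with read counts.
hpl <- data.frame(
  seq = c("ACGTACGT", "ACGTACGA", "ACG", "ACGTTCGT"),
  nr  = c(900, 98, 1, 1),
  stringsAsFactors = FALSE
)

thr <- 0.2         # ASSUMPTION: interpreted here as a percentage of total reads
min.seq.len <- 5   # toy value for short toy sequences; the function default is 150

# Relative abundance of each haplotype, in percent
freq <- 100 * hpl$nr / sum(hpl$nr)

# Keep haplotypes that pass both the abundance and the length filter
keep <- freq >= thr & nchar(hpl$seq) >= min.seq.len
filtered <- hpl[keep, ]
```

In this sketch both filters are applied in one step for brevity; according to the argument descriptions, the function applies the abundance filter before multiple alignment and the length filter before intersection.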

Details

This function is designed to be run after the demultiplexPrimer function from the same package. Once the FASTA files containing forward and reverse strand reads for the evaluated samples have been generated, ConsHaplotypes performs a multiple alignment with muscle and obtains the consensus haplotypes using IntersectStrandHpls. The resulting haplotypes are then saved with the helper function SaveHaplotypes.
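
The idea of intersecting strands can be illustrated with a minimal base R sketch. It does not reproduce IntersectStrandHpls; the named count vectors, the summing of counts across strands, and the helper `n_difs` are hypothetical choices made for the example, and the toy sequences are assumed to be already aligned and of equal length.

```r
# Hypothetical read counts per haplotype on each strand (names are sequences)
fw <- c(ACGTACGT = 500, ACGTACGA = 40, ACGTTCGT = 10)
rv <- c(ACGTACGT = 450, ACGTACGA = 35, TCGTACGT = 5)

# Keep only haplotypes observed on both strands
common <- intersect(names(fw), names(rv))

# ASSUMPTION: consensus abundance taken as the sum of both strands
cons <- sort(fw[common] + rv[common], decreasing = TRUE)

# Count mismatches between two aligned, equal-length sequences
n_difs <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Discard haplotypes with more than max.difs mismatches to the dominant one
max.difs <- 250
dominant <- names(cons)[1]
difs <- vapply(names(cons), n_difs, integer(1), b = dominant)
cons <- cons[difs <= max.difs]
```

Haplotypes seen on a single strand (`ACGTTCGT` and `TCGTACGT` in the toy data) are dropped, which corresponds to the "unique to a single strand" reads reported as filtered out.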

Value

The function returns a data.frame containing the intersection results for each combination of patient and amplicon region: the initial number of reads, the reads filtered out (for falling below the given frequency threshold or for being unique to a single strand), the overlapping frequency between both strands, and the common reads (as a percentage and as a number of reads).
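
As a rough illustration of the kind of per-sample summary described above, the following base R sketch builds a single row from toy counts. The column names (`all`, `filtered.out`, `common`, `pct.common`) are hypothetical and are not taken from the package, whose actual output columns may differ.

```r
# Hypothetical read counts per haplotype on each strand
fw <- c(h1 = 500, h2 = 40, h3 = 10)
rv <- c(h1 = 450, h2 = 35, h4 = 5)

# Initial number of reads across both strands
all.reads <- sum(fw) + sum(rv)

# Reads belonging to haplotypes present on both strands
common <- intersect(names(fw), names(rv))
common.reads <- sum(fw[common]) + sum(rv[common])

# One summary row: counts plus the share of common reads in percent
summary.row <- data.frame(
  all          = all.reads,
  filtered.out = all.reads - common.reads,
  common       = common.reads,
  pct.common   = round(100 * common.reads / all.reads, 2)
)
```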

After execution, two FASTA files for each combination of sample and pool are saved in a newly created MACH folder: the first contains the multiple alignment of forward and reverse strand haplotypes, and the second contains the intersected forward and reverse strands. Additionally, some report files are generated in the reports folder:

MA.Intersects-SummRprt.txt: Contains the summary results by number of reads after the abundance filter and strand intersection.
MA.Intersects.plots.pdf: Contains barplots for each sample representing the frequency of forward, reverse and intersected strand haplotypes.
IntersectBarplots.pdf: Contains barplots for all combinations of patient and pool, representing the number of intersected and filtered out reads, the intersection yield and the global yield.

Note

A new file named muscle.log containing muscle options will be generated and saved in a folder named "tmp".

Author(s)

Alicia Aranda

Examples

splitDir <- "./splits"
# Save the file names with complete path
splitfiles <- list.files(splitDir, recursive = TRUE, full.names = TRUE,
                         include.dirs = TRUE)
# Get data
samples <- read.table("./data/samples.csv", sep = "\t", header = TRUE,
                      colClasses = "character", stringsAsFactors = FALSE)
mids <- read.table("./data/mids.csv", sep = "\t", header = TRUE,
                   stringsAsFactors = FALSE)
# Apply previous function from QA analysis
# NOTE: `primers` is not defined in this example and must be loaded beforehand
pm.res <- demultiplexPrimer(splitfiles, samples, primers)
# Save the files generated by previous function
trimDir <- "./trim"
trimfiles <- list.files(trimDir, recursive = TRUE, full.names = TRUE,
                        include.dirs = TRUE)
# Define necessary parameters
min.seq.len <- 150
thr <- 0.2
int.res <- ConsHaplotypes(trimfiles, pm.res, thr, min.seq.len)

aliafdz/QApckg documentation built on June 2, 2022, 10:29 a.m.