pipe.AlignStats: Generate Alignment Success Stats Images.

pipe.AlignStats    R Documentation

Generate Alignment Success Stats Images.

Description

Auxiliary pipeline step that creates a family of alignment statistic images to summarize all aspects of the alignment pipeline and its metrics.

Usage

pipe.AlignStats( sampleID, annotationFile = "Annotation.txt", optionsFile = "Options.txt", 
	results.path = NULL, banner = "", chunkSize = 500000, maxReads = NULL,
	mode = c( "normal", "QuickQC"), what = NULL, plot = TRUE, fastqFile = NULL)

pipe.AlignmentPie( sampleID, annotationFile = "Annotation.txt", optionsFile = "Options.txt", 
	results.path = NULL, banner = "", mode = c( "normal", "QuickQC"), 
	fastqFile = NULL, useUSR = TRUE)

Arguments

sampleID

The SampleID for this sample.

annotationFile

File of sample annotation details, which specifies all needed sample-specific information about the samples under study. See DuffyNGS_Annotation.

optionsFile

File of processing options, which specifies all processing parameters that are not sample specific. See DuffyNGS_Options.

results.path

The top-level folder path where result files are written. By default, it is read from the 'results.path' entry in the Options file.

banner

Optional character string to add to each plot's main heading.

chunkSize

Integer. The buffer size to use for reading in and evaluating alignments. Most statistics are tallied and images printed after each buffer, to show incremental progress.

maxReads

Optional integer to limit the number of alignments evaluated.
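
The interplay of chunkSize and maxReads can be pictured with a small base R sketch. It is only an illustration of buffered tallying, under the assumption that reading stops once maxReads alignments have been seen. The function name tallyInChunks and the simulated chromosome labels are hypothetical and do not reflect the package's internal code.

```r
# Illustration only (not DuffyNGS code): tally simulated alignments
# one buffer at a time, stopping at an optional maxReads limit.
tallyInChunks <- function(seqIDs, chunkSize = 500000, maxReads = NULL) {
  nTotal <- length(seqIDs)
  if (!is.null(maxReads)) nTotal <- min(nTotal, maxReads)
  seqNames <- sort(unique(seqIDs))
  counts <- setNames(integer(length(seqNames)), seqNames)
  start <- 1
  while (start <= nTotal) {
    end <- min(start + chunkSize - 1, nTotal)
    # Count how many alignments in this buffer hit each sequence.
    idx <- match(seqIDs[start:end], seqNames)
    counts <- counts + tabulate(idx, nbins = length(seqNames))
    # A real pipeline step would redraw its plots here to show progress.
    cat("Processed", end, "alignments\n")
    start <- end + 1
  }
  counts
}

# Simulated input: 2500 alignments spread over three hypothetical chromosomes.
set.seed(1)
simulated <- sample(c("chr1", "chr2", "chr3"), 2500, replace = TRUE)
result <- tallyInChunks(simulated, chunkSize = 1000, maxReads = 2200)
```

In this sketch a smaller chunkSize means more frequent progress updates, while maxReads caps the total work regardless of buffer size.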

mode

Controls how alignments are interpreted. Mode "QuickQC" invokes the behavior for preliminary QC analysis. See pipe.QuickQC.

what

An optional character string that specifies which types of statistics to monitor. The default is to monitor every type of feature, equivalent to "SGBIDMA", where:

S: Sequences: features about chromosomes, such as read counts and percentages.

G: Genes: features about genes, like read counts and percentages for highly detected genes.

B: Bases: features about base calls, locations of mismatches, and nucleotide usage.

I,D: Insertions & Deletions: features about indel locations in the aligned reads.

M: MARs (Multiply Aligned Reads): features about reads hitting 2+ locations.

A: Align scores: features about the distribution of Bowtie alignment scores.
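
As a rough illustration of how a 'what' string maps onto the statistic types listed above, the following base R sketch splits a string into its letter codes and checks them against the documented set. The helper name parseWhat and the lookup table whatCodes are hypothetical and not part of DuffyNGS; whether the package itself validates or accepts subsets in this way is an assumption.

```r
# Hypothetical helper, not part of DuffyNGS: expand a 'what' string
# into the statistic types described in the argument list.
whatCodes <- c(S = "Sequences", G = "Genes", B = "Bases",
               I = "Insertions", D = "Deletions",
               M = "MARs", A = "Align scores")

parseWhat <- function(what = NULL) {
  # NULL stands for the documented default: monitor every type.
  if (is.null(what)) what <- "SGBIDMA"
  codes <- unique(strsplit(toupper(what), "")[[1]])
  unknown <- setdiff(codes, names(whatCodes))
  if (length(unknown) > 0) {
    stop("Unrecognized statistic code(s): ", paste(unknown, collapse = ", "))
  }
  whatCodes[codes]
}

# Restrict monitoring to sequences, genes, and MARs.
selected <- parseWhat("SGM")
print(selected)
```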

fastqFile

Optional character string for the original FASTQ file that was input to the alignment pipeline. The default is to look it up from the annotation file.

useUSR

Logical. Include a survey of USRs (Unique Short Reads) in the pie, to assess presence of empty adapters, Poly-N, etc.

Details

This pipeline step tries to evaluate every aspect of how well the raw reads aligned to the target organism(s). It generates a large family of plot images, each of which shows some measure of alignment success or failure.

Value

A family of files and plot images is created on disk under the subfolder AlignStats.

In addition, the function returns a list of read counts and percentages, as returned from the alignment pie function, that summarizes the alignment status of the entire sample.
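
A minimal sketch of a call, based only on the Usage signature above. The sample ID "Sample1" is hypothetical and would have to match an entry in the annotation file; passing a subset string such as "SGM" for 'what' is an assumption; and the step presumably expects that the sample has already been aligned. The guard simply skips the call when the package or the input files are not available.

```r
# Arguments follow the documented signature; "Sample1" is a made-up ID.
args <- list(
  sampleID       = "Sample1",
  annotationFile = "Annotation.txt",
  optionsFile    = "Options.txt",
  mode           = "normal",
  maxReads       = 1000000,   # evaluate at most one million alignments
  what           = "SGM"      # assumed subset: sequences, genes, MARs
)

# Only attempt the call when DuffyNGS and both input files are present.
canRun <- requireNamespace("DuffyNGS", quietly = TRUE) &&
  file.exists(args$annotationFile) &&
  file.exists(args$optionsFile)

if (canRun) {
  stats <- do.call(DuffyNGS::pipe.AlignStats, args)
  # Inspect the returned list of read counts and percentages.
  str(stats)
} else {
  message("DuffyNGS or its input files not found; skipping the example call.")
}
```

The plot images themselves are written to disk under the AlignStats subfolder, so the returned list is mainly useful for collecting per-sample summaries in a script.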

Author(s)

Bob Morrison


robertdouglasmorrison/DuffyNGS documentation built on Sept. 1, 2024, 9:25 p.m.