qualcontour: Create read quality color contour plots


Description

This function generates a 2-D color/contour map representing the average quality scores by location (read cycle number) for a designated percentile. It is intended to assist the user with deciding where trimming should be performed.

Usage

qualcontour(f_path, r_path, idx, percentile = 0.25, amp_length, min_overlap,
  n_samples = 12, q = c(25, 30, 35), bins = 50, nc = 1,
  seed = sample.int(.Machine$integer.max, 1), verbose = FALSE)

Arguments

f_path

(required) A character vector locating the forward read (Read 1) .fastq files

r_path

(required) A character vector locating the reverse read (Read 2) .fastq files

idx

Indexes (within f_path and r_path) identifying specific .fastq files to be used for analysis

percentile

The percentile of quality scores to be targeted. Defaults to 0.25 (i.e. the first quartile).

amp_length

Intra-primer amplicon length. Calculated distance in base-pairs between primers. Used to determine region of no overlap. Both 'amp_length' and 'min_overlap' must be provided for these calculations.

min_overlap

The minimum amount of overlap between the two reads. Used to determine region of no overlap. Both 'amp_length' and 'min_overlap' must be provided for these calculations.
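
As a rough illustration of how 'amp_length' and 'min_overlap' relate to trimming, the sketch below assumes the usual paired-end geometry: two reads of trimmed lengths f_len and r_len spanning an amplicon overlap by f_len + r_len - amp_length bases. The helper read_overlap and all numeric values are hypothetical, and this is not necessarily how qualcontour computes the region of no overlap.

```r
# Hypothetical helper: overlap (in bp) between a forward read of length f_len
# and a reverse read of length r_len spanning an amplicon of amp_length bp.
read_overlap <- function(f_len, r_len, amp_length) {
  f_len + r_len - amp_length
}

amp_length  <- 400  # assumed amplicon length
min_overlap <- 20   # assumed minimum overlap required for merging

# Candidate trim positions for the forward and reverse reads.
trims <- expand.grid(f_len = c(200, 250), r_len = c(175, 225))
trims$overlap    <- read_overlap(trims$f_len, trims$r_len, amp_length)
trims$sufficient <- trims$overlap >= min_overlap
print(trims)
```

Trim combinations flagged as not sufficient would leave the reads overlapping by less than 'min_overlap'.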

n_samples

Integer indicating the number of samples to include in the visualization. Defaults to 12.

q

A numeric vector designating Phred quality scores to be represented on the plot. Defaults to 25, 30, and 35.

bins

Integer designating the number of bins each read should be separated into. For example, visualizing a 250 bp read with 50 bins would imply that each bin represents 5 cycles/bp. Increasing the number of bins improves granularity at the cost of memory and processing speed. Defaults to 50.
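
The arithmetic in the example above can be checked with a short sketch. The read length is a hypothetical value, and the equal-width binning shown here is an assumption for illustration rather than the package's internal method.

```r
# Hypothetical read length; 'bins' matches the documented default.
read_length <- 250
bins <- 50

# Number of cycles (bp) represented by each bin.
cycles_per_bin <- read_length / bins

# Assign each cycle to an equal-width bin (assumed binning scheme).
bin_id <- ceiling(seq_len(read_length) / cycles_per_bin)
bin_sizes <- table(bin_id)

cycles_per_bin
length(bin_sizes)
```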

nc

The number of cores to use when multithreading. Defaults to 1.

seed

An integer value to be used when randomly selecting the subset of samples to be visualized.
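
A minimal sketch of why recording the seed matters: with the same seed, the same subset of samples is drawn on every run. The file count and seed value are hypothetical, and the use of set.seed followed by sample.int is an assumption about how such a selection could be made, not a description of the package internals.

```r
n_files   <- 20   # hypothetical number of available fastq pairs
n_samples <- 12   # documented default for n_samples
seed      <- 42   # hypothetical seed value

# Draw the subset twice with the same seed.
set.seed(seed)
idx_a <- sample.int(n_files, n_samples)
set.seed(seed)
idx_b <- sample.int(n_files, n_samples)

# Both draws select the same samples.
identical(idx_a, idx_b)
```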

verbose

If set to TRUE, provides verbose output. Defaults to FALSE.

Details

qualcontour (quality contour) takes two required arguments: character vectors of file paths for the forward ('f_path') and reverse ('r_path') reads. It tabulates the distribution of quality scores at each read cycle for the forward and reverse reads independently, then averages (arithmetic mean) the quality scores for each forward/reverse cycle combination. These values are plotted as a ggplot2 object. Users can rerun 'qualcontour' with different 'percentile' values to see how the shape of the quality contours changes. To inspect the quality profiles of the forward or reverse reads on their own, see plotQualityProfile in the 'dada2' package.
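
The averaging step described above can be sketched in base R on simulated data. The quality matrices, their dimensions, and all variable names here are hypothetical; this illustrates the idea only and is not the package's implementation.

```r
set.seed(1)

# Simulated Phred scores: rows are reads, columns are cycles (hypothetical data).
n_reads  <- 100
n_cycles <- 10
fwd_q <- matrix(sample(20:40, n_reads * n_cycles, replace = TRUE), nrow = n_reads)
rvs_q <- matrix(sample(15:38, n_reads * n_cycles, replace = TRUE), nrow = n_reads)

percentile <- 0.25

# Quality score at the chosen percentile for each cycle, per read direction.
f_pct <- unname(apply(fwd_q, 2, quantile, probs = percentile))
r_pct <- unname(apply(rvs_q, 2, quantile, probs = percentile))

# Arithmetic mean for every forward/reverse cycle combination.
avg_q <- outer(f_pct, r_pct, function(f, r) (f + r) / 2)
dim(avg_q)
```

Each cell avg_q[i, j] corresponds to trimming the forward read at cycle i and the reverse read at cycle j, which is the grid a contour plot would be drawn over.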

Value

A ggplot object with the following attributes:

idx

Samples used to generate the plot.

amp_length

Value for amp_length used to generate the plot.

min_overlap

Value for min_overlap used to generate the plot.

seed

Seed used to select the samples used to generate the plot.
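
Assuming these values are stored as ordinary R attributes, they could be read back with attr(). The sketch below uses a stand-in object with hypothetical values rather than a real qualcontour result.

```r
# Stand-in for a returned plot object; the attribute values are hypothetical.
p <- structure(list(), class = "demo_plot",
               idx = c(3L, 7L), amp_length = 400, min_overlap = 20, seed = 42L)

# Retrieve the recorded settings, e.g. to rerun with the same samples.
attr(p, "idx")
attr(p, "seed")
```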

See Also

qa, plotQualityProfile

Examples

## Not run: 
library(theseus)
library(ggplot2)
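# Locate the example fastq files shipped with the package
fns <- sort(list.files(system.file('testdata', package='theseus'),
                       full.names=TRUE))
# Match the file name suffixes literally (fixed=TRUE avoids treating '.' as a regex wildcard)
f_path <- fns[grepl('R1.fastq.gz', fns, fixed=TRUE)]
r_path <- fns[grepl('R2.fastq.gz', fns, fixed=TRUE)]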
p.qc <- qualcontour(f_path, r_path, n_samples=2, verbose=TRUE,
                    percentile=.25, nc=1)
p.qc
p.qc + geom_hline(yintercept=175) + geom_vline(xintercept=275)

## End(Not run)

EESI/theseus documentation built on May 24, 2019, 7:21 p.m.