tuneAlignment: Tune alignment parameters



Description

Tune parameters for adaptor alignment to maximize discriminative power compared to a randomized control.

Usage

tuneAlignment(adaptor1, adaptor2, filepath, tolerance=200, 
    number=10000, gapOp.range=c(4, 10), gapExt.range=c(1, 5), 
    qual.type=c("phred", "solexa", "illumina"), BPPARAM=SerialParam()) 

Arguments

adaptor1, adaptor2

A string or DNAString object containing the 5'-to-3' sequences of the adaptors on each end of the read.

filepath

A string containing the path to the FASTQ file, or a connection object to a FASTQ file.

tolerance

An integer scalar specifying the length of the ends of the reads to search for adaptors.

number

An integer scalar specifying the number of randomly sampled reads to use for tuning.

gapOp.range

An integer vector of length 2 specifying the boundaries of the grid search for the gap opening penalties.

gapExt.range

An integer vector of length 2 specifying the boundaries of the grid search for the gap extension penalties.

qual.type

String specifying the type of quality scores in filepath.

BPPARAM

A BiocParallelParam object specifying whether alignment should be parallelized. Currently only effective up to a maximum of 4 workers.

Details

This function aligns adaptors to the start and end of read sequences in the same manner as adaptorAlign. It then performs a grid search to identify the best alignment parameters, repeating the alignments for every combination of integer gap opening and gap extension penalties within gapOp.range and gapExt.range.
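As a rough illustration of the search space described above, the following sketch enumerates the grid implied by the default gapOp.range and gapExt.range. It assumes that both boundaries are inclusive and that the step size is 1; neither detail is stated here, and the column names gapOp and gapExt are chosen for illustration only.

```r
# Default search boundaries, taken from the Usage section.
gapOp.range <- c(4, 10)
gapExt.range <- c(1, 5)

# Assumption: both boundaries are inclusive and penalties step by 1.
grid <- expand.grid(
    gapOp = seq(gapOp.range[1], gapOp.range[2]),
    gapExt = seq(gapExt.range[1], gapExt.range[2])
)

# Number of parameter combinations under the assumption above.
n.combos <- nrow(grid)
print(n.combos)
print(head(grid))
```

Under these assumptions, each row of grid corresponds to one repetition of the alignments, so widening either range increases the run time proportionally.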

To evaluate each parameter combination, we examine the distribution of combined alignment scores across all reads. The combined score represents the best adaptor alignment for each read and is equivalent to the score used in adaptorAlign to determine the read orientation. The best parameter combination is the one that minimizes the overlap between the distribution of maximum alignment scores for the reads and that of a scrambled control. Only combinations where the read distribution is shifted towards higher scores relative to the scrambled control are considered.
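The selection criterion can be sketched with simulated scores. The overlap measure below (the proportion of read/scrambled pairs in which the scrambled score is at least as high as the read score) is an assumption made for illustration and is not necessarily the measure used internally by tuneAlignment. The helper overlapScore and all of the score values are hypothetical.

```r
set.seed(100)

# Simulated scores (made-up numbers, not real tuneAlignment output).
scrambled.scores <- rnorm(1000, mean = 15, sd = 5)
good.scores <- rnorm(1000, mean = 30, sd = 5)  # well-separated parameter set
poor.scores <- rnorm(1000, mean = 18, sd = 5)  # poorly separated parameter set

# Hypothetical overlap measure: fraction of (scrambled, read) pairs where
# the scrambled score is greater than or equal to the read score.
overlapScore <- function(reads, scrambled) {
    mean(outer(scrambled, reads, ">="))
}

ov.good <- overlapScore(good.scores, scrambled.scores)
ov.poor <- overlapScore(poor.scores, scrambled.scores)

# Only keep combinations where the reads are shifted towards higher scores.
shifted.good <- median(good.scores) > median(scrambled.scores)
shifted.poor <- median(poor.scores) > median(scrambled.scores)

print(c(good = ov.good, poor = ov.poor))
print(c(good = shifted.good, poor = shifted.poor))
```

In this sketch, the parameter set with the smaller overlap would be preferred, provided that its read scores are shifted above those of the scrambled control.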

Value

A list containing parameters, itself a list containing the optimal values of all specified alignment parameters. The top-level list will also contain scores, another list containing numeric vectors of alignment scores for the reads and scrambled controls at the optimal parameters.

Author(s)

Aaron Lun

See Also

adaptorAlign to use these parameters.

Examples

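# Assumes the package is installed under the name 'sarlacc',
# as implied by the sarlacc:::mockReads call below.
library(sarlacc)

# Mocking up a small data set.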
a1 <- "AACGGGTCGNNNNNNNACGTACGTNNNNACGA" 
a2 <- "CGTGCTGCATCG"
fout <- tempfile(fileext=".fastq")
ref <- sarlacc:::mockReads(a1, a2, fout, nmolecules=1, 
    nreads.range=c(10, 10), seqlen.range=c(50, 200))

# Aligning it.
(out <- tuneAlignment(adaptor1=a1, adaptor2=a2, filepath=fout))

florian0512/SarlaccSeq documentation built on May 28, 2019, 8:39 p.m.