getPESizes: Compute fragment lengths for paired-end tags

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/getPESizes.R

Description

Compute the length of the sequenced fragment for each read pair in paired-end tag (PE) data.

Usage

1
getPESizes(bam.file, param=readParam(pe="both"))

Arguments

bam.file

a character string containing the file path to a sorted and indexed BAM file

param

a readParam object containing read extraction parameters

Details

This function computes a number of diagnostics for paired-end data in a supplied BAM file. The aims is to provide an indication of the quality of library preparation and sequencing.

Firstly, a read is only considered to be mapped if it is not removed by dedup, minq, restrict or discard in readParam. Otherwise, the alignment is not considered to be reliable. Any read pair with exactly one unmapped read is discarded, and the number of read pairs lost in this manner is recorded. Obviously, read pairs with both reads unmapped will be ignored completely, as will any unpaired reads in the BAM file. Secondary and supplementary alignments are ignored completely and do not contribute to the total - see readParam for details.

Of the mapped pairs, the valid (i.e., proper) read pairs are identified. This involves several criteria:

Note that the 3' end of one read is allowed to overrun the 5' end of the other. This avoids being too stringent in the presence of sequencing errors, untrimmed adaptors, etc. at the start or ends of reads.

Each valid read pair corresponds to a DNA fragment where both ends are sequenced. The size of the fragment can be determined by calculating the distance between the 5' ends of the mapped reads. The distribution of sizes is useful for assessing the quality of the library preparation, along with all of the recorded diagnostics. Note that any max.frag specification in param will be ignored; sizes for all valid pairs will be returned.

Value

A list containing:

sizes

an integer vector of fragment lengths for all valid read pairs in the library

diagnostics

an integer vector containing the total number of reads, the number of mapped reads, number of mapped singleton reads, pairs with exactly one unmapped read, number of improperly orientated read pairs and interchromosomal pairs

Author(s)

Aaron Lun

See Also

readParam

Examples

1
2
3
4
bamFile <- system.file("exdata", "pet.bam", package="csaw")
out <- getPESizes(bamFile, param=readParam(pe="both"))
out <- getPESizes(bamFile, param=readParam(pe="both", restrict="chrA"))
out <- getPESizes(bamFile, param=readParam(pe="both", discard=GRanges("chrA", IRanges(1, 50))))

csaw documentation built on Nov. 12, 2020, 2:03 a.m.