checkBimodality: Check bimodality of regions

View source: R/checkBimodality.R

checkBimodalityR Documentation

Check bimodality of regions

Description

Compute the maximum bimodality score across all base pairs in each region.

Usage

checkBimodality(bam.files, regions, width=100, param=readParam(), 
    prior.count=2, invert=FALSE, BPPARAM=SerialParam()) 

Arguments

bam.files

A character vector containing paths to sorted and indexed BAM files. Alternatively, a list of BamFile objects.

regions

A GenomicRanges object specifying the regions over which bimodality is to be calculated.

width

An integer scalar or list indicating the span with which to compute bimodality.

param

A readParam object containing read extraction parameters.

prior.count

A numeric scalar specifying the prior count to compute bimodality scores.

invert

A logical scalar indicating whether bimodality score should be inverted.

BPPARAM

A BiocParallelParam specifying how parallelization is to be performed across files.

Details

Consider a base position x. This function counts the number of forward- and reverse-strand reads within the interval [x-width+1, x]. It then calculates the forward:reverse ratio after adding prior.count to both counts. This is repeated for the interval [x, x+width-1], and the reverse:forward ratio is then computed. The smaller of these two ratios is used as the bimodality score.

Sites with high bimodality scores will be enriched for forward- and reverse-strand enrichment on the left and right of the site, respectively. Given a genomic region, this function will treat each base position as a site. The largest bimodality score across all positions will be reported for each region. The idea is to assist with the identification of transcription factor binding sites, which exhibit strong strand bimodality. The function will be less useful for broad targets like histone marks.

If multiple bam.files are specified, they are effectively pooled so that counting uses all reads in all files. A separate value of width can be specified for each library, to account for differences in fragmentation – see the ext argument for windowCounts for more details. In practice, this is usually unnecessary. Setting width to the average fragment length yields satisfactory results in most cases.

If invert is set, the bimodality score will be flipped around, i.e., it will be maximized when reverse-strand coverage dominates on the left, and forward-strand coverage dominates on the right. This is designed for use in CAGE analyses where this inverted bimodality is symptomatic of enhancer RNAs.

Value

A numeric vector containing the maximum bimodality score across all bases in each region.

Author(s)

Aaron Lun

Examples

bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
incoming <- GRanges(c('chrA', 'chrA', 'chrB', 'chrC'), 
    IRanges(c(1, 500, 100, 1000), c(100, 580, 500, 1500)))

checkBimodality(bamFiles, incoming)
checkBimodality(bamFiles, incoming, width=200)
checkBimodality(bamFiles, incoming, param=readParam(minq=20, dedup=TRUE))
checkBimodality(bamFiles, incoming, prior.count=5)

# Works on PE data; scores are computed from paired reads.
bamFile <- system.file("exdata", "pet.bam", package="csaw")
checkBimodality(bamFile, incoming[1:3], param=readParam(pe="both"))
checkBimodality(bamFile, incoming[1:3], param=readParam(pe="both", max.frag=100))

LTLA/csaw documentation built on Dec. 21, 2024, 1:10 a.m.