Peak calling on STARRseq data
Description
Performs basic peak calling on STARRseq data based on a method introduced in "GenomeWide Quantitative Enhancer Activity Maps Identified by STARRseq" Arnold et al. [1]
Usage
1 2  getPeaks(object, minQuantile = 0.9, peakWidth = 500, maxPval = 0.001,
deduplicate = TRUE, model = 1)

Arguments
object 
A 
minQuantile 
Which quantile of coverage height should be considered as peaks. 
peakWidth 
The width (in base pairs) that the peaks should have. 
maxPval 
The maximal pvalue of peaks that is desired. 
deduplicate 
Wether the sequences should be deduplicated before calling peaks or not. 
model 
Which binomial model should be applied to calculate the pvalues. 
Details
The peak calling works the following way:
All genomic positions having a STARRseq coverage over the quantile minQuantile
are considered to be the center of a peak with width peakWidth
. If then two ore more peaks overlap, the lower one is discarded. If then the binomial pValue of the peak is higher than maxPval
the peak is discarded as well.
The binomial model
1 for calculating the pValue is: number of trials = total number of STARRseq sequences, number of successes = STARRseq coverage, estimated sucess probability in each trial = input coverage/total number of input sequences.
The binomial model
2 for caculating the pValue is: number of trials = STARRseq coverage plus input coverage, number of successes = STARRseq coverage, estimated success probability in each trial = total number of STARRseq sequences/(total number of STARRseq sequences plus total number of input sequences). This model is used in [1].
The enrichment of STARRseq over input coverage is then calculated as follows: (STARRseq coverage of peak/total number of STARRseq sequences)/(input coverage of peak/total number of input sequences), the numinator and denuminator corrected conservatively to the bounds of the 0.95 binomial confidence inverval corresponding to model
1.
Value
The method getPeaks
return a GRanges
object. The contained ranges are the found peaks with desired width peakWidth
. The metadata columns of the ranges contain four elements:

The maximal and central STARRseq coverage of the peak. 

The maximum of the central and the median input coverage of the peak. 

The binomial pValue of the coverage height of the peak normalised to toal number of sequences in STARRseq and input. 

The enrichment of STARRseq over input coverage height normalised to total number of sequences in STARRseq and input corrected conservatively to the bounds of a confidence interval. 
Author(s)
A. Buerger
References
[1] GenomeWide Quantitative Enhancer Activity Maps Identified by STARRseq. Arnold et al. Science. 2013 Mar 1;339(6123):10747. doi: 10.1126/science.1232542. Epub 2013 Jan 17.
See Also
GRanges
STARRseqDataclass
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19  # create a small sample STARRseqData object
starrseqFileName < system.file("extdata", "smallSTARR.bam",
package="BasicSTARRseq")
inputFileName < system.file("extdata", "smallInput.bam",
package="BasicSTARRseq")
data < STARRseqData(sample=starrseqFileName, control=inputFileName,
pairedEnd=TRUE)
# call peaks with default parameters
peaks = getPeaks(data)
# call peaks with no deduplication and no restriction concerning pvalue
peaks = getPeaks(data, maxPval = 1, deduplicate = FALSE)
# call peaks with other binomial model and width 700
peaks = getPeaks(data, peakWidth = 700, model = 2)
# call peaks assuming less regions as potential peaks
peaks = getPeaks(data, minQuantile = 0.99)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker. Vote for new features on Trello.