Description Usage Arguments Details Value Author(s) References See Also Examples
Performs basic peak calling on STARR-seq data based on a method introduced in "Genome-Wide Quantitative Enhancer Activity Maps Identified by STARR-seq" Arnold et al. [1]
1 2 |
object |
A |
minQuantile |
Which quantile of coverage height should be considered as peaks. |
peakWidth |
The width (in base pairs) that the peaks should have. |
maxPval |
The maximal p-value of peaks that is desired. |
deduplicate |
Wether the sequences should be deduplicated before calling peaks or not. |
model |
Which binomial model should be applied to calculate the p-values. |
The peak calling works the following way:
All genomic positions having a STARR-seq coverage over the quantile minQuantile
are considered to be the center of a peak with width peakWidth
. If then two ore more peaks overlap, the lower one is discarded. If then the binomial p-Value of the peak is higher than maxPval
the peak is discarded as well.
The binomial model
1 for calculating the p-Value is: number of trials = total number of STARR-seq sequences, number of successes = STARR-seq coverage, estimated sucess probability in each trial = input coverage/total number of input sequences.
The binomial model
2 for caculating the p-Value is: number of trials = STARR-seq coverage plus input coverage, number of successes = STARR-seq coverage, estimated success probability in each trial = total number of STARR-seq sequences/(total number of STARR-seq sequences plus total number of input sequences). This model is used in [1].
The enrichment of STARR-seq over input coverage is then calculated as follows: (STARR-seq coverage of peak/total number of STARR-seq sequences)/(input coverage of peak/total number of input sequences), the numinator and denuminator corrected conservatively to the bounds of the 0.95 binomial confidence inverval corresponding to model
1.
The method getPeaks
return a GRanges
object. The contained ranges are the found peaks with desired width peakWidth
. The metadata columns of the ranges contain four elements:
|
The maximal and central STARR-seq coverage of the peak. |
|
The maximum of the central and the median input coverage of the peak. |
|
The binomial p-Value of the coverage height of the peak normalised to toal number of sequences in STARR-seq and input. |
|
The enrichment of STARR-seq over input coverage height normalised to total number of sequences in STARR-seq and input corrected conservatively to the bounds of a confidence interval. |
Annika Buerger
[1] Genome-Wide Quantitative Enhancer Activity Maps Identified by STARR-seq. Arnold et al. Science. 2013 Mar 1;339(6123):1074-7. doi: 10.1126/science.1232542. Epub 2013 Jan 17.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | # create a small sample STARRseqData object
starrseqFileName <- system.file("extdata", "smallSTARR.bam",
package="BasicSTARRseq")
inputFileName <- system.file("extdata", "smallInput.bam",
package="BasicSTARRseq")
data <- STARRseqData(sample=starrseqFileName, control=inputFileName,
pairedEnd=TRUE)
# call peaks with default parameters
peaks = getPeaks(data)
# call peaks with no deduplication and no restriction concerning p-value
peaks = getPeaks(data, maxPval = 1, deduplicate = FALSE)
# call peaks with other binomial model and width 700
peaks = getPeaks(data, peakWidth = 700, model = 2)
# call peaks assuming less regions as potential peaks
peaks = getPeaks(data, minQuantile = 0.99)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.