Description
The function evaluates transcription initiation within a peak region by comparing RNA-seq read densities upstream and downstream of an empirically determined transcription start site. Putative transcription on both the forward and the reverse genomic strand is tested, and the results are stored with each ChIP-seq peak.
Usage
predictStrand(cdsObj, tdsObj, coverage.cutoff, quant.cutoff = 0.1,
              win.size = 2500, prob.cutoff)

## S4 method for signature 'ChipDataSet,TranscriptionDataSet'
predictStrand(cdsObj, tdsObj, coverage.cutoff, quant.cutoff = 0.1,
              win.size = 2500, prob.cutoff)
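Arguments
cdsObj            A ChipDataSet object.
tdsObj            A TranscriptionDataSet object.
coverage.cutoff   Minimum RNA-seq fragment coverage used to discard low-coverage peaks (see Details, step 3).
quant.cutoff      Fraction of RNA-seq fragments allowed upstream of the position identified within the peak, so that (1 - quant.cutoff) * 100% of fragments lie downstream of it (see Details, step 1). Default 0.1.
win.size          Size of each of the two flanking regions, q1 and q2 (see Details, step 2). Default 2500.
prob.cutoff       P(q2) threshold above which a peak is considered associated with transcription initiation. If not supplied, the threshold is estimated from the data (see Details, step 6).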
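The 'quant.cutoff' arithmetic from step 1 of Details can be sketched in base R. This is a minimal illustration, not the package's implementation: find_tss_position() is a hypothetical helper, and it assumes fragment positions are plain genomic coordinates for one strand, with "downstream" meaning larger coordinates on the forward strand and smaller coordinates on the reverse strand.

```r
# Hypothetical helper (not part of the package): locate the position within a
# peak such that (1 - quant.cutoff) * 100% of fragments lie downstream of it.
# frag_pos: positions of strand-specific RNA-seq fragments inside the peak.
find_tss_position <- function(frag_pos, quant.cutoff = 0.1, strand = c("+", "-")) {
  strand <- match.arg(strand)
  # Forward strand: downstream = larger coordinates, so take the lower quantile.
  # Reverse strand: downstream = smaller coordinates, so take the upper quantile.
  p <- if (strand == "+") quant.cutoff else 1 - quant.cutoff
  # type = 1 returns an observed fragment position (inverse empirical CDF)
  unname(quantile(frag_pos, probs = p, type = 1))
}

# 100 evenly spaced fragments: 90 of them lie downstream of position 10
find_tss_position(1:100, quant.cutoff = 0.1, strand = "+")
```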
Details
RNA-seq data is incorporated to find direct evidence of active transcription at every putative gene-associated peak. To do this, we determine the 'strandedness' of the ChIP-seq peaks using strand-specific RNA-seq data. The following assumptions are made in order to infer peak 'strandedness':
The putative gene-associated ChIP-seq peaks are commonly associated with transcription initiation.
This transcription initiation occurs within the ChIP peak region.
When a ChIP peak is associated with a transcription initiation event, we expect a strand-specific increase in RNA-seq fragment count downstream of the transcription initiation site.
Each peak in the data set is tested for association with transcription initiation on both DNA strands. Steps 1-5 are performed separately for the forward and the reverse strand, and step 6 combines the data from both strands. If a peak is associated with transcription on both strands, it is considered bidirectional.
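The rule for combining the per-strand results into one call per peak can be illustrated with a small base R sketch. combine_strand_calls() and its output labels ("+", "-", "bidirectional", NA) are hypothetical; the values the package actually stores in 'predicted.strand' may be encoded differently.

```r
# Hypothetical helper: combine per-strand transcription calls into one label.
# plus_called / minus_called: logical vectors, one element per peak.
combine_strand_calls <- function(plus_called, minus_called) {
  ifelse(plus_called & minus_called, "bidirectional",  # transcribed on both strands
    ifelse(plus_called, "+",                           # forward strand only
      ifelse(minus_called, "-", NA_character_)))       # reverse only, or neither
}

combine_strand_calls(c(TRUE, TRUE, FALSE, FALSE), c(TRUE, FALSE, TRUE, FALSE))
```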
ChIP peak 'strandedness' prediction steps:
1. Identify a location within the ChIP-seq peak near the transcription start site. This is done by calculating the cumulative distribution of RNA-seq fragments within the peak region and selecting the position downstream of which (1 - 'quant.cutoff') * 100% of the RNA-seq fragments are located (90% with the default quant.cutoff = 0.1). This approach performs well in both gene-poor and gene-dense regions, where transcripts may overlap.
2. Two equally sized regions, q1 and q2, are defined, one on each side of the position identified in step 1: q1 upstream and q2 downstream. RNA-seq fragments are counted in each region.
3. ChIP peaks with RNA-seq fragment coverage below an estimated threshold are discarded from the analysis.
4. The probability of an RNA-seq fragment being sampled from q1 or from q2 is calculated. Based on the assumptions stated above, a ChIP peak associated with transcription initiation should have more reads in q2 (downstream of the transcription start position) than in q1, and consequently a higher probability of a fragment being sampled from q2, P(q2).
5. ChIP-seq peaks are divided into gene associated and background based on the gene association prediction (made with predictTssOverlap in the Examples).
6. The optimal P(q2) threshold is identified iteratively as the one that balances the False Discovery Rate (FDR) and the False Negative Rate (FNR). Peaks with P(q2) exceeding this threshold are considered to be associated with a transcription initiation event.
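Steps 2-4 for a single peak on one strand can be sketched as follows. score_peak() is a hypothetical helper; it assumes the step 1 position is already known, that win.size is the width of each flanking window in the same units as the fragment coordinates, and that P(q2) is estimated as the simple proportion q2 / (q1 + q2). The package may compute this probability differently.

```r
# Hypothetical sketch of steps 2-4 for one peak on one strand.
# frag_pos: strand-specific RNA-seq fragment positions near the peak
# tss:      position identified in step 1
score_peak <- function(frag_pos, tss, win.size = 2500, coverage.cutoff = 5,
                       strand = c("+", "-")) {
  strand <- match.arg(strand)
  # Step 2: two equally sized windows flanking the position
  left  <- frag_pos >= tss - win.size & frag_pos < tss
  right <- frag_pos > tss & frag_pos <= tss + win.size
  # On the forward strand q1 is the left window; on the reverse strand
  # upstream and downstream are swapped.
  if (strand == "+") {
    q1 <- sum(left);  q2 <- sum(right)
  } else {
    q1 <- sum(right); q2 <- sum(left)
  }
  total <- q1 + q2
  # Step 3: discard peaks with too little RNA-seq coverage
  if (total < coverage.cutoff) return(NA_real_)
  # Step 4: estimated probability that a fragment comes from q2
  q2 / total
}

# 2 fragments upstream and 8 downstream of the putative TSS on the + strand
score_peak(c(rep(900, 2), rep(1100, 8)), tss = 1000, win.size = 500)
```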
Value
The strandPrediction slot of the supplied ChipDataSet object is updated with the following elements: 'predicted.strand', 'probability.cutoff', 'results.plus' and 'results.minus'.
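The 'probability.cutoff' element records the P(q2) threshold from step 6 of Details. The sketch below shows one way such a threshold could be chosen. choose_prob_cutoff() is hypothetical: it treats the gene-associated/background split from step 5 as ground truth, and it assumes that "balancing" FDR and FNR means picking the candidate threshold at which the two rates are closest. The package's own search may use a different criterion or grid.

```r
# Hypothetical sketch of step 6: scan candidate P(q2) thresholds.
# p_q2:       per-peak P(q2) values (NA for peaks dropped in step 3)
# gene_assoc: logical, TRUE = gene associated, FALSE = background (step 5)
choose_prob_cutoff <- function(p_q2, gene_assoc, grid = seq(0.5, 0.99, by = 0.01)) {
  keep <- !is.na(p_q2)
  p_q2 <- p_q2[keep]
  gene_assoc <- gene_assoc[keep]
  gap <- vapply(grid, function(t) {
    called <- p_q2 > t
    # FDR: share of called peaks that are background
    fdr <- if (any(called)) mean(!gene_assoc[called]) else 0
    # FNR: share of gene-associated peaks that are not called
    fnr <- if (any(gene_assoc)) mean(!called[gene_assoc]) else 0
    abs(fdr - fnr)
  }, numeric(1))
  grid[which.min(gap)]  # first threshold where FDR and FNR are closest
}

choose_prob_cutoff(p_q2 = c(0.9, 0.85, 0.8, 0.6, 0.55, 0.52),
                   gene_assoc = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))
```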
Author(s)
Armen R. Karapetyan
Examples
### Load the package that provides these classes and data sets
library(transcriptR)
### Load TranscriptionDataSet object
data(tds)
### Load ChipDataSet object
data(cds)
### Classify peaks as gene associated or background
predictTssOverlap(object = cds, feature = "pileup", p = 0.75)
### Predict peak 'strand'
predictStrand(cdsObj = cds, tdsObj = tds, coverage.cutoff = 5,
quant.cutoff = 0.1, win.size = 2500)
### View a short summary of the 'strand' prediction
cds
### View 'strand' prediction
getPeaks(cds)