The ultimate goal of transcriptR is to identify continuous regions of
transcription. However, in some areas of the genome transcription cannot be
detected because of low-mappability regions and (high copy number) repeats.
Sequencing reads cannot be uniquely mapped to these positions, which leaves
gaps in otherwise continuous coverage profiles and splits transcribed regions
into multiple smaller fragments. The gap distance is the maximum allowed
distance between adjacent fragments for them to be merged into one transcript.
The optimal gap distance is the one at which the detected transcripts agree
most closely with the available reference annotations. To find it, the
function builds on the methodology proposed by Hah et al. (Cell, 2011).
In brief, two types of errors are defined:
dissected error - the fraction of annotations that are segmented
into two or more fragments.
merged error - the fraction of non-overlapping annotations that are
merged by mistake in the experimental data.
The two types of errors are interdependent. Increasing the gap distance
decreases the dissected error, because fewer but longer transcripts are
detected, while the merged error increases as more detected transcripts span
multiple annotations. The gap distance with the lowest sum of the two error
types is chosen as the optimal value.
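The trade-off between the two errors can be illustrated with a minimal base R sketch. It is not the transcriptR implementation: the helpers `merge_fragments` and `error_rates`, the column name `sum.errors`, the toy coordinates, and the tie-breaking rule (first minimum) are all assumptions made for illustration, and the error definitions are a simplified reading of the descriptions above.

```r
# Toy coordinates on one chromosome and strand; all values are invented.
annot <- data.frame(start = c(1000, 6000), end = c(4000, 9000))
frags <- data.frame(start = c(1000, 2600, 6000, 7800),
                    end   = c(2000, 4000, 7000, 9000))

# Hypothetical helper: merge non-overlapping fragments that are
# separated by a gap of at most gap.dist bases.
merge_fragments <- function(fr, gap.dist) {
  fr <- fr[order(fr$start), ]
  # Number of uncovered bases between consecutive fragments
  gaps <- fr$start[-1] - fr$end[-nrow(fr)] - 1
  # A new transcript starts wherever the gap exceeds gap.dist
  group <- cumsum(c(TRUE, gaps > gap.dist))
  data.frame(start = as.numeric(tapply(fr$start, group, min)),
             end   = as.numeric(tapply(fr$end, group, max)))
}

# Hypothetical helper: both error rates for a single gap distance.
error_rates <- function(gap.dist, fr, annot) {
  tr <- merge_fragments(fr, gap.dist)
  # hits[i, j] is TRUE when annotation i overlaps transcript j
  hits <- outer(seq_len(nrow(annot)), seq_len(nrow(tr)),
                function(i, j) {
                  annot$start[i] <= tr$end[j] & annot$end[i] >= tr$start[j]
                })
  # Dissected: annotation covered by two or more transcripts
  dissected <- mean(rowSums(hits) >= 2)
  # Merged: annotation shares a transcript with another annotation
  multi <- colSums(hits) >= 2
  merged <- mean(rowSums(hits[, multi, drop = FALSE]) >= 1)
  c(gap.dist = gap.dist, dissected = dissected, merged = merged)
}

# Test a range of gap distances and pick the lowest sum of the two errors
gdr <- seq(from = 0, to = 3000, by = 500)
res <- as.data.frame(t(sapply(gdr, error_rates, fr = frags, annot = annot)))
res$sum.errors <- res$dissected + res$merged
best <- res$gap.dist[which.min(res$sum.errors)]
best  # 1000 for this toy data
```

With these toy intervals, small gap distances leave both annotations dissected, very large ones fuse them into a single transcript, and the intermediate values recover one transcript per annotation.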
estimateGapDistance(object, annot, coverage.cutoff, filter.annot = TRUE,
  fpkm.quantile = 0.25, gap.dist.range = seq(from = 0, to = 10000, by = 100))

## S4 method for signature 'TranscriptionDataSet,GRanges'
estimateGapDistance(object, annot, coverage.cutoff, filter.annot = TRUE,
  fpkm.quantile = 0.25, gap.dist.range = seq(from = 0, to = 10000, by = 100))
gap.dist.range - A numeric vector specifying the range of gap distances to test. By default, the range is from 0 to 10000 with a step of 100.
The gapDistanceTest slot of the provided TranscriptionDataSet object is
updated with a data.frame containing the estimated error rates for each
tested gap distance (see
getTestedGapDistances, for the
Armen R. Karapetyan
Hah N, Danko CG, Core L, Waterfall JJ, Siepel A, Lis JT, Kraus WL. A rapid, extensive, and transient transcriptional response to estrogen signaling in breast cancer cells. Cell. 2011.
### Load TranscriptionDataSet object
data(tds)

### Load reference annotations (knownGene from UCSC)
data(annot)

### Estimate gap distance minimizing error rate
### Define the range of gap distances to test
gdr <- seq(from = 0, to = 10000, by = 1000)

estimateGapDistance(object = tds, annot = annot, coverage.cutoff = 5,
    filter.annot = FALSE, gap.dist.range = gdr)

### View estimated gap distance
tds