detectTranscripts

Description

The function identifies transcribed regions (transcripts) genome-wide and quantifies their expression levels.

Usage

detectTranscripts(object, coverage.cutoff, gap.dist, estimate.params = TRUE,
  total.reads, combine.by.annot = FALSE, annot)

## S4 method for signature 'TranscriptionDataSet'
detectTranscripts(object, coverage.cutoff,
  gap.dist, estimate.params = TRUE, total.reads, combine.by.annot = FALSE,
  annot)

Arguments

object

A TranscriptionDataSet object.

coverage.cutoff

Numeric. A cutoff value used to discard regions with low fragment coverage, which represent expression noise. By default, the value stored in the coverageCutoff slot of the supplied TranscriptionDataSet object is used. The optimal cutoff value can be calculated with the estimateBackground function.

gap.dist

Numeric. Maximum allowed distance between transcribed regions for them to be merged into one transcript. By default, the value stored in the gapDistanceTest slot of the supplied TranscriptionDataSet object is used. The optimal gap distance can be calculated with the estimateGapDistance function.

estimate.params

Logical. Whether to estimate expression level and coverage density of the detected transcripts. Default: TRUE.

total.reads

Numeric. Total number of reads used for normalization when calculating FPKM. By default, the total number of reads stored in the provided TranscriptionDataSet object is used.

combine.by.annot

Logical. Whether to combine transcripts overlapping the same reference annotation. Default: FALSE.

annot

GRanges. Reference annotations.

Details

The function uses two parameters to identify transcribed regions: coverage.cutoff and gap.dist. By default, these are the values calculated by estimateBackground and estimateGapDistance, respectively, and stored in the TranscriptionDataSet object. Alternatively, the user may pass their own values to the function. Increasing gap.dist yields fewer, longer transcripts, whereas increasing coverage.cutoff yields fewer, shorter transcripts. The effect on length arises because a typical transcript tends to have lower fragment coverage at the 3' end, so the coverage.cutoff value determines where the detected transcript is truncated.
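The interplay of the two parameters can be illustrated with a toy sketch in base R. This is not the package's implementation: the helper call_regions, the coverage vector, and the choice of a strict "greater than" comparison are all assumptions made for illustration only.

```r
# Toy per-base fragment coverage along one sequence (hypothetical values)
cov <- c(0, 6, 7, 8, 2, 1, 6, 9, 0, 0, 0, 0, 7, 7, 3)

# Hypothetical helper (not part of the package): keep runs of bases whose
# coverage exceeds coverage.cutoff, then merge runs that are separated by
# at most gap.dist bases.
call_regions <- function(cov, coverage.cutoff, gap.dist) {
  runs <- rle(cov > coverage.cutoff)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  starts <- starts[runs$values]
  ends <- ends[runs$values]
  if (length(starts) == 0) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  # Number of bases between each region and the one before it
  gaps <- starts[-1] - ends[-length(ends)] - 1
  # A new group starts wherever the gap is too wide to bridge
  grp <- cumsum(c(TRUE, gaps > gap.dist))
  data.frame(start = as.integer(tapply(starts, grp, min)),
             end   = as.integer(tapply(ends, grp, max)))
}

call_regions(cov, coverage.cutoff = 5, gap.dist = 0)  # three short regions
call_regions(cov, coverage.cutoff = 5, gap.dist = 4)  # one long region
call_regions(cov, coverage.cutoff = 8, gap.dist = 0)  # one very short region
```

On this toy input, widening gap.dist bridges the low-coverage stretches and merges the regions, while raising coverage.cutoff trims or removes them.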

If estimate.params is set to TRUE, the following metrics are estimated for each transcript:

  • length - transcript length (in base pairs).

  • bases.covered - the number of bases covered by the sequencing fragments.

  • coverage - the proportion of transcript length covered by fragments. Value in the range (0, 1].

  • fragments - total number of fragments per transcript.

  • fpkm - Fragments Per Kilobase of transcript per Million mapped reads.

The coverage is a measure of how densely the transcript is covered by sequencing fragments. Moderately or highly expressed transcripts will have a value close to 1, whereas lowly expressed transcripts will have a value close to 0, indicating a sparse distribution of sequencing fragments along the transcript body.
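As a rough illustration of how the coverage and fpkm metrics relate to the other columns, the sketch below applies the standard FPKM definition in base R. All numbers are hypothetical, and the package's exact computation (for example, how fragments are counted) may differ.

```r
# Hypothetical per-transcript values, named after the metrics listed above
length.bp     <- c(2000, 10000, 500)  # transcript length in base pairs
bases.covered <- c(1900, 2500, 500)   # bases covered by fragments
fragments     <- c(400, 50, 20)       # fragments per transcript
total.reads   <- 2e6                  # hypothetical library size

# Proportion of the transcript length covered by fragments, in (0, 1]
coverage <- bases.covered / length.bp

# Fragments per kilobase of transcript per million mapped reads
fpkm <- fragments / (length.bp / 1e3) / (total.reads / 1e6)

data.frame(length = length.bp, bases.covered, coverage, fragments, fpkm)
```

Here the second transcript has a coverage of 0.25 and a low FPKM, which is the sparse-coverage pattern described above for lowly expressed transcripts.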

Value

The transcripts slot of the provided TranscriptionDataSet object will be updated with a GRanges object containing the detected transcripts and, if estimated, the corresponding expression levels.

Author(s)

Armen R. Karapetyan

See Also

constructTDS

Examples

### Load TranscriptionDataSet object
data(tds)

### Load reference annotations (knownGene from UCSC)
data(annot)

### Detect transcripts
detectTranscripts(object = tds, coverage.cutoff = 5, gap.dist = 4000,
                  estimate.params = TRUE, combine.by.annot = FALSE, annot = annot)

### View detected transcripts
getTranscripts(tds)