call_transcripts_and_genes: Call transcript and gene models from corrected full-length...

View source: R/Call_transcript.R

call_transcripts_and_genes        R Documentation

Call transcript and gene models from corrected full-length RNA-seq reads

Description

Call transcript and gene models from corrected full-length RNA-seq reads

Usage

call_transcripts_and_genes(
  long_reads,
  skip_minor_tx = 0.01,
  max_overlap_called = 0.1,
  min_read_width = 1000,
  min_overlap_fusion = 0.5,
  clust_threshold = 0.8
)

Arguments

long_reads

GRangesList object.

skip_minor_tx

Numeric in the range (0, 1), or NULL.

max_overlap_called

Numeric in the range [0, 1).

min_read_width

Positive integer.

min_overlap_fusion

Numeric in the range (0, 1].

clust_threshold

Numeric in the range (0, 1].

Value

List of length 5:

  1. GRanges object (HC, MC and LC genes);

  2. GRangesList object (HC, MC and LC transcripts);

  3. GRanges object (fusion genes);

  4. GRangesList object (fusion transcripts);

  5. GRangesList object (unused long reads outside of the called genes and transcripts).
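Since the five elements are described by position, it can be convenient to give them names before further processing. The helper below is a minimal sketch in base R: the function name_call_results() and the element names are hypothetical and not part of the package, and a toy list of strings stands in for the real GRanges/GRangesList objects. Whether the returned list already carries names is not stated here.

```r
# Hypothetical helper: attach descriptive names to the 5-element result.
# The names below are illustrative assumptions, not defined by the package.
name_call_results <- function(res) {
  stopifnot(is.list(res), length(res) == 5)
  names(res) <- c("genes", "transcripts", "fusion_genes",
                  "fusion_transcripts", "unused_reads")
  res
}

# Toy stand-in for the real return value (strings instead of GRanges objects).
toy <- name_call_results(list("g", "tx", "fg", "ftx", "unused"))
toy$fusion_transcripts
```

With a real result, the same helper would be applied to the output of call_transcripts_and_genes(long_reads).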

Details

The input GRangesList object is returned by the detect_alignment_errors() function.
Long reads that are either marked as truncated by extend_long_reads_to_TSS_and_PAS() or contain a misaligned exon (as revealed by detect_alignment_errors()) are excluded from the transcript calling procedure. The remaining long reads are collapsed into transcripts. The transcripts are classified into high confidence (HC), medium confidence (MC) and low confidence (LC) groups:

  • HC transcripts are called from reads which start in a TSS and end in a PAS;

  • MC transcripts are called from TSS-only or PAS-only reads which do not overlap with any HC transcript by more than max_overlap_called fraction of either read or transcript length;

  • LC transcripts are called from reads which neither start in a TSS nor end in a PAS, and do not overlap with any HC or MC transcript by more than max_overlap_called.
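
The classification rules above can be illustrated with a small base R sketch. It is not the package's implementation: reads and transcripts are simplified to single [start, end] intervals (the real objects are multi-exon GRangesList elements), the function names are hypothetical, and "either read or transcript length" is read as "the overlap exceeds the threshold relative to at least one of the two lengths".

```r
# Provisional confidence class from two logical flags per read.
classify_read <- function(starts_in_tss, ends_in_pas) {
  ifelse(starts_in_tss & ends_in_pas, "HC",
         ifelse(starts_in_tss | ends_in_pas, "MC", "LC"))
}

# Does a read overlap an already called transcript by more than
# max_overlap_called of either length? Both arguments are c(start, end),
# 1-based and inclusive (an assumption made for this sketch).
exceeds_max_overlap <- function(read, tx, max_overlap_called = 0.1) {
  ov <- max(0, min(read[2], tx[2]) - max(read[1], tx[1]) + 1)
  read_len <- read[2] - read[1] + 1
  tx_len <- tx[2] - tx[1] + 1
  (ov / read_len > max_overlap_called) || (ov / tx_len > max_overlap_called)
}

classes <- classify_read(c(TRUE, TRUE, FALSE), c(TRUE, FALSE, FALSE))
small_ov <- exceeds_max_overlap(c(100, 1099), c(1050, 3049))  # 50 bp shared
large_ov <- exceeds_max_overlap(c(100, 1099), c(900, 2899))   # 200 bp shared
```

In this toy case the second read shares 200 of its 1000 bp with the transcript, so under the default threshold it would not give rise to an MC or LC transcript.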

This iterative procedure of transcript calling ensures that highly expressed HC loci are not contaminated with less reliable MC or LC transcripts. The MC/LC transcripts are not guaranteed to be full-length. To decrease the risk of picking up products of partial RNA degradation, MC and LC transcripts are called only from reads longer than min_read_width bp.
The called HC, MC and LC transcripts are clustered into HC, MC and LC genes, respectively. Two transcripts of the same type whose overlap (intersect/union) exceeds clust_threshold are considered to belong to the same gene.
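
The intersect/union criterion can be sketched in base R as follows. This is an illustration under simplifying assumptions, not the package code: each transcript is reduced to a single [start, end] interval (1-based, inclusive), and the function names interval_jaccard() and same_gene() are hypothetical.

```r
# Overlap of two intervals as intersect / union.
interval_jaccard <- function(a, b) {
  inter <- max(0, min(a[2], b[2]) - max(a[1], b[1]) + 1)
  uni <- (a[2] - a[1] + 1) + (b[2] - b[1] + 1) - inter
  inter / uni
}

# Two transcripts are assigned to the same gene when the ratio
# exceeds clust_threshold.
same_gene <- function(a, b, clust_threshold = 0.8) {
  interval_jaccard(a, b) > clust_threshold
}

j_close <- interval_jaccard(c(1000, 2999), c(1100, 2999))  # 1900 / 2000
j_far   <- interval_jaccard(c(1000, 2999), c(2500, 4499))  # 500 / 3500
same_gene(c(1000, 2999), c(1100, 2999))
same_gene(c(1000, 2999), c(2500, 4499))
```

Using intersect/union rather than a one-sided overlap fraction makes the measure symmetric, so a short transcript nested inside a much longer one is not automatically merged with it.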
Within each gene, the minor transcripts (collectively representing up to skip_minor_tx fraction of the reads) are excluded from further consideration. To suppress this behavior, set skip_minor_tx = NULL.
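
One way to read "collectively representing up to skip_minor_tx fraction of the reads" is sketched below in base R. The function drop_minor_tx() and the per-transcript read counts are hypothetical; the sketch assumes that transcripts are removed from the least supported upwards for as long as their cumulative share of the gene's reads stays within the threshold. The package may resolve ties or boundaries differently.

```r
# read_counts: named numeric vector of supporting reads per transcript
# within one gene (toy input, an assumption of this sketch).
drop_minor_tx <- function(read_counts, skip_minor_tx = 0.01) {
  if (is.null(skip_minor_tx)) return(read_counts)  # filtering disabled
  ord <- order(read_counts)                        # least supported first
  cum_frac <- cumsum(read_counts[ord]) / sum(read_counts)
  minor <- ord[cum_frac <= skip_minor_tx]
  if (length(minor) > 0) read_counts[-minor] else read_counts
}

counts <- c(tx1 = 950, tx2 = 45, tx3 = 4, tx4 = 1)
kept <- drop_minor_tx(counts)                          # tx3 and tx4 sum to 0.5%
kept_all <- drop_minor_tx(counts, skip_minor_tx = NULL)
```

Here tx3 and tx4 together account for 5 of 1000 reads, which is below the default 1%, whereas adding tx2 would raise the cumulative share to 5%, so tx2 is retained.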
Finally, transcripts which overlap at least two other, mutually disjoint transcripts by at least min_overlap_fusion fraction of their lengths are considered fusion transcripts.
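
The fusion criterion can be sketched in base R as well. This is a hedged illustration rather than the package's algorithm: transcripts are reduced to single [start, end] intervals, the function names are hypothetical, and "fraction of their lengths" is interpreted as the fraction of each overlapped transcript's length, which is an assumption.

```r
# Fraction of interval b that is covered by interval a.
frac_covered <- function(a, b) {
  ov <- max(0, min(a[2], b[2]) - max(a[1], b[1]) + 1)
  ov / (b[2] - b[1] + 1)
}

# tx: c(start, end); others: list of c(start, end) for the other transcripts.
is_fusion_candidate <- function(tx, others, min_overlap_fusion = 0.5) {
  hit <- vapply(others,
                function(o) frac_covered(tx, o) >= min_overlap_fusion,
                logical(1))
  hits <- others[hit]
  if (length(hits) < 2) return(FALSE)
  # Require at least one pair of overlapped transcripts that are disjoint.
  pairs <- utils::combn(length(hits), 2)
  any(apply(pairs, 2, function(p) {
    a <- hits[[p[1]]]
    b <- hits[[p[2]]]
    a[2] < b[1] || b[2] < a[1]
  }))
}

others <- list(c(1000, 1999), c(3000, 3999))
spans_both <- is_fusion_candidate(c(1200, 3800), others)
spans_one  <- is_fusion_candidate(c(1200, 2500), others)
```

The first query covers 80% of each of two non-overlapping transcripts and is flagged; the second reaches only one of them and is not.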


Maxim-Ivanov/TranscriptomeReconstructoR documentation built on Oct. 3, 2023, 11:19 p.m.