refine_transcripts_by_annotation: Refine called transcripts by an existing transcript model

View source: R/Refine_by_annotation.R

refine_transcripts_by_annotation    R Documentation

Refine called transcripts by an existing transcript model

Description

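Refine called transcripts by an existing transcript model: adjust exon and transcript borders to the annotation, add valid annotated transcripts that are missing from the called set, and update the set of fusion transcripts.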

Usage

refine_transcripts_by_annotation(
  hml_tx,
  annot_exons,
  tss,
  pas,
  fusion_tx = GenomicRanges::GRangesList(),
  max_exon_diff = 10,
  tx_flanks_up = c(-100, 100),
  tx_flanks_down = c(-100, 100),
  min_score_2 = 5,
  min_tx_cov = 0.95,
  clust_threshold = 0.8,
  min_overlap_fusion = 0.5
)

Arguments

hml_tx, annot_exons, fusion_tx

GRangesList objects.

tss, pas

GRanges objects.

max_exon_diff

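Positive integer. Maximum distance (bp) by which a called exon border may be moved to match an annotated exon border.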

tx_flanks_up, tx_flanks_down

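Integer vectors of length 2. Windows (bp) around the start and end of an annotated transcript, respectively, within which a called TSS or PAS must be found.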

min_score_2

Non-negative numeric. Score that a called TSS or PAS must exceed to support an annotated transcript.

min_tx_cov

Numeric in the range (0, 1]. Minimum fraction of a called transcript that must be covered by its annotated mate.

clust_threshold

Numeric in the range (0, 1].

min_overlap_fusion

Numeric in the range (0, 1]. Minimum overlap fraction used when detecting fusion transcripts.

Value

List of length 4:

  1. GRanges object (updated HC, MC and LC genes);

  2. GRangesList object (refined HC, MC and LC transcripts);

  3. GRanges object (updated fusion genes);

  4. GRangesList object (updated fusion transcripts).
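
As a usage sketch, the call and the unpacking of the four returned elements could look as follows. The wrapper name refine_and_unpack, its argument called (assumed to hold the list returned by call_transcripts_and_genes()) and the element names in the output are illustrative and not part of the package. The sketch also assumes that the function is exported and reachable via TranscriptomeReconstructoR::.

```r
# Hypothetical convenience wrapper (not part of TranscriptomeReconstructoR).
# called      : list returned by call_transcripts_and_genes() (assumed)
# annot_exons : GRangesList, e.g. from GenomicFeatures::exonsBy(txdb, by = "tx")
# tss, pas    : GRanges objects with the called TSS and PAS
refine_and_unpack <- function(called, annot_exons, tss, pas) {
  refined <- TranscriptomeReconstructoR::refine_transcripts_by_annotation(
    hml_tx = called[[2]],      # called HC, MC and LC transcripts
    annot_exons = annot_exons,
    tss = tss,
    pas = pas,
    fusion_tx = called[[4]]    # called fusion transcripts
  )
  # The names below are illustrative labels for the four documented elements
  list(
    hml_genes = refined[[1]],    # GRanges: updated HC, MC and LC genes
    hml_tx = refined[[2]],       # GRangesList: refined HC, MC and LC transcripts
    fusion_genes = refined[[3]], # GRanges: updated fusion genes
    fusion_tx = refined[[4]]     # GRangesList: updated fusion transcripts
  )
}
```

All other arguments are left at the defaults shown in the Usage section.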

Details

hml_tx is the called transcript model (the second element in the list returned by the call_transcripts_and_genes() function). fusion_tx is the called set of fusion transcripts (the fourth element in the list returned by the call_transcripts_and_genes() function). annot_exons is a known transcript model (returned by e.g. exonsBy(txdb, by = "tx"), where txdb is a TxDb object from the GenomicFeatures package).

The function adjusts the called transcripts using the annotated transcripts:

  • 5'- and 3'-borders of called exons are adjusted to the most similar border of an annotated exon (by not more than max_exon_diff bp);

  • Annotated transcripts are classified into valid and non-valid. A valid known transcript must overlap with called TSS and PAS (both having scores above min_score_2) within tx_flanks_up and tx_flanks_down bp windows around its start and end, respectively;

  • 5'- and/or 3'-borders of called MC and LC transcripts lacking overlap with TSS and/or PAS are adjusted to the borders of the most similar mate among the valid annotated transcripts (given that at least min_tx_cov fraction of the called transcript is covered by the annotated mate);

  • Valid annotated transcripts which do not overlap with any called transcript are copied from the annotation to the called HC transcript set.
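
The first two rules above can be illustrated with a minimal base R sketch. It is not the package's implementation: coordinates are plain numeric vectors on a single plus-strand chromosome, TSS and PAS are treated as single positions, and the helper names snap_borders and is_valid_tx are hypothetical.

```r
# Move each called exon border to the nearest annotated border,
# but only if the shift does not exceed max_exon_diff bp.
snap_borders <- function(called, annotated, max_exon_diff = 10) {
  vapply(called, function(pos) {
    d <- abs(annotated - pos)
    i <- which.min(d)
    if (length(i) == 1 && d[i] <= max_exon_diff) annotated[i] else pos
  }, numeric(1))
}

# An annotated transcript counts as valid if a sufficiently strong TSS lies
# in the window around its start and a sufficiently strong PAS lies in the
# window around its end.
is_valid_tx <- function(tx_start, tx_end,
                        tss_pos, tss_score, pas_pos, pas_score,
                        tx_flanks_up = c(-100, 100),
                        tx_flanks_down = c(-100, 100),
                        min_score_2 = 5) {
  good_tss <- tss_pos[tss_score > min_score_2]
  good_pas <- pas_pos[pas_score > min_score_2]
  has_tss <- any(good_tss >= tx_start + tx_flanks_up[1] &
                 good_tss <= tx_start + tx_flanks_up[2])
  has_pas <- any(good_pas >= tx_end + tx_flanks_down[1] &
                 good_pas <= tx_end + tx_flanks_down[2])
  has_tss && has_pas
}

# Borders 1003 and 2995 are within 10 bp of an annotated border; 2050 is not.
snapped <- snap_borders(c(1003, 2050, 2995), c(1000, 2000, 3000))

# TSS at 950 (score 10) and PAS at 5050 (score 8) support a transcript 1000-5000.
valid <- is_valid_tx(1000, 5000,
                     tss_pos = c(950, 3000), tss_score = c(10, 20),
                     pas_pos = 5050, pas_score = 8)
```

The real function works on GRanges and GRangesList objects and has to account for strand and chromosome, which this sketch deliberately leaves out.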

In addition, the set of fusion transcripts is updated by finding called transcripts which overlap at least two valid annotated transcripts (or an annotated and a called transcript) by at least min_overlap_fusion fraction of their lengths.
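
The fusion criterion can be sketched in base R as well. This is an illustration only: the helper names overlap_fraction and is_fusion_candidate are hypothetical, intervals are on one strand of one chromosome, and the overlap fraction is assumed to be measured relative to the length of each overlapped transcript (the documentation does not state which length is used).

```r
# Fraction of each subject interval covered by the query interval
# (closed, 1-based coordinates; vectorised over subjects).
overlap_fraction <- function(q_start, q_end, s_start, s_end) {
  ov <- pmax(0, pmin(q_end, s_end) - pmax(q_start, s_start) + 1)
  ov / (s_end - s_start + 1)
}

# A called transcript is a fusion candidate if it covers at least two
# other transcripts by at least min_overlap_fusion of their lengths.
is_fusion_candidate <- function(q_start, q_end, s_start, s_end,
                                min_overlap_fusion = 0.5) {
  frac <- overlap_fraction(q_start, q_end, s_start, s_end)
  sum(frac >= min_overlap_fusion) >= 2
}

# Three annotated transcripts (assumed coordinates)
s_start <- c(1000, 5000, 8500)
s_end   <- c(4000, 8000, 12000)

# A long called transcript spans the first two annotated transcripts entirely
long_tx <- is_fusion_candidate(1000, 9000, s_start, s_end)

# A short called transcript overlaps only the first one
short_tx <- is_fusion_candidate(1000, 3000, s_start, s_end)
```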


Maxim-Ivanov/TranscriptomeReconstructoR documentation built on Oct. 3, 2023, 11:19 p.m.