process_nascent_intervals    R Documentation

Classify intervals of nascent transcription and add them to the gene model

Description

Classify intervals of nascent transcription and add them to the gene model

Usage

process_nascent_intervals(
  hml_genes,
  nascent,
  tss,
  pas,
  reads_free = NULL,
  gaps = NULL,
  trim_offset = 20,
  min_score_2 = 5,
  min_lncrna_width = 500,
  extend_along_nascent = TRUE,
  extension_flanks = c(tss = -50, pas = 0)
)

Arguments

hml_genes, nascent, tss, pas

GRanges objects (see Details for their expected content).

reads_free

GRangesList object, or NULL.

gaps

GRanges object, or NULL.

trim_offset

Non-negative integer.

min_score_2

Non-negative numeric. Minimum score of the TSS and PAS towards which the borders of MC and LC genes can be extended.

min_lncrna_width

Positive integer. Minimum width of the called lncRNAs.

extend_along_nascent

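Logical. If TRUE, the original ("thick") coordinates of MC and LC genes can be extended towards strong TSS and/or PAS.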

extension_flanks

Named integer vector of length 2 (elements tss and pas).

Value

List of length 3:

  1. GRanges object which contains intervals covered by mature RNA molecules (i.e. the original coordinates of HC, MC and LC genes, except that the borders of MC and LC genes may have been extended towards nearby strong TSS and/or PAS).

  2. GRanges object which contains HC, MC and LC genes extended to include the gene-associated intervals of nascent transcription (the interval of mature transcription is stored in the "thick" mcols).

  3. GRanges object which contains lncRNAs called from the nascent-only transcribed intervals.

Details

hml_genes contains the called genes (the first element in the list returned by the call_transcripts_and_genes() function). nascent contains continuous intervals of nascent transcription (the first element in the list returned by the call_transcribed_intervals() function). tss and pas are the outputs of call_TCs() applied to 5'-tag and 3'-tag sequencing data, respectively. gaps contains intervals of low coverage within the continuous intervals of nascent transcription (the second element in the list returned by the call_transcribed_intervals() function). reads_free contains long reads which remained unused during the transcript calling procedure and are located outside of the called genes (the fifth element in the list returned by the call_transcripts_and_genes() function).
The intervals of nascent transcription are first classified by overlap with the called genes on the same strand. Intervals which start upstream of and/or end downstream of a called gene are considered upstream transcribed intervals and readthrough (RT) tails, respectively. The borders of the called genes are extended to include such gene-associated intervals of nascent transcription, while the original gene coordinates (which correspond to the mature RNA molecule) are saved as the "thick" mcols. If extend_along_nascent == TRUE, the original ("thick") coordinates of MC and LC genes can be further extended towards strong TSS and/or PAS (with scores not less than min_score_2) found within the extension_flanks windows centered at the start and end, respectively, of the associated nascent interval.
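The classification and extension steps described above can be illustrated with a minimal base R sketch on toy data. It is not the package's implementation and makes several simplifying assumptions: a single gene on the plus strand, intervals given as plain 1-based inclusive integer coordinates instead of GRanges, and a hypothetical symmetric window `halfwidth` used in place of extension_flanks, whose exact window definition is not spelled out here. The helpers `classify_and_extend()` and `strongest_tc_near()` are hypothetical.

```r
# Minimal sketch (plus strand only, toy data); NOT the package's implementation.
# Intervals are 1-based and inclusive, as in GRanges.

overlaps <- function(a_start, a_end, b_start, b_end) {
  a_start <= b_end && b_start <= a_end
}

# Split a nascent interval that overlaps a gene into its upstream part and
# RT tail, extend the gene borders to the nascent interval, and keep the
# mature (original) coordinates as "thick".
classify_and_extend <- function(gene_start, gene_end, nas_start, nas_end) {
  stopifnot(overlaps(gene_start, gene_end, nas_start, nas_end))
  list(
    upstream    = if (nas_start < gene_start) c(nas_start, gene_start - 1L) else NULL,
    rt_tail     = if (nas_end > gene_end) c(gene_end + 1L, nas_end) else NULL,
    start       = min(gene_start, nas_start),
    end         = max(gene_end, nas_end),
    thick_start = gene_start,
    thick_end   = gene_end
  )
}

# Position of the highest-scoring tag cluster with score >= min_score_2 lying
# within `halfwidth` bp of `pos`, or NA if there is none.
# `halfwidth` is a HYPOTHETICAL stand-in for the extension_flanks window.
strongest_tc_near <- function(pos, tc, min_score_2, halfwidth) {
  ok <- abs(tc$pos - pos) <= halfwidth & tc$score >= min_score_2
  if (!any(ok)) return(NA_integer_)
  hits <- tc[ok, , drop = FALSE]
  hits$pos[which.max(hits$score)]
}

# Toy example: a gene at 1000-2000 lies inside a nascent interval at 800-2600.
g <- classify_and_extend(1000L, 2000L, 800L, 2600L)
tss <- data.frame(pos = c(790L, 805L, 1500L), score = c(3, 12, 40))
pas <- data.frame(pos = 2590L, score = 7)

# With extend_along_nascent = TRUE (MC and LC genes only), the thick borders
# may move outwards to strong TSS/PAS near the nascent start/end.
tss_hit <- strongest_tc_near(800L, tss, min_score_2 = 5, halfwidth = 50L)
pas_hit <- strongest_tc_near(2600L, pas, min_score_2 = 5, halfwidth = 50L)
if (!is.na(tss_hit)) g$thick_start <- min(g$thick_start, tss_hit)
if (!is.na(pas_hit)) g$thick_end <- max(g$thick_end, pas_hit)
```

Here the TSS at 790 is ignored because its score is below min_score_2, so the thick start moves from 1000 to 805 and the thick end from 2000 to 2590. Picking the single highest-scoring cluster in the window is a choice made for this sketch; the package may select among several candidates differently.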
Other nascent intervals, which are not associated with any mature transcript, are considered antisense or intergenic lncRNAs. Only those with a width of at least min_lncrna_width are retained.
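The lncRNA calling step can be sketched in the same base R style. This sketch assumes that a nascent interval is gene-associated if it overlaps a gene on the same strand. A remaining interval is labeled "antisense" if it overlaps a gene on the opposite strand and "intergenic" otherwise. The exact antisense criterion used by the package is not stated on this page. The function `call_lncrnas()` and the toy data are hypothetical.

```r
# Minimal sketch with toy data; NOT the package's implementation.
# ASSUMPTION: "antisense" = overlaps a gene on the opposite strand,
# otherwise "intergenic". Coordinates are 1-based and inclusive.
genes <- data.frame(
  start = c(1000L, 5000L), end = c(2000L, 6000L), strand = c("+", "-"),
  stringsAsFactors = FALSE
)
nascent <- data.frame(
  start  = c(1200L, 1500L, 3000L, 3500L),
  end    = c(2500L, 2200L, 3300L, 4500L),
  strand = c("+", "-", "+", "+"),
  stringsAsFactors = FALSE
)

call_lncrnas <- function(nascent, genes, min_lncrna_width = 500L) {
  # Does nascent interval i overlap any gene on the same (or opposite) strand?
  ov <- function(i, same_strand) {
    s <- if (same_strand) nascent$strand[i] == genes$strand
         else nascent$strand[i] != genes$strand
    any(s & genes$start <= nascent$end[i] & nascent$start[i] <= genes$end)
  }
  idx <- seq_len(nrow(nascent))
  sense <- vapply(idx, ov, logical(1), same_strand = TRUE)   # gene-associated
  anti  <- vapply(idx, ov, logical(1), same_strand = FALSE)
  # Keep only intervals not associated with any gene, then label them.
  out <- nascent[!sense, , drop = FALSE]
  out$type <- ifelse(anti[!sense], "antisense", "intergenic")
  # Filter by width, mirroring min_lncrna_width.
  width <- out$end - out$start + 1L
  out[width >= min_lncrna_width, , drop = FALSE]
}

res <- call_lncrnas(nascent, genes)
```

In this toy input the first interval overlaps the plus-strand gene and is treated as gene-associated. The 301 bp interval at 3000-3300 is dropped by the width filter. `res` therefore contains one antisense and one intergenic interval.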


Maxim-Ivanov/TranscriptomeReconstructoR documentation built on Oct. 3, 2023, 11:19 p.m.