View source: R/Call_transcript.R
call_transcripts_and_genes | R Documentation |
Call transcript and gene models from corrected full-length RNA-seq reads
call_transcripts_and_genes(
long_reads,
skip_minor_tx = 0.01,
max_overlap_called = 0.1,
min_read_width = 1000,
min_overlap_fusion = 0.5,
clust_threshold = 0.8
)
long_reads | |
skip_minor_tx | Numeric in the range (0, 1), or NULL. |
max_overlap_called | Numeric in the range [0, 1). |
min_read_width | Positive integer. |
min_overlap_fusion | Numeric in the range (0, 1]. |
clust_threshold | Numeric in the range (0, 1]. |
List of length 5:
GRanges object (HC, MC and LC genes);
GRangesList object (HC, MC and LC transcripts);
GRanges object (fusion genes);
GRangesList object (fusion transcripts);
GRangesList object (unused long reads outside of the called genes and transcripts).
The input GRangesList object is returned by the detect_alignment_errors() function.
Long reads that are either marked as truncated by extend_long_reads_to_TSS_and_PAS() or contain a misaligned exon (as revealed by detect_alignment_errors()) are excluded from the transcript calling procedure.
The remaining long reads are collapsed into transcripts. The transcripts are classified into high confidence (HC), medium confidence (MC) and low confidence (LC) groups:
HC transcripts are called from reads which start in a TSS and end in a PAS;
MC transcripts are called from TSS-only or PAS-only reads which do not overlap with any HC transcript by more than max_overlap_called fraction of either read or transcript length;
LC transcripts are called from reads which neither start in a TSS nor end in a PAS, and do not overlap with any HC or MC transcript by more than max_overlap_called.
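The overlap test behind max_overlap_called can be sketched in base R. This is only an illustration, not the package's implementation: it treats each read and transcript as a single closed interval (ignoring strand and introns), the helper names overlap_width() and exceeds_overlap() are hypothetical, and it assumes a read is rejected when the overlap exceeds the threshold relative to either length.

```r
# Hypothetical helper: width of the overlap between two closed integer intervals.
overlap_width <- function(start1, end1, start2, end2) {
  max(0L, min(end1, end2) - max(start1, start2) + 1L)
}

# TRUE when the overlap exceeds `max_overlap` of either the read length
# or the transcript length (assumed reading of "either read or transcript").
exceeds_overlap <- function(read, tx, max_overlap = 0.1) {
  ov <- overlap_width(read[["start"]], read[["end"]], tx[["start"]], tx[["end"]])
  read_len <- read[["end"]] - read[["start"]] + 1L
  tx_len <- tx[["end"]] - tx[["start"]] + 1L
  (ov / read_len > max_overlap) || (ov / tx_len > max_overlap)
}

hc_tx  <- c(start = 1000L, end = 3000L)  # an already called HC transcript
read_a <- c(start = 2950L, end = 5000L)  # shares 51 bp with hc_tx
read_b <- c(start = 2000L, end = 4500L)  # shares 1001 bp with hc_tx

exceeds_overlap(read_a, hc_tx)  # small overlap: read may still yield an MC transcript
exceeds_overlap(read_b, hc_tx)  # large overlap: read is not used for MC calling
```

In this toy case read_a overlaps the HC transcript by about 2.5% of either length and passes the default threshold of 0.1, while read_b overlaps by about 40% of its own length and does not.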
This iterative procedure of transcript calling ensures that highly expressed HC loci are not contaminated with less reliable MC or LC transcripts.
The MC/LC transcripts are not guaranteed to be full-length. To decrease the risk of picking up products of partial RNA degradation, MC and LC transcripts can be called only from reads longer than min_read_width bp.
The called HC, MC and LC transcripts are clustered into HC, MC and LC genes, respectively. A pair of transcripts of the same type whose overlap (intersect/union) exceeds clust_threshold is considered to belong to the same gene.
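The intersect/union criterion can be illustrated with a small base R sketch. It assumes each transcript is a single closed interval and that genes are formed by single linkage (any chain of above-threshold pairs ends up in one gene). Both are simplifying assumptions made here, and jaccard() and cluster_transcripts() are hypothetical names rather than functions of the package.

```r
# Overlap of two closed intervals expressed as intersect / union.
jaccard <- function(s1, e1, s2, e2) {
  inter <- max(0, min(e1, e2) - max(s1, s2) + 1)
  union <- (e1 - s1 + 1) + (e2 - s2 + 1) - inter
  inter / union
}

# Assign gene ids: transcripts linked by a pair with overlap above
# `clust_threshold` receive the same id (single linkage, assumed).
cluster_transcripts <- function(starts, ends, clust_threshold = 0.8) {
  n <- length(starts)
  gene <- seq_len(n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i < j &&
          jaccard(starts[i], ends[i], starts[j], ends[j]) > clust_threshold) {
        # Merge the whole cluster of j into the cluster of i.
        gene[gene == gene[j]] <- gene[i]
      }
    }
  }
  match(gene, unique(gene))  # renumber ids as 1, 2, ...
}

starts <- c(100, 120, 5000)
ends   <- c(2000, 2050, 7000)
cluster_transcripts(starts, ends)
```

Here the first two transcripts share about 96% of their union and fall into one gene, while the third is placed in a gene of its own.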
Within each gene, the minor transcripts (collectively representing up to skip_minor_tx fraction of the reads) are excluded from further consideration. To suppress this behavior, set skip_minor_tx = NULL.
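One way to read the skip_minor_tx rule is as a cumulative cutoff over the read support of the transcripts in a single gene. The sketch below follows that reading. The function name drop_minor_tx(), the inclusive boundary and the per-transcript read counts are assumptions made for illustration, not details confirmed by the package.

```r
# `read_counts`: named vector with the number of reads supporting each
# transcript of one gene (hypothetical input format).
drop_minor_tx <- function(read_counts, skip_minor_tx = 0.01) {
  if (is.null(skip_minor_tx)) return(read_counts)  # filtering disabled
  ord <- order(read_counts)                        # least supported first
  cum_frac <- cumsum(read_counts[ord]) / sum(read_counts)
  minor <- ord[cum_frac <= skip_minor_tx]          # collectively below the cutoff
  if (length(minor) == 0) read_counts else read_counts[-minor]
}

counts <- c(tx1 = 950, tx2 = 45, tx3 = 3, tx4 = 2)
names(drop_minor_tx(counts))          # tx3 and tx4 together hold 0.5% of the reads
names(drop_minor_tx(counts, NULL))    # nothing is dropped
```

With the default of 0.01, tx3 and tx4 are dropped because together they account for 5 of 1000 reads. Adding tx2 would push the cumulative fraction to 5%, so tx2 is kept.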
Finally, transcripts which overlap at least two other disjoint transcripts by at least min_overlap_fusion fraction of their lengths are considered fusion transcripts.
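The fusion criterion can be sketched as follows. The text does not say whose length the fraction refers to; this example assumes it is the length of each overlapped transcript. It also treats transcripts as single closed intervals, and is_fusion() is a hypothetical helper, so it shows the idea rather than the package's actual logic.

```r
# `cand`: named vector with start and end of the candidate transcript.
# `others`: data.frame with columns start and end for the other transcripts.
is_fusion <- function(cand, others, min_overlap_fusion = 0.5) {
  ov <- pmax(0, pmin(cand[["end"]], others$end) -
                pmax(cand[["start"]], others$start) + 1)
  # Assumption: the fraction is taken relative to each overlapped transcript.
  covered <- others[ov / (others$end - others$start + 1) >= min_overlap_fusion, ]
  if (nrow(covered) < 2) return(FALSE)
  # A disjoint pair exists iff the latest start lies beyond the earliest end.
  max(covered$start) > min(covered$end)
}

others <- data.frame(start = c(1000, 3000), end = c(2000, 4000))
is_fusion(c(start = 1200, end = 3800), others)  # spans most of both transcripts
is_fusion(c(start = 1200, end = 3100), others)  # barely touches the second one
```

The first candidate covers about 80% of each of two non-overlapping transcripts and is flagged. The second covers only about 10% of the downstream transcript and is not.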