predict_jrp_exon: Predict novel exons from read pairs with two splice junctions

Description

Novel exons are predicted from paired-end reads where each read spans one splice junction. First, the read pairs are filtered: the distance between the end of the first junction and the start of the second junction has to be < 2 * (read_length - overhang_min) + min_intron_size. Here, read_length is the length of the reads, overhang_min is the minimal read overhang over a splice junction required by the alignment tool, and min_intron_size is the minimal intron length required by the alignment tool. For example, paired-end reads with a length of 101 nts, a minimal overhang of 6 and a minimal intron length of 21 give a limit of 211 nucleotides for the distance between the two splice junctions: 2 * (101 - 6) + 21 = 211. If the distance between the two splice junctions exceeds the limit, it cannot be guaranteed that the junctions are connected to the same exon. Splice junction pairs that are already annotated in a transcript are removed. Novel exons are predicted from the remaining splice junction pairs.
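The distance limit described above can be reproduced with a few lines of R. This is only a sketch of the arithmetic, not the package's internal code: max_junction_distance is a hypothetical helper name, and the distances vector is made-up illustration data.

```r
# Distance limit between the end of the first junction and the start of the
# second junction, following the formula in the description.
# max_junction_distance is a hypothetical helper, not a DISCERNS function.
max_junction_distance <- function(read_length = 101, overhang_min = 12,
                                  min_intron_size = 21) {
  2 * (read_length - overhang_min) + min_intron_size
}

# Worked example from the text: 101 nt reads, overhang 6, minimal intron 21
limit <- max_junction_distance(read_length = 101, overhang_min = 6,
                               min_intron_size = 21)
print(limit)  # 211

# Made-up junction-pair distances; keep those strictly below the limit
distances <- c(85, 150, 210, 211, 400)
keep <- distances < limit
print(distances[keep])  # 85 150 210
```

With the function's default arguments (overhang_min = 12) the same formula gives 2 * (101 - 12) + 21 = 199.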

Usage

predict_jrp_exon(
  junc_reads,
  annotation,
  read_length = 101,
  overhang_min = 12,
  min_intron_size = 21
)

Arguments

junc_reads

GAlignments object with junction reads.

annotation

List with exon and intron annotation as GRanges. Created with prepare_annotation().

read_length

Integer scalar. Length of your reads in bp. Default 101.

overhang_min

Integer scalar. Minimum overhang length for splice junctions on both sides, as defined by the --outSJfilterOverhangMin parameter of STAR. Use the minimum of the values for canonical splice junctions (values (2) to (4) of the parameter). You do not have to set this parameter if you used the default values from STAR. Default 12.
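As a small illustration of picking this value, the sketch below takes the four numbers passed to --outSJfilterOverhangMin and returns the minimum of values (2) to (4). The string "30 12 12 12" is assumed here to be STAR's default setting; replace it with the values used in your own alignment run.

```r
# The four values of --outSJfilterOverhangMin, in STAR's order:
# (1) non-canonical motifs, (2) to (4) the canonical motifs.
# "30 12 12 12" is assumed to be the STAR default; use your own setting.
star_setting <- "30 12 12 12"

# Split on whitespace and convert to integers
values <- as.integer(strsplit(star_setting, " +")[[1]])

# overhang_min is the smallest value among the canonical entries (2) to (4)
overhang_min <- min(values[2:4])
print(overhang_min)  # 12
```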

min_intron_size

Integer scalar. Minimum intron size (--alignIntronMin parameter of STAR). You do not have to set this parameter if you used the default values from STAR. Default 21.

Value

data.frame with the coordinates of the predicted novel exons. It has 6 columns: seqnames, lend, start, end, rstart and strand.


khembach/DISCERNS documentation built on June 23, 2020, 3:35 p.m.