get_second_sj: Identify the second splice junction of a novel exon


View source: R/exon_from_sj.R

Description

This function takes a set of novel splice junctions (SJs) as input and returns a data.frame with the location of the novel exon and the coordinates of the neighbouring exons. There are three cases: 1) novel SJs whose start (5' end) touches an annotated exon; 2) novel SJs whose end (3' end) touches an annotated exon; 3) novel SJs that touch an annotated exon on both ends (5' and 3' end).

Usage

get_second_sj(junctions, reads, touching, txdb, gtxdb, ebyTr, cores)

Arguments

junctions

data.frame with novel splice junctions. The following columns are required: seqnames, start, end and strand.

reads

GAlignments object. Reads that contain novel splice junctions, e.g. obtained with import_novel_sj_reads().

touching

Character string. Which end of the novel splice junctions (parameter junctions) touches an annotated exon? One of "both", "start" or "end".

txdb

TxDb object, e.g. the "txdb" slot from the prepare_annotation() return object.

gtxdb

GRanges object. All genes from the txdb parameter, e.g. obtained with GenomicFeatures::genes(txdb).

ebyTr

GRangesList object. All exons per transcript of the txdb parameter, e.g. obtained with GenomicFeatures::exonsBy(txdb, by = "tx", use.names = TRUE).

cores

Integer scalar. Number of cores to use. Default 1.
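As a minimal sketch of preparing the simpler arguments, the base R snippet below builds a junctions data.frame with the four required columns and validates the touching value. The coordinates are made up for illustration, and the commented call at the end assumes that reads, txdb, gtxdb and ebyTr have already been created as described above.

```r
# Hypothetical novel junctions; the coordinates are invented for illustration.
junctions <- data.frame(
  seqnames = c("chr1", "chr1"),
  start    = c(1200L, 5400L),
  end      = c(1850L, 6100L),
  strand   = c("+", "-"),
  stringsAsFactors = FALSE
)

# Check that all required columns are present before calling the function.
required <- c("seqnames", "start", "end", "strand")
missing_cols <- setdiff(required, colnames(junctions))
stopifnot(length(missing_cols) == 0)

# `touching` must be one of the three documented values.
touching <- match.arg("start", c("both", "start", "end"))

# Assumed call, following the Usage section (other objects not built here):
# res <- get_second_sj(junctions, reads, touching, txdb, gtxdb, ebyTr, cores = 1)
```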

Details

The three cases can be illustrated as follows:

  1. Start touches annotation

    AAAA            annotated exon
       J---J        novel junction
     xxx---xx----x  read
            X----X  function return
    
  2. End touches annotation

               AAA annotated exon
          J----J   novel junction
    xx---xx----xx  read
     X---X         function return
    
  3. Both ends touch annotation

           AAAAA  annotated exon
    AAAA          annotated exon
       J---J      novel junction
      nn---xxxxx  possible transcript with novel exon nn at the 5' of the SJ
    xxxx---nn     possible transcript with novel exon nn at the 3' of the SJ
    

In case 3, we search for reads with two novel splice junctions that support a novel exon on either end of the SJ. If no such reads are found, we check whether the novel exon could be terminal, i.e. the first or last exon of the transcript. If so, the novel exon has no adjoining annotated exon on its outer end, and that end coordinate cannot be determined from a junction. As an approximation, we take the boundaries of the read with the longest mapping to the novel exon.
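The approximation for terminal exons can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: the helper approx_terminal_exon() and its inputs are hypothetical, and each read is assumed to contribute exactly one aligned block inside the novel exon.

```r
# Hypothetical helper: approximate the boundaries of a terminal novel exon.
# `seg_start` / `seg_end` hold, per read, the coordinates of the aligned
# block that lies in the novel exon (one block per read).
approx_terminal_exon <- function(seg_start, seg_end) {
  stopifnot(length(seg_start) == length(seg_end), length(seg_start) > 0)
  widths <- seg_end - seg_start + 1L
  longest <- which.max(widths)  # first read with the maximal mapped width
  c(start = seg_start[longest], end = seg_end[longest])
}

# Novel exon downstream of a junction: all three reads start at 2001
# but extend to different positions.
exon <- approx_terminal_exon(
  seg_start = c(2001L, 2001L, 2001L),
  seg_end   = c(2030L, 2075L, 2050L)
)
```

Here the second read has the longest mapping, so its boundaries (2001 to 2075) are used as the approximate exon coordinates.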

Value

data.frame with the coordinates of the novel exon. It has 6 columns: seqnames, lend, start, end, rstart and strand.
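To make the shape of the return value concrete, the sketch below assembles one row by hand for case 1 (start touches annotation). Two things are assumptions and are not stated on this page: that lend is the end of the left neighbouring exon and rstart the start of the right neighbouring exon, and that junction coordinates denote the first and last intronic base. All numbers are invented.

```r
# Assumption: junction start/end are the first and last base of the intron.
first_sj  <- c(start = 1001L, end = 1500L)  # novel SJ touching the annotated exon
second_sj <- c(start = 1581L, end = 2400L)  # second SJ found in the same read

# Assumption: lend = end of left exon, rstart = start of right exon.
novel_exon <- data.frame(
  seqnames = "chr1",
  lend     = first_sj[["start"]] - 1L,   # last base of the annotated exon
  start    = first_sj[["end"]] + 1L,     # first base of the novel exon
  end      = second_sj[["start"]] - 1L,  # last base of the novel exon
  rstart   = second_sj[["end"]] + 1L,    # first base of the downstream exon
  strand   = "+",
  stringsAsFactors = FALSE
)
```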


khembach/DISCERNS documentation built on June 23, 2020, 3:35 p.m.