generateJunctionModels: Generate putative transcript-models derived from...

View source: R/method_generateJunctionModels.R

generateJunctionModels    R Documentation

Generate putative transcript-models derived from splice-junctions.

Description

Generates splicing-isoforms using the supplied splice-junctions (SJ) within the ProteoDiscography.

Supplied junctions are transformed into novel gene-models based on the nearest (or overlapping) exon within the TxDb of the given ProteoDiscography; strand-information is taken into account. If no strand-information is given, the nearest (or overlapping) known exon is assigned.

The putative gene-model is then derived from the assigned exons as annotated within the TxDb. For example, if a junction is matched with the second exon of geneA (+) and the fourth exon of geneB (+), the following gene-model is generated:

geneA-Exon1, geneA-Exon2, geneB-Exon4, geneB-Exon5, ...

Users can also specify the maximum search-window (in bp) within which the nearest canonical exonic boundary should fall (see the maxDistance argument).
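The geneA/geneB example above can be sketched in base R. This is only an illustration of the described exon ordering, not ProteoDisco's implementation: the `exons` table and the helper `buildJunctionModel()` are hypothetical, and real models are derived from the TxDb rather than from a data frame.

```r
# Hypothetical exon annotation for two genes on the + strand
# (gene names and exon counts are made up for illustration).
exons <- data.frame(
  gene = c(rep("geneA", 3), rep("geneB", 5)),
  rank = c(1:3, 1:5),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of ProteoDisco): keep the donor gene's exons
# up to and including the matched exon, followed by the acceptor gene's exons
# from the matched exon onwards.
buildJunctionModel <- function(exons, donorGene, donorRank, acceptorGene, acceptorRank) {
  upstream <- exons[exons$gene == donorGene & exons$rank <= donorRank, ]
  downstream <- exons[exons$gene == acceptorGene & exons$rank >= acceptorRank, ]
  upstream <- upstream[order(upstream$rank), ]
  downstream <- downstream[order(downstream$rank), ]
  model <- rbind(upstream, downstream)
  paste0(model$gene, "-Exon", model$rank)
}

# Junction matched with the second exon of geneA and the fourth exon of geneB.
model <- buildJunctionModel(exons, "geneA", 2, "geneB", 4)
print(model)
```

With the hypothetical annotation above, the resulting model lists geneA exons 1 and 2 followed by geneB exons 4 and 5, mirroring the example in the text.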

Usage

generateJunctionModels(
  ProteoDiscography,
  maxDistance = 150,
  maxCrypticSize = 75,
  skipCanonical = TRUE,
  threads = 1
)

Arguments

ProteoDiscography

(ProteoDiscography): ProteoDiscography object which stores the annotation and genomic sequences.

maxDistance

(integer): Max. distance (>= 0 bp) from a splice-junction to the nearest (or overlapping) exon. Setting this too high will (erroneously) assign a distant exon as 'neighboring', resulting in unlikely models. If the SJ lies beyond this distance, it is assigned as a cryptic exon whose length is set by the maxCrypticSize parameter.

maxCrypticSize

(integer): The max. extension in bp (relative to orientation) beyond the SJ if it is assigned as a cryptic exon, i.e., the number of bp used to determine the putative transcript sequence for each cryptic junction.

skipCanonical

(logical): Should canonical exon-exon junctions be skipped (TRUE) or generated (FALSE)?

threads

(integer): Number of threads.
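How maxDistance and maxCrypticSize interact can be sketched as follows. This is a simplified base-R illustration under stated assumptions, not the package's code: the helper `assignJunctionSide()` is hypothetical, only a single junction coordinate on the + strand is considered, the cryptic exon is assumed to extend downstream of the SJ, and coordinates are treated as inclusive.

```r
# Illustrative helper (not part of ProteoDisco): compare one splice-junction
# coordinate against known exon boundaries.
assignJunctionSide <- function(sjPos, exonBoundaries, maxDistance = 150, maxCrypticSize = 75) {
  distances <- abs(exonBoundaries - sjPos)
  nearest <- which.min(distances)

  if (distances[nearest] <= maxDistance) {
    # A known boundary lies within the search-window: use that exon.
    list(type = "known", boundary = exonBoundaries[nearest], distance = distances[nearest])
  } else {
    # No boundary within maxDistance: treat as a cryptic exon of length
    # maxCrypticSize (assumption: extension is downstream on the + strand).
    list(type = "cryptic", start = sjPos, end = sjPos + maxCrypticSize - 1)
  }
}

# Hypothetical exon boundaries (bp).
boundaries <- c(1000, 1500, 2300)

near <- assignJunctionSide(1040, boundaries)  # 40 bp from the boundary at 1000
far <- assignJunctionSide(1800, boundaries)   # 300 bp from the closest boundary

print(near$type)
print(far$type)
```

In this sketch, raising maxDistance to 300 or more would attach the second junction to the boundary at 1500, which illustrates why overly large values can produce unlikely models.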

Value

ProteoDiscography with derived splice-isoforms.

Author(s)

Job van Riet j.vanriet@erasmusmc.nl

Wesley van de Geer w.vandegeer@erasmusmc.nl

Examples


 ProteoDiscography.hg19 <- ProteoDisco::generateProteoDiscography(
   TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene,
   genomeSeqs = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
 )

 # Import splice-junctions (even spanning different chromosomes) based on our format.
 testSJ <- readr::read_tsv(system.file('extdata', 'validationSetSJ_hg19.txt', package = 'ProteoDisco'))

 # Add custom SJ to ProteoDiscography.
 ProteoDiscography.hg19 <- ProteoDisco::importSpliceJunctions(
   ProteoDiscography = ProteoDiscography.hg19,
   inputSpliceJunctions = testSJ
 )

 # Generate junction-models from non-canonical splice-junctions.
 ProteoDiscography.hg19 <- ProteoDisco::generateJunctionModels(
   ProteoDiscography = ProteoDiscography.hg19,
   # Max. distance from a known exon-boundary before introducing a novel exon.
   # If an adjacent exon is found within this distance, it will shorten or elongate that exon towards the SJ.
   maxDistance = 150,
   # Should we skip known exon-exon junctions (in which both the acceptor and donor are located on known adjacent exons within the same transcript)?
   skipCanonical = TRUE,
   # Perform on multiple threads (optional)
   threads = 1
 )


ErasmusMC-CCBC/ProteoDisco documentation built on Dec. 9, 2022, 8:41 a.m.