importSpliceJunctions: Import splice-junctions into the ProteoDiscography.

View source: R/import_spliceJunctions.R

importSpliceJunctions    R Documentation

Import splice-junctions into the ProteoDiscography.

Description

Generates putative gene-models based on supplied genomic coordinates of splice-junctions.

Input should be a tibble containing the following columns:

  • junctionA: Genomic coordinates of the 5'-junction. (format: chr:start:strand, e.g.: chr1:100:+)

  • junctionB: Genomic coordinates of the 3'-junction. (format: chr:end:strand, e.g.: chr1:150:+)

  • sample: Names of the samples. (character, optional)

  • identifier: The identifier which will be used in downstream analysis. (character, optional)
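
As a minimal sketch of this layout, the tibble below holds two hypothetical junctions. The coordinates, sample name and identifiers are placeholders, and the regular expression is only an assumed sanity check for the chr:position:strand pattern; it is not part of ProteoDisco.

```r
# Hypothetical splice-junctions; all values are placeholders for illustration.
customSJ <- tibble::tibble(
  junctionA = c('chr1:100:+', 'chr1:900:+'),
  junctionB = c('chr1:150:+', 'chr1:1200:+'),
  sample = c('sampleA', 'sampleA'),
  identifier = c('exampleSJ_1', 'exampleSJ_2')
)

# Assumed sanity check (not a ProteoDisco function): chr:position:strand.
junctionPattern <- '^[^:]+:[0-9]+:[+-]$'
validFormat <- grepl(junctionPattern, customSJ$junctionA) &
  grepl(junctionPattern, customSJ$junctionB)
stopifnot(all(validFormat))
```

A tibble of this shape can then be passed as inputSpliceJunctions.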

Common splice-junction formats (BED and STAR's SJ.out.tab) can also be supplied; these are converted into the tibble format described above.
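
To give an idea of what such a conversion involves, the sketch below maps the first four columns of a STAR SJ.out.tab table (chromosome, first intron base, last intron base, and a strand code of 0, 1 or 2) onto junctionA and junctionB. This is an assumed, simplified mapping and not the package's internal code: it keeps STAR's intron coordinates unshifted, places the lower coordinate in junctionA and drops junctions with an undefined strand, whereas ProteoDisco may apply different offsets and strand handling. All rows and column names are invented.

```r
# Invented SJ.out.tab-like rows; only the first four STAR columns are used here.
starSJ <- data.frame(
  chr = c('chr22', 'chr22', 'chr22'),
  intronStart = c(1000L, 5000L, 9000L),  # First base of the intron (1-based).
  intronEnd = c(2000L, 5600L, 9400L),    # Last base of the intron (1-based).
  strandCode = c(1L, 2L, 0L),            # 0: undefined, 1: +, 2: -
  stringsAsFactors = FALSE
)

# Assumption of this sketch: junctions with an undefined strand are dropped.
starSJ <- starSJ[starSJ$strandCode != 0L, ]

# Translate STAR's numeric strand code (1 or 2) into a strand symbol.
strandSymbol <- c('+', '-')[starSJ$strandCode]

# Build chr:position:strand strings; %d avoids scientific notation.
convertedSJ <- data.frame(
  junctionA = sprintf('%s:%d:%s', starSJ$chr, starSJ$intronStart, strandSymbol),
  junctionB = sprintf('%s:%d:%s', starSJ$chr, starSJ$intronEnd, strandSymbol),
  stringsAsFactors = FALSE
)
```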

By utilizing two separate junction-sites, interchromosomal trans-splicing or chimeric transcripts from genomic fusions (e.g., resulting from the BCR/ABL1 fusion-gene) can also be handled.
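
Because each junction-site carries its own chromosome, an interchromosomal event is simply a row in which junctionA and junctionB name different chromosomes. The base R sketch below illustrates this with a hypothetical helper, parseJunction, which is not part of ProteoDisco. The second junction pairs chr22 with chr9 (the chromosomes of BCR and ABL1), but its positions are placeholders and not real breakpoints.

```r
# Hypothetical helper (not part of ProteoDisco): split 'chr:position:strand'.
parseJunction <- function(x) {
  parts <- strsplit(x, ':', fixed = TRUE)
  data.frame(
    chr = vapply(parts, `[`, character(1), 1),
    position = as.integer(vapply(parts, `[`, character(1), 2)),
    strand = vapply(parts, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

# Placeholder coordinates; the second junction spans two chromosomes.
junctionA <- parseJunction(c('chr1:100:+', 'chr22:2000:+'))
junctionB <- parseJunction(c('chr1:150:+', 'chr9:3000:+'))

# A junction is interchromosomal when its two sites lie on different chromosomes.
isInterchromosomal <- junctionA$chr != junctionB$chr
```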

Usage

importSpliceJunctions(
  ProteoDiscography,
  inputSpliceJunctions,
  isTopHat = TRUE,
  samples = NULL,
  aggregateSamples = FALSE,
  removeExisting = FALSE,
  overwriteDuplicateSamples = FALSE
)

Arguments

ProteoDiscography

(ProteoDiscography): ProteoDiscography object which stores the annotation and genomic sequences.

inputSpliceJunctions

(tibble): Tibble containing the splice-junctions.

isTopHat

(logical): Are the imported (.BED) files from TopHat? If so, the start and end coordinates of each splice-junction are corrected for the maximum overhang.

samples

(character): Preferred names for the samples if BED / TAB files are supplied. By default, names are derived from the file path.

aggregateSamples

(logical): Should splice-junctions from multiple samples be aggregated, or should sample-specific models be generated? If genomic variants are to be incorporated into the derived splice-transcripts, the sample names need to match.

removeExisting

(logical): Should existing entries be removed?

overwriteDuplicateSamples

(logical): Should duplicate samples be overwritten?
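
The overhang correction that isTopHat refers to can be sketched as follows. In a TopHat junctions.bed file, chromStart and chromEnd span the junction plus the flanking read overhangs, and the blockSizes column lists the two overhang lengths. The code below is an assumed illustration in base R using an invented BED row; it is not the package's implementation, and the exact off-by-one conventions used by ProteoDisco may differ.

```r
# Invented TopHat-style BED row (0-based start, end-exclusive).
bedSJ <- data.frame(
  chrom = 'chr22',
  chromStart = 1000L,
  chromEnd = 2000L,
  strand = '+',
  blockSizes = '50,30',  # Maximum overhang on either side of the junction.
  stringsAsFactors = FALSE
)

# Split the blockSizes field into the left and right overhang lengths.
overhangs <- strsplit(bedSJ$blockSizes, ',', fixed = TRUE)
leftOverhang <- as.integer(vapply(overhangs, `[`, character(1), 1))
rightOverhang <- as.integer(vapply(overhangs, `[`, character(1), 2))

# 1-based position of the last base of the upstream block
# and of the first base of the downstream block.
upstreamBlockEnd <- bedSJ$chromStart + leftOverhang
downstreamBlockStart <- bedSJ$chromEnd - rightOverhang + 1L
```

Without this correction, the BED start and end would point at the outer edges of the overhangs instead of at the junction itself.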

Value

ProteoDiscography with imported splice-junctions.

Author(s)

Job van Riet <j.vanriet@erasmusmc.nl>

Wesley van de Geer <w.vandegeer@erasmusmc.nl>

Examples


 ProteoDiscography.hg19 <- ProteoDisco::generateProteoDiscography(
   TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene,
   genomeSeqs = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
 )

 # Import from file.
 ProteoDiscography.hg19 <- ProteoDisco::importSpliceJunctions(
   ProteoDiscography = ProteoDiscography.hg19,
   inputSpliceJunctions = system.file('extdata', 'spliceJunctions_pyQUILTS_chr22.bed', package = 'ProteoDisco'),
   # (Optional) Rename samples.
   samples = 'pyQUILTS',
   # Specify that the given BED files are obtained from TopHat.
   # Chromosomal coordinates from TopHat require additional formatting.
   isTopHat = TRUE
 )

 # Or, import splice-junctions (even spanning different chromosomes) based on our format.
 testSJ <- readr::read_tsv(system.file('extdata', 'validationSetSJ_hg19.txt', package = 'ProteoDisco'))

 # Add custom SJ to ProteoDiscography.
 ProteoDiscography.hg19 <- ProteoDisco::importSpliceJunctions(
   ProteoDiscography = ProteoDiscography.hg19,
   inputSpliceJunctions = testSJ,
   # Append to existing SJ-input.
   removeExisting = FALSE
 )


ErasmusMC-CCBC/ProteoDisco documentation built on Dec. 9, 2022, 8:41 a.m.