sashimiDataConstants: Prepare sashimi plot required data

sashimiDataConstants    R Documentation

Prepare sashimi plot required data

Description

Prepare sashimi plot required data, deriving data objects as needed

Usage

sashimiDataConstants(
  gtf = NULL,
  txdb = NULL,
  tx2geneDF = NULL,
  exonsByTx = NULL,
  cdsByTx = NULL,
  detectedTx = NULL,
  detectedGenes = NULL,
  flatExonsByGene = NULL,
  flatExonsByTx = NULL,
  envir = NULL,
  empty_uses_farrisdata = TRUE,
  use_memoise = TRUE,
  verbose = FALSE,
  ...
)

Arguments

gtf, txdb, tx2geneDF, exonsByTx, cdsByTx

objects used to define the overall set of genes, transcripts, and associated exons and CDS exons. See Details for a description of each object.

detectedTx, detectedGenes, flatExonsByGene, flatExonsByTx

objects used to derive a specific subset of gene-exon models using only detected transcripts or genes. See Details for a description of each object.

envir

environment where data will be prepared. When envir=NULL, a new environment is created and returned.

empty_uses_farrisdata

logical indicating whether to use data from the GitHub R package "jmw86069/farrisdata" if no data is supplied to this function. This behavior is intended to make it easy to use farrisdata to recreate the Sashimi plots from the publication associated with that package.

use_memoise

logical indicating whether to use memoise to cache intermediate data files for exons, flattened exons, transcript-gene data, and so on. This mechanism reduces time to render sashimi plots that re-use the same gene. All memoise cache folders are named with "_memoise".

verbose

logical indicating whether to print verbose output.

...

additional arguments are ignored.

default_gene

character string indicating the default gene to use for the initial R-shiny figure.

Details

This function performs a subset of the steps performed by sashimiAppConstants(), focusing only on the data required for gene-exon structure. sashimiAppConstants() defines color_sub and validates filesDF, then calls sashimiDataConstants() to prepare and validate the gene-exon data.

Data derived by this function sashimiDataConstants():

  • txdb: TxDb object (named TranscriptDb in older versions of GenomicFeatures) used to derive exonsByTx and cdsByTx if either object does not already exist. If txdb is not supplied, it is derived from gtf using GenomicFeatures::makeTxDbFromGFF().

  • tx2geneDF: data.frame with colnames: "transcript_id" and "gene_name".

  • gtf: character path to a GTF/GFF/GFF3 file, suitable for GenomicFeatures::makeTxDbFromGFF(). The gtf is only used if tx2geneDF or exonsByTx are not supplied. Note that when gtf points to a remote server, the file is copied to the current working directory for more rapid use. If the file already exists in the local directory, it is re-used.

  • exonsByTx: GRangesList object, named by "transcript_id", containing all exons for each transcript. It is derived from txdb if not supplied; and names should match tx2geneDF$transcript_id.

  • cdsByTx: GRangesList object, named by "transcript_id", containing only CDS (protein-coding) exons for each transcript. It is derived from txdb if not supplied; and names should match tx2geneDF$transcript_id.

  • detectedTx: character vector of tx2geneDF$transcript_id values, representing a subset of transcripts detected above background. See defineDetectedTx() for one strategy to define detected transcripts. If detectedTx does not exist, it is defined by all transcripts present in tx2geneDF$transcript_id. Note this step can be the rate-limiting step in the preparation of flatExonsByTx.

  • detectedGenes: character vector of values that match tx2geneDF$gene_name. If it is not supplied, it is inferred from detectedTx and tx2geneDF$transcript_id.

  • flatExonsByGene: GRangesList object containing non-overlapping exons for each gene, whose names match tx2geneDF$gene_name. If not supplied, it is derived using flattenExonsBy() and objects exonsByTx, cdsByTx, detectedTx, and tx2geneDF. This step is the key step for using a subset of detected transcripts, in order to produce a clean gene-exon model.

  • flatExonsByTx: GRangesList object containing non-overlapping exons for each transcript. If not supplied, it is derived using flattenExonsBy() and objects exonsByTx, cdsByTx, detectedTx, and tx2geneDF. This step is the key step for using a subset of detected transcripts, in order to produce a clean transcript-exon model.
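The fallback rules for detectedTx and detectedGenes described in the list above can be sketched with base R alone. This is a minimal illustration, not the splicejam implementation: the transcript and gene names are hypothetical, and only the two documented tx2geneDF columns are assumed.

```r
# Hypothetical transcript-to-gene table with the two documented columns.
tx2geneDF <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3", "tx4"),
  gene_name = c("GeneA", "GeneA", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

# Assumed subset of transcripts detected above background.
detectedTx <- c("tx1", "tx3")

# If no detected subset is available, fall back to every transcript
# present in tx2geneDF.
if (length(detectedTx) == 0) {
  detectedTx <- unique(tx2geneDF$transcript_id)
}

# Infer detected genes as the genes that own at least one detected transcript.
detectedGenes <- unique(
  tx2geneDF$gene_name[tx2geneDF$transcript_id %in% detectedTx]
)
print(detectedGenes)
```

In this toy table GeneC has no detected transcript, so it would not contribute a flattened gene-exon model.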

When use_memoise=TRUE, several R objects are cached using memoise::memoise(). Caching allows prepared R objects to be re-used, which speeds up repeated use of the same data within the R-shiny app.
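The general caching pattern can be sketched as follows, assuming the memoise package is installed. The function slow_flatten() and the folder name "example_memoise" are hypothetical stand-ins chosen for illustration; they are not the functions or cache folders that splicejam itself uses.

```r
if (requireNamespace("memoise", quietly = TRUE)) {
  # Hypothetical cache folder, named with a "_memoise" suffix.
  cache_dir <- file.path(tempdir(), "example_memoise")
  unlink(cache_dir, recursive = TRUE)  # start from an empty cache

  # Counter used only to show how often the real work is done.
  n_calls <- 0

  # Hypothetical stand-in for an expensive data preparation step.
  slow_flatten <- function(gene) {
    n_calls <<- n_calls + 1
    toupper(gene)
  }

  # Wrap the function so results are stored on disk and re-used.
  flatten_m <- memoise::memoise(
    slow_flatten,
    cache = memoise::cache_filesystem(cache_dir)
  )

  first <- flatten_m("gria1")   # computed and written to the cache
  second <- flatten_m("gria1")  # read back from the cache
}
```

The second call with the same argument returns the stored result without running slow_flatten() again, which is the behavior that shortens repeated rendering of the same gene.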

Value

environment that contains the required data objects for splicejam sashimi plots. Note that the environment itself is updated during processing, so the environment does not need to be returned for the data contained inside it to be updated by this function.
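The in-place update works because R environments have reference semantics. The base R sketch below illustrates this with a hypothetical helper, prepare_data(), which is not part of splicejam and only mimics the envir argument convention described above.

```r
# Hypothetical stand-in for a function that fills an environment in place.
prepare_data <- function(envir = NULL) {
  # When no environment is supplied, create a new one.
  if (is.null(envir)) {
    envir <- new.env()
  }
  # Store a derived object inside the environment.
  assign("detectedTx", c("tx1", "tx3"), envir = envir)
  invisible(envir)
}

# Case 1: supply an environment and discard the return value.
my_env <- new.env()
prepare_data(envir = my_env)
print(exists("detectedTx", envir = my_env, inherits = FALSE))

# Case 2: envir = NULL, so the new environment must be captured.
new_env <- prepare_data()
print(get("detectedTx", envir = new_env))
```

Capturing the return value is only necessary in the second case, because otherwise there is no other handle on the newly created environment.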

See Also

Other splicejam R-shiny functions: launchSashimiApp(), sashimiAppConstants(), sashimiAppUI()
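
Examples

The call below is a hedged sketch rather than an example taken from the package. It assumes that both splicejam and farrisdata are installed, and it relies on the documented default empty_uses_farrisdata=TRUE to supply data when no other arguments are given. The set of object names printed at the end is not asserted here, since it depends on the installed package version.

```r
# Run only when both packages are available.
if (requireNamespace("splicejam", quietly = TRUE) &&
    requireNamespace("farrisdata", quietly = TRUE)) {
  # No gtf, txdb, or other data supplied, so farrisdata is used.
  sdc <- splicejam::sashimiDataConstants(
    empty_uses_farrisdata = TRUE,
    use_memoise = FALSE,
    verbose = TRUE
  )
  # List the objects prepared inside the returned environment.
  print(ls(sdc))
}
```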


jmw86069/splicejam documentation built on Nov. 4, 2024, 10:53 a.m.