sashimiDataConstants: Prepare sashimi plot required data
In jmw86069/jambio: Analysis and Visualization of Gene Splice Variants and Transcriptome Data

sashimiDataConstants

R Documentation

Prepare sashimi plot required data

Description

Prepare sashimi plot required data, deriving data objects as needed

Usage

sashimiDataConstants(
  gtf = NULL,
  txdb = NULL,
  tx2geneDF = NULL,
  exonsByTx = NULL,
  cdsByTx = NULL,
  detectedTx = NULL,
  detectedGenes = NULL,
  flatExonsByGene = NULL,
  flatExonsByTx = NULL,
  envir = NULL,
  empty_uses_farrisdata = TRUE,
  use_memoise = TRUE,
  verbose = FALSE,
  ...
)

Arguments

`gtf`, `txdb`, `tx2geneDF`, `exonsByTx`, `cdsByTx`	objects used to define the overall set of genes, transcripts, and associated exons and CDS exons. See Details for more information. `gtf` can be a local file: when `file.exists(gtf)` is `TRUE`, which implies the path does not have the prefix `"file://"`, it is loaded from its current path without being copied to the current directory. When it has the prefix `"file://"`, it is downloaded with `curl::curl_download()`, which copies it to the current directory. In either case, files derived from it, such as `tx2gene` or `txdb`, are stored in the current directory. `tx2geneDF` is a `data.frame` with at minimum two columns: `gene_name` and `transcript_id`. These column names can be customized, but it is easiest to use the defaults. `exonsByTx` and `cdsByTx` are `GRangesList` objects used to derive `flatExonsByTx` and `flatExonsByGene` when those objects are not already provided. When `exonsByTx` and `cdsByTx` are themselves missing and are needed for that derivation, they are derived from `gtf` or `txdb`.
`detectedTx`, `detectedGenes`, `flatExonsByGene`, `flatExonsByTx`	objects used to derive a specific subset of gene-exon models using only detected transcripts or genes. See Details for more information. `detectedTx` and `detectedGenes` are `character` vectors. `flatExonsByTx` and `flatExonsByGene` are `GRangesList` objects, where each `GRanges` element contains disjoint (non-overlapping) ranges. When not provided, they are derived from `exonsByTx` and `cdsByTx`, which also requires `tx2geneDF` and either `gtf` or `txdb`.
`envir`	`environment` where data will be prepared, or when `envir=NULL` a new environment will be created and returned.
`empty_uses_farrisdata`	`logical` indicating whether to use data from the Github R package `"jmw86069/farrisdata"` if no data is supplied to this function. This behavior is intended to make it easy to use farrisdata to recreate the Sashimi plots in that publication.
`use_memoise`	`logical` indicating whether to use `memoise` to cache intermediate data files for exons, flattened exons, transcript-gene data, and so on. This mechanism reduces time to render sashimi plots that re-use the same gene. All memoise cache folders are named with `"_memoise"`.
`verbose`	`logical` indicating whether to print verbose output.
`...`	additional arguments are ignored.
`default_gene`	`character` string indicating the default gene to use for the initial R-shiny figure.
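
The arguments above can be combined into a single call. The following is a minimal sketch only, using the argument names from the Usage section. The wrapper name `prepare_sashimi_env()`, its parameters, and the file path in the commented call are hypothetical, and the sketch assumes the package that provides `sashimiDataConstants()` is already attached.

```r
# Hypothetical wrapper around sashimiDataConstants(); nothing is executed
# until the wrapper is called, and calling it requires the package that
# provides sashimiDataConstants() to be attached.
prepare_sashimi_env <- function(gtf_path, detected_tx = NULL) {
  sashimiDataConstants(
    gtf = gtf_path,                 # path to a GTF/GFF/GFF3 file
    detectedTx = detected_tx,       # optional character vector of transcripts
    envir = NULL,                   # NULL: a new environment is created
    empty_uses_farrisdata = FALSE,  # do not fall back to farrisdata
    use_memoise = TRUE,             # cache intermediate objects
    verbose = TRUE
  )
}

# Example call, commented out because it needs a real annotation file
# (the path is a placeholder, not a file shipped with any package):
# sashimi_env <- prepare_sashimi_env("path/to/annotation.gtf")
# ls(sashimi_env)
```

Because `envir = NULL` here, the returned environment has to be captured in a variable in order to reach the prepared objects.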

Details

This function performs a subset of the steps performed by sashimiAppConstants(), focusing only on the data required for gene-exon structure. sashimiAppConstants() defines color_sub and validates filesDF, then calls sashimiDataConstants() to prepare and validate the gene-exon data.

Data derived by this function sashimiDataConstants():

txdb: TxDb object used to derive exonsByTx and cdsByTx if either object does not already exist. If txdb is not supplied, it is derived from gtf using GenomicFeatures::makeTxDbFromGFF().
tx2geneDF: data.frame with colnames: "transcript_id" and "gene_name".
gtf: character path to a GTF/GFF/GFF3 file, suitable for GenomicFeatures::makeTxDbFromGFF(). The gtf is only used if tx2geneDF or exonsByTx are not supplied. Note that when gtf points to a remote server, the file is copied to the current working directory for more rapid use. If the file already exists in the local directory, it is re-used.
exonsByTx: GRangesList object, named by "transcript_id", containing all exons for each transcript. It is derived from txdb if not supplied; and names should match tx2geneDF$transcript_id.
cdsByTx: GRangesList object, named by "transcript_id", containing only CDS (protein-coding) exons for each transcript. It is derived from txdb if not supplied; and names should match tx2geneDF$transcript_id.
detectedTx: character vector of tx2geneDF$transcript_id values, representing a subset of transcripts detected above background. See defineDetectedTx() for one strategy to define detected transcripts. If detectedTx does not exist, it is defined by all transcripts present in tx2geneDF$transcript_id. Note this step can be the rate-limiting step in the preparation of flatExonsByTx.
detectedGenes: character vector of values that match tx2geneDF$gene_name. If it is not supplied, it is inferred from detectedTx and tx2geneDF$transcript_id.
flatExonsByGene: GRangesList object containing non-overlapping exons for each gene, whose names match tx2geneDF$gene_name. If not supplied, it is derived using flattenExonsBy() and objects exonsByTx, cdsByTx, detectedTx, and tx2geneDF. This step is the key step for using a subset of detected transcripts, in order to produce a clean gene-exon model.
flatExonsByTx: GRangesList object containing non-overlapping exons for each transcript. If not supplied, it is derived using flattenExonsBy() and objects exonsByTx, cdsByTx, detectedTx, and tx2geneDF. This step is the key step for using a subset of detected transcripts, in order to produce a clean transcript-exon model.
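
The relationship between tx2geneDF, detectedTx, detectedGenes, and the flattened exon models can be illustrated with a toy example in base R. This is a sketch of the idea only, not the implementation of flattenExonsBy(): it uses plain data.frame objects instead of GRangesList, ignores CDS ranges and strand, and every identifier, coordinate, and the helper disjoin_ranges() is hypothetical.

```r
# Toy transcript-to-gene table using the two default column names.
tx2geneDF <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3", "tx4"),
  gene_name     = c("GeneA", "GeneA", "GeneA", "GeneB"),
  stringsAsFactors = FALSE
)

# Toy exon table with made-up coordinates, one row per exon.
exons <- data.frame(
  transcript_id = c("tx1", "tx1", "tx2", "tx2", "tx3", "tx4"),
  start = c(100, 300, 150, 300, 500, 900),
  end   = c(200, 400, 250, 450, 600, 950),
  stringsAsFactors = FALSE
)

# Hypothetical detected transcripts; tx3 is treated as undetected.
detectedTx <- c("tx1", "tx2", "tx4")

# Infer detected genes from detected transcripts via tx2geneDF.
detectedGenes <- unique(
  tx2geneDF$gene_name[tx2geneDF$transcript_id %in% detectedTx]
)

# Split overlapping ranges into disjoint (non-overlapping) pieces.
disjoin_ranges <- function(start, end) {
  # Each exon start, and each position just past an exon end, is a breakpoint.
  breaks <- sort(unique(c(start, end + 1)))
  seg_start <- head(breaks, -1)
  seg_end <- tail(breaks, -1) - 1
  # Keep only the segments that lie inside at least one input exon.
  covered <- vapply(seq_along(seg_start), function(i) {
    any(start <= seg_start[i] & end >= seg_end[i])
  }, logical(1))
  data.frame(start = seg_start[covered], end = seg_end[covered])
}

# Flatten GeneA using its detected transcripts only.
geneA_tx <- tx2geneDF$transcript_id[
  tx2geneDF$gene_name == "GeneA" & tx2geneDF$transcript_id %in% detectedTx
]
keep <- exons$transcript_id %in% geneA_tx
flatGeneA <- disjoin_ranges(exons$start[keep], exons$end[keep])
```

In this toy data the exon from the undetected transcript tx3 (500 to 600) does not appear in flatGeneA, which is the sense in which restricting to detected transcripts yields a cleaner gene-exon model.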

When use_memoise=TRUE, several R objects are cached using memoise::memoise(), both to support re-use of prepared R objects and to speed up re-use of data within the R-shiny app.
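
The effect of this kind of caching can be sketched in base R. The helper below is a simplified in-memory stand-in written for illustration; it is not memoise::memoise() and does not write cache folders to disk. The names simple_memoise(), prepare_gene(), and the gene labels are hypothetical.

```r
# Minimal in-memory memoization helper (illustrative stand-in only).
simple_memoise <- function(fn) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    # Compute and store the result only the first time a key is seen.
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fn(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Hypothetical expensive step; the counter records how often it really runs.
n_calls <- 0
prepare_gene <- function(gene) {
  n_calls <<- n_calls + 1
  paste0("prepared_", gene)
}

prepare_gene_m <- simple_memoise(prepare_gene)
first  <- prepare_gene_m("GeneA")  # computed by prepare_gene()
second <- prepare_gene_m("GeneA")  # returned from the cache
```

After both calls, n_calls is 1: the second request for the same gene did not repeat the underlying work, which is the benefit described for sashimi plots that re-use the same gene.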

Value

environment that contains the required data objects for splicejam sashimi plots. Note that the environment itself is updated during processing, so the environment does not need to be returned for the data contained inside it to be updated by this function.
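
The in-place update follows from R environments having reference semantics. A minimal base R sketch, in which prepare_into() is a hypothetical stand-in and not the actual function, shows both ways of supplying `envir`:

```r
# Hypothetical stand-in for a function that prepares data in an environment.
prepare_into <- function(envir = NULL) {
  if (is.null(envir)) {
    envir <- new.env()
  }
  # Objects assigned here are visible to every holder of the environment.
  assign("detectedGenes", c("GeneA", "GeneB"), envir = envir)
  invisible(envir)
}

# Case 1: supply an environment; the return value can be ignored.
my_env <- new.env()
prepare_into(envir = my_env)
found_in_my_env <- exists("detectedGenes", envir = my_env, inherits = FALSE)

# Case 2: envir = NULL; the newly created environment must be captured.
new_env <- prepare_into()
genes <- get("detectedGenes", envir = new_env)
```

When an existing environment is supplied, the caller already holds a reference to it, so the prepared objects are available without assigning the result. With `envir = NULL` the only reference to the new environment is the return value.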
