junction_process: Processing junctions
In dzhang32/dasper: Detecting aberrant splicing events from RNA-sequencing data

View source: R/junction_process.R

junction_annot

R Documentation

Processing junctions

Description

The set of functions prefixed with "junction_" is used to process junction data. They are designed to be run sequentially in the order junction_annot, junction_filter, junction_norm, junction_score. Alternatively, the wrapper function junction_process can be used to run all four functions in one go. For more details on the individual functions, see "Details".

Usage

junction_annot(
  junctions,
  ref,
  ref_cols = c("gene_id", "tx_name", "exon_id"),
  ref_cols_to_merge = c("gene_id")
)

junction_filter(
  junctions,
  count_thresh = c(raw = 5),
  n_samp = c(raw = 1),
  width_range = NULL,
  types = NULL,
  regions = NULL
)

junction_norm(junctions)

junction_process(
  junctions,
  ref,
  ref_cols = c("gene_id", "tx_name", "exon_name"),
  ref_cols_to_merge = c("gene_id"),
  count_thresh = c(raw = 5),
  n_samp = c(raw = 1),
  width_range = NULL,
  types = NULL,
  regions = NULL,
  score_func = .zscore,
  ...
)

junction_score(junctions, score_func = .zscore, ...)

Arguments

`junctions`	junction data as a RangedSummarizedExperiment-class object.
`ref`	either a path to a GTF/GFF3 file or an object of class TxDb-class or EnsDb-class. An EnsDb-class object is required if you intend to annotate junctions with gene symbols/names.
`ref_cols`	character vector listing the names of the columns in `ref` with which to annotate junctions. Must contain "gene_id", which is used for categorising junctions.
`ref_cols_to_merge`	character vector listing which of the annotation columns in `ref_cols` should be merged into a single column per junction. Must contain "gene_id", which is used for categorising junctions.
`count_thresh`	named vector with names matching the names of the assays in `junctions`. Values denote the number of counts below which a junction will be filtered out.
`n_samp`	named vector with names matching the names of the assays in `junctions`. Values denote the number of samples that must express the junction above `count_thresh` for that junction to be retained.
`width_range`	numeric vector of length 2. The first element denotes the lower limit of junction width and the second the upper limit. Junctions with widths outside this range will be filtered out.
`types`	any junctions matching these types, as assigned by junction_annot, will be filtered out.
`regions`	any junctions overlapping this set of regions (supplied as a GRanges-class object) will be filtered out.
`score_func`	function used to score junctions by their abnormality. By default, a z-score is used, but this can be replaced with a user-defined function. This function must take as input an `x` and a `y` argument, containing case and control counts respectively, and must return a numeric vector of the same length as `x`, with each element giving the abnormality of the corresponding junction.
`...`	additional arguments passed to `score_func`.
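To illustrate the `score_func` contract described above, here is a minimal sketch of a user-defined scoring function in base R. It assumes (this is not stated in the documentation) that `x` and `y` arrive as numeric matrices with one row per junction and one column per case or control sample respectively. `robust_zscore` is a hypothetical name, and the median/MAD statistic is swapped in purely for illustration; it is not dasper's default.

```r
# Hypothetical user-defined score_func: a robust z-score (median/MAD).
# Assumption: x is a junctions x cases matrix and y a junctions x controls
# matrix, with rows in the same junction order.
robust_zscore <- function(x, y) {
    ctrl_median <- apply(y, 1, stats::median) # per-junction control median
    ctrl_mad <- apply(y, 1, stats::mad)       # per-junction control MAD (scaled)
    # each vector holds one value per row of x, so R's column-major
    # recycling lines the values up with the matching junction (row)
    (x - ctrl_median) / ctrl_mad
}

# usage: 2 junctions, 1 case sample, 3 control samples
x <- matrix(c(0.9, 0.1), nrow = 2)
y <- matrix(c(0.4, 0.5, 0.6,
              0.2, 0.3, 0.4), nrow = 2, byrow = TRUE)
scores <- robust_zscore(x, y)
# first junction scores positive (above controls), second negative
```

Under these assumptions it could be supplied as `junction_score(junctions, score_func = robust_zscore)`. A junction whose control MAD is zero yields `Inf` or `NaN`, so a real scoring function would need to guard against that case.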

Details

junction_process wraps all "junction_" prefixed functions in dasper except junction_load. It is designed to simplify processing of the junction data for users who are already familiar with, or not interested in, the intermediate steps.

junction_annot annotates junctions by 1. whether their start and/or end position precisely overlaps with an annotated exon boundary and 2. whether that junction matches an intron definition from existing annotation. Using this information along with the strand, junctions are categorised into "annotated", "novel_acceptor", "novel_donor", "novel_combo", "novel_exon_skip", "ambig_gene" and "unannotated".
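The categorisation rules above can be sketched as a small base R function. This is a hypothetical helper, not dasper's implementation: it assumes the per-junction overlap flags have already been computed from the reference, treats unstranded (`*`) junctions like `+`, and assumes that "novel_exon_skip" means both ends are annotated within the same transcript while "novel_combo" means they come from different transcripts. The precedence between categories is also an assumption.

```r
# Hypothetical sketch of the junction categories; not dasper's code.
# All inputs are per-junction vectors assumed to be precomputed:
#   in_ref    TRUE if the junction exactly matches an annotated intron
#   start_ann TRUE if the start hits an annotated exon boundary
#   end_ann   TRUE if the end hits an annotated exon boundary
#   strand    "+", "-" or "*" ("*" is treated like "+" here, an assumption)
#   same_tx   TRUE if both annotated ends share a transcript (assumption)
#   n_genes   number of genes the junction's ends map to
categorise_junctions <- function(in_ref, start_ann, end_ann, strand,
                                 same_tx, n_genes) {
    # donor is the start on "+", the end on "-"; acceptor is the other end
    donor_ann <- ifelse(strand == "-", end_ann, start_ann)
    acceptor_ann <- ifelse(strand == "-", start_ann, end_ann)

    type <- rep("unannotated", length(in_ref))
    type[donor_ann & !acceptor_ann] <- "novel_acceptor"
    type[!donor_ann & acceptor_ann] <- "novel_donor"
    both_ann <- donor_ann & acceptor_ann
    type[both_ann & same_tx] <- "novel_exon_skip"
    type[both_ann & !same_tx] <- "novel_combo"
    # assumed precedence: ambig_gene overrides the novel_* labels,
    # and an exact intron match overrides everything
    type[n_genes > 1] <- "ambig_gene"
    type[in_ref] <- "annotated"
    type
}
```

Writing the strand handling as a donor/acceptor swap keeps the category rules themselves strand-agnostic.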

junction_filter filters out "noisy" junctions based on counts, the width of junctions, annotation category of the junction returned from junction_annot and whether the junction overlaps with a set of (blacklist) regions.
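The count and width filters can be sketched on a plain count matrix. This is a simplified, hypothetical stand-in for junction_filter: the real function takes named `count_thresh`/`n_samp` vectors keyed by assay name and operates on a RangedSummarizedExperiment, whereas here a single unnamed threshold is applied to one matrix. Treating a count equal to the threshold as passing, and the width range as inclusive, are assumptions.

```r
# Hypothetical, simplified stand-in for junction_filter's count/width steps.
# Assumptions: counts >= count_thresh pass, and width_range is inclusive.
keep_junctions <- function(counts, widths, count_thresh = 5, n_samp = 1,
                           width_range = NULL) {
    # number of samples in which each junction reaches the count threshold
    n_expressed <- rowSums(counts >= count_thresh)
    keep <- n_expressed >= n_samp
    if (!is.null(width_range)) {
        keep <- keep & widths >= width_range[1] & widths <= width_range[2]
    }
    keep # logical vector: TRUE = retain junction
}

# usage: 3 junctions (rows) x 3 samples (columns)
counts <- matrix(c(10, 0, 3,
                   4, 4, 4,
                   6, 7, 0), nrow = 3, byrow = TRUE)
widths <- c(100, 200, 50000)
keep_junctions(counts, widths)                              # TRUE FALSE TRUE
keep_junctions(counts, widths, width_range = c(20, 10000))  # TRUE FALSE FALSE
```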

junction_norm normalises the raw junction counts by 1. building junction clusters from junctions that share an acceptor or donor position and 2. calculating a proportion-spliced-in (PSI) for each junction by dividing its raw count by the total number of counts in its associated cluster.
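The two normalisation steps can be sketched in base R. `psi_norm` is hypothetical; it assumes a junction's cluster is the junction itself plus every junction on the same chromosome and strand that shares its start or its end, that the cluster total is summed per sample over that union, and that junctions with a zero cluster total receive a PSI of 0. The quadratic loop favours readability over speed.

```r
# Hypothetical PSI sketch. Assumptions: a junction's cluster is itself plus
# every junction (same chromosome and strand) sharing its start or its end;
# the cluster total is summed per sample over that union.
psi_norm <- function(chr, start, end, strand, counts) {
    n <- length(start)
    norm <- matrix(0, nrow = n, ncol = ncol(counts))
    for (i in seq_len(n)) {
        same_locus <- chr == chr[i] & strand == strand[i]
        in_cluster <- same_locus & (start == start[i] | end == end[i])
        cluster_total <- colSums(counts[in_cluster, , drop = FALSE])
        # zero cluster totals give 0 instead of NaN (an assumption)
        norm[i, ] <- ifelse(cluster_total > 0, counts[i, ] / cluster_total, 0)
    }
    norm
}

# usage: junctions 1-2 share a start, junctions 2-3 share an end, 4 is alone
j_chr <- rep("chr1", 4)
j_strand <- rep("+", 4)
j_start <- c(100, 100, 250, 400)
j_end <- c(200, 300, 300, 500)
j_counts <- matrix(c(30, 10, 5, 7,   # sample 1
                     0, 0, 0, 0),    # sample 2
                   ncol = 2)
p <- psi_norm(j_chr, j_start, j_end, j_strand, j_counts)
# column 1: 30/40, 10/45, 5/15, 7/7
```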

junction_score uses the counts contained within the "norm" assay to calculate how far each patient junction deviates from the expected distribution of control junction counts. The function used to calculate this abnormality score can be user-defined or left as the default z-score. Junctions are also labelled as up-regulated (+1) or down-regulated (-1) with respect to control junctions, and this information is stored in the assay "direction" for use in outlier_aggregate.
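A minimal sketch of the scoring step, assuming the default is a per-junction z-score against the control mean and standard deviation, and that the direction label is simply the sign of that score. The function name and the list it returns are hypothetical; `x` and `y` are assumed to be junctions-by-samples matrices of "norm" values for cases and controls respectively.

```r
# Hypothetical sketch of a z-score plus +1/-1 direction label.
# Assumptions: x = junctions x cases, y = junctions x controls ("norm" values);
# direction is the sign of the z-score (a score of exactly 0 maps to -1 here).
zscore_with_direction <- function(x, y) {
    ctrl_mean <- rowMeans(y)          # per-junction control mean
    ctrl_sd <- apply(y, 1, stats::sd) # per-junction control SD
    score <- (x - ctrl_mean) / ctrl_sd
    # +1 = case above controls (up-regulated), -1 = below (down-regulated)
    direction <- ifelse(score > 0, 1, -1)
    list(score = score, direction = direction)
}

# usage: 2 junctions, 1 case sample, 3 control samples
x <- matrix(c(0.9, 0.1), nrow = 2)
y <- matrix(c(0.4, 0.5, 0.6,
              0.2, 0.3, 0.4), nrow = 2, byrow = TRUE)
res <- zscore_with_direction(x, y)
```

Because `ifelse` keeps the dimensions of its test, `direction` has the same junctions-by-cases shape as `score`, mirroring a separate "direction" assay.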

Value

RangedSummarizedExperiment-class object containing filtered, annotated, normalised junction data with abnormality scores.

Functions

junction_annot: Annotate junctions using reference annotation
junction_filter: Filter junctions by count, width, annotation or region
junction_norm: Normalise junction counts by cluster
junction_score: Score patient junctions by their abnormality

Examples


library(dasper)

##### Set up txdb #####

# use GenomicState to load txdb (GENCODE v31)
ref <- GenomicState::GenomicStateHub(
    version = "31",
    genome = "hg38",
    filetype = "TxDb"
)[[1]]

##### junction_annot #####

junctions <- junction_annot(junctions_example, ref)

##### junction_filter #####

junctions <- junction_filter(
    junctions,
    types = c("ambig_gene", "unannotated")
)

##### junction_norm #####

junctions <- junction_norm(junctions)

##### junction_score #####

junctions <- junction_score(junctions)

##### junction_process #####

junctions_processed <- junction_process(
    junctions_example,
    ref,
    types = c("ambig_gene", "unannotated")
)

# the two objects are equivalent
all.equal(junctions_processed, junctions, check.attributes = FALSE)

dzhang32/dasper documentation built on Dec. 14, 2024, 8:33 p.m.