FRASER: FRASER: Find RAre Splicing Events in RNA-seq data

View source: R/Fraser-pipeline.R

FRASER    R Documentation

FRASER: Find RAre Splicing Events in RNA-seq data

Description

This help page describes the FRASER function, which can be used to run the default FRASER pipeline. The pipeline combines the beta-binomial fit, the computation of Z scores and p values, and the computation of delta-PSI values.

Usage

FRASER(
  fds,
  q,
  type = fitMetrics(fds),
  implementation = c("PCA", "PCA-BB-Decoder", "AE-weighted", "AE", "BB"),
  iterations = 15,
  BPPARAM = bpparam(),
  correction,
  subsets = NULL,
  ...
)

calculateZscore(fds, type = currentType(fds), logit = TRUE)

calculatePvalues(
  fds,
  type = currentType(fds),
  implementation = "PCA",
  BPPARAM = bpparam(),
  distributions = c("betabinomial"),
  capN = 5 * 1e+05
)

calculatePadjValues(
  fds,
  type = currentType(fds),
  method = "BY",
  rhoCutoff = NA,
  geneLevel = TRUE,
  geneColumn = "hgnc_symbol",
  subsets = NULL,
  BPPARAM = bpparam()
)

calculatePadjValuesOnSubset(
  fds,
  genesToTest,
  subsetName,
  type = currentType(fds),
  method = "BY",
  geneColumn = "hgnc_symbol",
  BPPARAM = bpparam()
)

Arguments

fds

A FraserDataSet object

q

The encoding dimension to be used during the fitting procedure. If unknown, it should be estimated using optimHyperParams. If a named vector is provided, its entries are used for the respective splicing types.

type

The type of PSI (jaccard, psi5, psi3 or theta for theta/splicing efficiency)

implementation

The method that should be used to correct for confounders.

iterations

The maximum number of iterations. If the autoencoder has not converged after this number of iterations, the fit stops anyway.

BPPARAM

A BiocParallel object to run the computation in parallel

correction

Deprecated. The name changed to implementation.

subsets

A named list of named lists specifying any number of gene subsets (can differ per sample). For each subset, FDR correction will be limited to genes in the subset, and the FDR corrected pvalues stored as an assay in the fds object in addition to the transcriptome-wide FDR corrected pvalues. See the examples for how to use this argument.

...

Additional parameters passed on to the internal fit function

logit

Indicates whether z scores are computed on the logit scale (default) or on the natural (psi) scale.

distributions

The distribution on which the p value calculation is based. Possible choices are beta-binomial, binomial and normal.

capN

Counts are capped at this value to speed up the p-value calculation

method

The p.adjust method that should be used for genome-wide multiple testing correction.

rhoCutoff

The cutoff value on the fitted rho value (overdispersion parameter of the betabinomial) above which junctions are masked with NA during p value adjustment (default: NA, no masking).

geneLevel

Logical value indicating whether gene-level p values should be calculated. Defaults to TRUE.

geneColumn

The name of the column holding the gene annotation that will be used for gene-level p value computation.

genesToTest

A named list with the subset of genes to test per sample. The names must correspond to the sampleIDs in the given fds object.

subsetName

The name under which the resulting FDR corrected pvalues will be stored in metadata(fds).
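To make the distributions argument more concrete, the following is a minimal base-R sketch of a two-sided beta-binomial p value for a single junction and sample. It is not the package's internal code: the helper names dbetabinomSketch and twoSidedPvalSketch, the mean/overdispersion parameterisation (mu, rho) and all numbers are assumptions made for illustration, and the capping of counts controlled by capN is not shown.

```r
# Beta-binomial density with mean mu and overdispersion rho (0 < rho < 1).
# Assumed parameterisation: a + b = (1 - rho) / rho.
dbetabinomSketch <- function(k, n, mu, rho) {
  a <- mu * (1 - rho) / rho
  b <- (1 - mu) * (1 - rho) / rho
  exp(lchoose(n, k) + lbeta(k + a, n - k + b) - lbeta(a, b))
}

# Two-sided p value: twice the smaller tail probability, capped at 1.
twoSidedPvalSketch <- function(k, n, mu, rho) {
  dens  <- dbetabinomSketch(0:n, n, mu, rho)
  lower <- sum(dens[seq_len(k + 1)])       # P(X <= k)
  upper <- sum(dens[(k + 1):(n + 1)])      # P(X >= k)
  2 * min(lower, upper, 0.5)
}

# Hypothetical junction: 50 reads at the site, expected PSI of 0.9
pDepleted <- twoSidedPvalSketch(k = 1,  n = 50, mu = 0.9, rho = 0.05)
pExpected <- twoSidedPvalSketch(k = 45, n = 50, mu = 0.9, rho = 0.05)
c(depleted = pDepleted, expected = pExpected)
```

An observation far from the expected PSI (1 of 50 reads) yields a much smaller p value than one close to it (45 of 50 reads).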

Details

All computed values are returned as a FraserDataSet object. To have more control over each analysis step, one can call each function separately.

  • fit to control for confounding effects and fit the beta binomial model parameters

  • calculatePvalues to calculate the nominal p values

  • calculatePadjValues to calculate adjusted p values (per sample)

  • calculateZscore to calculate the Z scores

Available methods to correct for confounders are currently: a denoising autoencoder with a BB loss ("AE" and "AE-weighted"), PCA ("PCA"), and a hybrid approach in which PCA is used to fit the latent space and the decoder of the autoencoder is then fit using the BB loss ("PCA-BB-Decoder"). Although not recommended, it is also possible to fit the BB distribution directly to the raw counts ("BB").
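The Z score step listed above can be illustrated with a small base-R sketch. It assumes that Z scores are obtained by standardising, per junction, the difference between observed and expected PSI on the logit scale; the matrices psiObs and psiFit and their values are hypothetical toy data, and the sketch is not the package's internal implementation. Real PSI values can be exactly 0 or 1, where the logit is infinite; the toy values avoid that case.

```r
set.seed(1)

# Toy data: 3 junctions (rows) x 6 samples (columns), hypothetical values
psiObs <- matrix(runif(18, 0.05, 0.95), nrow = 3)   # observed PSI
psiFit <- matrix(runif(18, 0.05, 0.95), nrow = 3)   # expected (fitted) PSI

# Residuals on the logit scale; qlogis() is the logit function in base R
residLogit <- qlogis(psiObs) - qlogis(psiFit)

# Standardise each junction (row) across samples
zSketch <- (residLogit - rowMeans(residLogit)) / apply(residLogit, 1, sd)

round(zSketch, 2)
```

Each row of zSketch has mean 0 and standard deviation 1 by construction, so a large absolute value flags a sample whose junction usage deviates from the other samples.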

Value

FraserDataSet

Functions

  • FRASER(): This function runs the default FRASER pipeline combining the beta-binomial fit, the computation of Z scores and p values as well as the computation of delta-PSI values.

  • calculateZscore(): This function calculates z-scores based on the observed and expected logit psi.

  • calculatePvalues(): This function calculates two-sided p-values based on the beta-binomial distribution (or binomial or normal if desired). The returned p values are not yet adjusted with Holm's method per donor or acceptor site, respectively.

  • calculatePadjValues(): This function adjusts the previously calculated p values per sample for multiple testing. First, the previously calculated junction-level p values are adjusted with Holm's method per donor or acceptor site, respectively. Then, if gene symbols have been annotated to junctions (and not otherwise requested), gene-level p values are computed.

  • calculatePadjValuesOnSubset(): This function restricts FDR correction to the junctions in a given subset of genes, which can differ per sample. It requires gene symbols to have been annotated to junctions. As with the full FDR correction across all junctions, the previously calculated junction-level p values are first adjusted with Holm's method per donor or acceptor site, respectively. Then, gene-level p values are computed.
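The two-stage adjustment described for calculatePadjValues() can be sketched with base R's p.adjust. The toy table, its column names and the gene-level aggregation (minimum of the Holm-adjusted site-level p values per gene) are assumptions for illustration only; the page states only that Holm's method is applied per donor or acceptor site and that gene-level p values are then computed and corrected with the chosen method ("BY" by default).

```r
# Toy junction-level p values for one sample (all values hypothetical)
junctions <- data.frame(
  site = c("d1", "d1", "d1", "d2", "d2", "d3"),  # shared donor/acceptor site
  gene = c("A",  "A",  "A",  "A",  "A",  "B"),
  pval = c(1e-6, 0.02, 0.40, 0.03, 0.50, 0.20)
)

# Step 1: Holm adjustment within each splice site
junctions$pHolm <- ave(junctions$pval, junctions$site,
                       FUN = function(p) p.adjust(p, method = "holm"))

# Step 2 (assumed aggregation): one p value per site, then Holm across
# the sites of a gene, keeping the smallest value as the gene-level p value
siteP <- aggregate(pHolm ~ site + gene, data = junctions, FUN = min)
geneP <- vapply(split(siteP$pHolm, siteP$gene),
                function(p) min(p.adjust(p, method = "holm")),
                numeric(1))

# Step 3: multiple testing correction across genes within the sample
genePadj <- p.adjust(geneP, method = "BY")
genePadj
```

Restricting the correction to a gene subset, as calculatePadjValuesOnSubset() does, would correspond to filtering geneP to the genes of interest before the last step, so that fewer tests enter the correction.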

Author(s)

Christian Mertes mertes@in.tum.de

Ines Scheller scheller@in.tum.de

See Also

fit

Examples

# set default parallel backend
register(SerialParam())

# preprocessing
fds <- createTestFraserDataSet()

# compute PSI values and filter out introns that are not expressed
fds <- calculatePSIValues(fds)
fds <- filterExpressionAndVariability(fds)

# Run the full analysis pipeline: fits distribution and computes p values
fds <- FRASER(fds, q=2, implementation="PCA")

# afterwards, the fitted fds-object can be saved and results can 
# be extracted and visualized, see ?saveFraserDataSet, ?results and 
# ?plotVolcano
 
# The functions run inside the FRASER function can also be directly 
# run themselves. 
# To directly run the fit function:
fds <- fit(fds, implementation="PCA", q=2, type="jaccard")

# To run the nominal and adjusted p value and z score
# calculations directly, the following functions can be used:
fds <- calculatePvalues(fds, type="jaccard")
head(pVals(fds, type="jaccard"))
fds <- calculatePadjValues(fds, type="jaccard", method="BY")
head(padjVals(fds, type="jaccard"))
fds <- calculateZscore(fds, type="jaccard")
head(zScores(fds, type="jaccard")) 

# example of restricting FDR correction to subsets of genes of interest
genesOfInterest <- list("sample1"=c("TIMMDC1"), "sample2"=c("MCOLN1"))
fds <- calculatePadjValues(fds, type="jaccard", 
                 subsets=list("exampleSubset"=genesOfInterest))
padjVals(fds, type="jaccard", subsetName="exampleSubset")
padjVals(fds, type="jaccard", level="gene", subsetName="exampleSubset")
fds <- calculatePadjValues(fds, type="jaccard", 
                 subsets=list("anotherExampleSubset"=c("TIMMDC1")))
padjVals(fds, type="jaccard", subsetName="anotherExampleSubset")

# only adding FDR corrected pvalues on a subset without calculating 
# transcriptome-wide FDR again:
fds <- calculatePadjValuesOnSubset(fds, genesToTest=genesOfInterest, 
         subsetName="setOfInterest", type="jaccard")
padjVals(fds, type="jaccard", subsetName="setOfInterest")


c-mertes/FraseR documentation built on June 15, 2024, 3:29 a.m.