View source: R/Fraser-pipeline.R
FRASER (R Documentation)
This help page describes the FRASER function, which can be used to run the default FRASER pipeline. This pipeline combines the beta-binomial fit, the computation of Z scores and p values, and the computation of delta-PSI values.
FRASER(
  fds,
  q,
  type = fitMetrics(fds),
  implementation = c("PCA", "PCA-BB-Decoder", "AE-weighted", "AE", "BB"),
  iterations = 15,
  BPPARAM = bpparam(),
  correction,
  subsets = NULL,
  ...
)

calculateZscore(fds, type = currentType(fds), logit = TRUE)

calculatePvalues(
  fds,
  type = currentType(fds),
  implementation = "PCA",
  BPPARAM = bpparam(),
  distributions = c("betabinomial"),
  capN = 5 * 1e+05
)

calculatePadjValues(
  fds,
  type = currentType(fds),
  method = "BY",
  rhoCutoff = NA,
  geneLevel = TRUE,
  geneColumn = "hgnc_symbol",
  subsets = NULL,
  BPPARAM = bpparam()
)

calculatePadjValuesOnSubset(
  fds,
  genesToTest,
  subsetName,
  type = currentType(fds),
  method = "BY",
  geneColumn = "hgnc_symbol",
  BPPARAM = bpparam()
)
fds: A
q: The encoding dimensions to be used during the fitting procedure. Should be fitted using
type: The type of PSI (jaccard, psi5, psi3, or theta for theta/splicing efficiency).
implementation: The method that should be used to correct for confounders.
iterations: The maximum number of iterations. If the autoencoder has not converged after this number of iterations, the fit stops anyway.
BPPARAM: A BiocParallel object to run the computation in parallel.
correction: Deprecated. This argument has been renamed to implementation.
subsets: A named list of named lists specifying any number of gene subsets (which can differ per sample). For each subset, FDR correction is limited to the genes in the subset, and the FDR-corrected p values are stored as an assay in the fds object in addition to the transcriptome-wide FDR-corrected p values. See the examples for how to use this argument.
...: Additional parameters passed on to the internal fit function.
logit: Indicates whether z scores are computed on the logit scale (default) or on the natural (PSI) scale.
distributions: The distribution on which the p-value calculation is based. Possible choices are beta-binomial, binomial, and normal.
capN: Counts are capped at this value to speed up the p-value calculation.
method: The p.adjust method used for genome-wide multiple testing correction.
rhoCutoff: The cutoff on the fitted rho value (the overdispersion parameter of the beta-binomial) above which junctions are masked with NA during p value adjustment (default: NA, no masking).
geneLevel: Logical value indicating whether gene-level p values should be calculated. Defaults to TRUE.
geneColumn: The name of the column containing the gene annotation used for gene-level p value computation.
genesToTest: A named list with the subset of genes to test per sample. The names must correspond to the sampleIDs in the given fds object.
subsetName: The name under which the resulting FDR-corrected p values are stored in metadata(fds).
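To make the rhoCutoff masking concrete, here is a minimal base-R sketch. It assumes the p values are held in a plain junctions x samples matrix with one fitted rho per junction; the helper maskByRho and the toy numbers are hypothetical and not part of FRASER, which keeps these values inside the fds object.

```r
# Hypothetical helper (not part of FRASER): mask junctions whose fitted
# overdispersion rho exceeds rhoCutoff before multiple-testing correction.
maskByRho <- function(pvals, rho, rhoCutoff = NA) {
  # pvals: junctions x samples matrix; rho: one fitted value per junction
  stopifnot(nrow(pvals) == length(rho))
  if (is.na(rhoCutoff)) {
    return(pvals)  # default NA: no masking
  }
  pvals[which(rho > rhoCutoff), ] <- NA
  pvals
}

# Toy data: 3 junctions x 2 samples (filled column by column)
pvals <- matrix(c(0.01, 0.20, 0.03,
                  0.50, 0.04, 0.70), nrow = 3)
rho <- c(0.05, 0.30, 0.10)

# Only junction 2 has rho > 0.1, so its row becomes NA
masked <- maskByRho(pvals, rho, rhoCutoff = 0.1)

# Per-sample BY adjustment; NA entries stay NA and are not counted as tests
padj <- apply(masked, 2, p.adjust, method = "BY")
padj
```

Because p.adjust drops NA values before counting the number of tests, masking also lowers the correction applied to the remaining junctions of each sample.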
All computed values are returned as a FraserDataSet object. To have more control over each analysis step, one can call each function separately:
fit: to control for confounding effects and fit the beta-binomial model parameters
calculatePvalues: to calculate the nominal p values
calculatePadjValues: to calculate adjusted p values (per sample)
calculateZscore: to calculate the Z scores
The currently available methods to correct for confounders are: a denoising autoencoder with a beta-binomial (BB) loss ("AE" and "AE-weighted"), PCA ("PCA"), and a hybrid approach in which PCA is used to fit the latent space and the decoder of the autoencoder is then fit using the BB loss ("PCA-BB-Decoder"). Although not recommended, it is also possible to fit the BB distribution directly to the raw counts ("BB").
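The idea behind the "PCA" implementation, a latent space of dimension q from which expected PSI values are predicted, can be sketched in base R with prcomp. This is only an illustration on simulated counts, not FRASER's fitting code: it skips the beta-binomial fit entirely, adds an assumed pseudocount of 0.5 to keep the logit finite, and uses an assumed z-score definition (the per-junction standardized difference between observed and expected logit PSI).

```r
# Illustrative sketch only (not FRASER's implementation).
set.seed(1)

# Simulated data: 50 junctions x 10 samples of split-read counts k and totals n
n <- matrix(rpois(500, lambda = 60) + 1, nrow = 50)
k <- matrix(rbinom(500, size = n, prob = 0.7), nrow = 50)

# Observed PSI with a pseudocount (assumption) so that the logit stays finite
psi <- (k + 0.5) / (n + 1)
x <- qlogis(psi)                       # logit PSI, junctions x samples

# PCA with samples as observations and junctions as features
q <- 2
centers <- rowMeans(x)                 # per-junction mean logit PSI
pca <- prcomp(t(x - centers), center = FALSE)

# Reconstruct logit PSI from the first q principal components only
recon <- pca$x[, 1:q, drop = FALSE] %*% t(pca$rotation[, 1:q, drop = FALSE])
expectedLogit <- t(recon) + centers
expectedPsi <- plogis(expectedLogit)   # expected PSI on the natural scale

# Assumed z-score definition: standardize the observed-minus-expected
# logit PSI difference within each junction
delta <- x - expectedLogit
z <- (delta - rowMeans(delta)) / apply(delta, 1, sd)
```

In this sketch q plays the same role as the q argument: a larger q lets the latent space absorb more sample-to-sample variation, which also pulls genuine outliers towards their expected values.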
Value: a FraserDataSet object.
FRASER(): This function runs the default FRASER pipeline, combining the beta-binomial fit, the computation of Z scores and p values, and the computation of delta-PSI values.
calculateZscore(): This function calculates z scores based on the observed and expected logit PSI.
calculatePvalues(): This function calculates two-sided p values based on the beta-binomial distribution (or the binomial or normal distribution, if desired). The returned p values are not yet adjusted with Holm's method per donor or acceptor site.
calculatePadjValues(): This function adjusts the previously calculated p values per sample for multiple testing. First, the junction-level p values are adjusted with Holm's method per donor or acceptor site. Then, if gene symbols have been annotated to junctions (and geneLevel is not set to FALSE), gene-level p values are computed.
calculatePadjValuesOnSubset(): This function restricts FDR correction to the junctions within a given subset of genes, which can differ per sample. It requires gene symbols to have been annotated to junctions. As with the full FDR correction across all junctions, the previously calculated junction-level p values are first adjusted with Holm's method per donor or acceptor site, and then gene-level p values are computed.
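The per-sample steps described for calculatePvalues() and calculatePadjValues() can be sketched in base R for a single sample. This is a simplified illustration under stated assumptions, not the package's internal code: the beta-binomial is parametrised by a mean mu and an overdispersion rho (alpha = mu(1 - rho)/rho, beta = (1 - mu)(1 - rho)/rho), the two-sided p value is taken as twice the smaller tail probability, and gene-level p values are assumed to be the smallest Holm-adjusted junction p value per gene followed by BY correction across genes. The functions dbetabinomSketch and pvalBetabinomSketch, the donor-site labels, and the gene names are hypothetical.

```r
# Sketch under stated assumptions; not FRASER's internal code.

# Beta-binomial probability mass function with mean mu and overdispersion rho
dbetabinomSketch <- function(k, n, mu, rho) {
  a <- mu * (1 - rho) / rho
  b <- (1 - mu) * (1 - rho) / rho
  exp(lchoose(n, k) + lbeta(k + a, n - k + b) - lbeta(a, b))
}

# Two-sided p value: twice the smaller tail probability, capped at 1
pvalBetabinomSketch <- function(k, n, mu, rho) {
  probs <- dbetabinomSketch(0:n, n, mu, rho)   # probs[i] = P(X = i - 1)
  lower <- sum(probs[seq_len(k + 1)])          # P(X <= k)
  upper <- sum(probs[(k + 1):(n + 1)])         # P(X >= k)
  min(1, 2 * min(lower, upper))
}

# Toy data for one sample: 6 junctions on 3 donor sites in 2 genes
junctions <- data.frame(
  k     = c(2, 30, 15, 28, 10, 25),
  n     = c(40, 40, 30, 30, 50, 50),
  mu    = c(0.70, 0.70, 0.50, 0.50, 0.40, 0.40),
  rho   = 0.05,
  donor = c("d1", "d1", "d2", "d2", "d3", "d3"),
  gene  = c("GENE_A", "GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B")
)

# Nominal two-sided p value per junction
junctions$p <- mapply(pvalBetabinomSketch,
                      junctions$k, junctions$n, junctions$mu, junctions$rho)

# Step 1: Holm's method within each donor site
junctions$pHolm <- ave(junctions$p, junctions$donor,
                       FUN = function(p) p.adjust(p, method = "holm"))

# Step 2: FDR correction across the sample with BY (the default method)
junctions$padj <- p.adjust(junctions$pHolm, method = "BY")

# Step 3 (assumed gene-level aggregation): smallest Holm-adjusted p value
# per gene, then BY across genes
genePvals <- tapply(junctions$pHolm, junctions$gene, min)
genePadj <- p.adjust(genePvals, method = "BY")
```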
Christian Mertes mertes@in.tum.de
Ines Scheller scheller@in.tum.de
See also: fit
# set default parallel backend
register(SerialParam())
# preprocessing
fds <- createTestFraserDataSet()
# filter out introns that are not expressed
fds <- calculatePSIValues(fds)
fds <- filterExpressionAndVariability(fds)
# Run the full analysis pipeline: fits distribution and computes p values
fds <- FRASER(fds, q=2, implementation="PCA")
# afterwards, the fitted fds-object can be saved and results can
# be extracted and visualized, see ?saveFraserDataSet, ?results and
# ?plotVolcano
# The functions run inside the FRASER function can also be directly
# run themselves.
# To directly run the fit function:
fds <- fit(fds, implementation="PCA", q=2, type="jaccard")
# To directly run the nominal and adjusted p value and z score
# calculation, the following functions can be used:
fds <- calculatePvalues(fds, type="jaccard")
head(pVals(fds, type="jaccard"))
fds <- calculatePadjValues(fds, type="jaccard", method="BY")
head(padjVals(fds, type="jaccard"))
fds <- calculateZscore(fds, type="jaccard")
head(zScores(fds, type="jaccard"))
# example of restricting FDR correction to subsets of genes of interest
genesOfInterest <- list("sample1"=c("TIMMDC1"), "sample2"=c("MCOLN1"))
fds <- calculatePadjValues(fds, type="jaccard",
    subsets=list("exampleSubset"=genesOfInterest))
padjVals(fds, type="jaccard", subsetName="exampleSubset")
padjVals(fds, type="jaccard", level="gene", subsetName="exampleSubset")
fds <- calculatePadjValues(fds, type="jaccard",
    subsets=list("anotherExampleSubset"=c("TIMMDC1")))
padjVals(fds, type="jaccard", subsetName="anotherExampleSubset")
# only adding FDR corrected pvalues on a subset without calculating
# transcriptome-wide FDR again:
fds <- calculatePadjValuesOnSubset(fds, genesToTest=genesOfInterest,
    subsetName="setOfInterest", type="jaccard")
padjVals(fds, type="jaccard", subsetName="setOfInterest")
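What restricting the FDR correction to a per-sample gene subset means can be illustrated without FRASER in a short base-R sketch. It assumes the junction-level p values have already been Holm-adjusted per donor or acceptor site and are held in a plain matrix; the p values and the gene label of each junction (including the hypothetical "OTHER") are made up, and the per-sample gene list mirrors the genesOfInterest object used in the examples.

```r
# Hypothetical illustration: FDR correction restricted to per-sample gene subsets.

# Holm-adjusted junction-level p values (made up), junctions x samples
pHolm <- matrix(c(0.001, 0.020, 0.300, 0.004,
                  0.500, 0.003, 0.040, 0.800),
                nrow = 4,
                dimnames = list(NULL, c("sample1", "sample2")))
geneOfJunction <- c("TIMMDC1", "TIMMDC1", "MCOLN1", "OTHER")

# Genes to test per sample, shaped like the genesToTest argument
genesOfInterest <- list(sample1 = c("TIMMDC1"), sample2 = c("MCOLN1"))

padjSubset <- matrix(NA_real_, nrow = nrow(pHolm), ncol = ncol(pHolm),
                     dimnames = dimnames(pHolm))
for (s in colnames(pHolm)) {
  inSubset <- geneOfJunction %in% genesOfInterest[[s]]
  # BY correction only over this sample's subset; other junctions stay NA
  padjSubset[inSubset, s] <- p.adjust(pHolm[inSubset, s], method = "BY")
}
padjSubset
```

Because each sample is corrected only over its own subset, the multiple-testing penalty depends on the subset size rather than on all junctions in the dataset.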