testForDEU: Test for differential usage of exons using the PSI or the...

Description Usage Arguments Details Value Note Author(s)

View source: R/csdex.R

Description

Difference of deviance tests for interactions between each feature and condition for an instance of a gene. Produces a ranked list of significant candidate interactions. The function geneModel invoke by testForDEU and possibly dispatched to multiple cores. The waldTest is invoked by geneModel if model approximation is used.

Usage

1
2
3
4
5
testForDEU(cdx, workers = 1, tmp.dir = NULL, min.cpm = NULL, 
  alpha.wald = NULL, formula = y ~ featureID + condition)
geneModel(input, min.cpm = NULL, tmp.dir = NULL, dist="count", 
  alpha.wald = NULL, formula = y ~ featureID + condition)
waldTest(mm0, model0, alpha=0.05)

Arguments

cdx

A csDEXdataSet object.

workers

Number of cores.

tmp.dir

Temporary directory. If set, results for individual genes are stored immediately as they are computed.

min.cpm

Minimal counts per million reads (CPM) for a gene to be considered. No positions within a gene with a CPM below the set threshold will be tested. An output data frame is still produced, with appropriate message in the msg field.

alpha.wald

Significance threshold for model approximation based on the Wald test for parameter significance. If set, any parameters non-significantly different from zero in the null model are merged to a single "error" parameter. Can significantly reduce computation time for large and sparse datasets.

formula

Formula object describing the null model. For the alternative model, an interaction term for the current '(featureID, condition)' pair is appended.

Details

The core function of the package. Evaluate significance of all possible pairs of interactions between features and conditions by means of difference in deviance model selection. Generalized linear models (GLMs) based on the Negative Binomial distribution are assumed for the quantification based on read counts. The Beta distribution GLM is used to model data based on percentage-spliced (PSI) quantification.

The returned data frame containing interactions will be sorted according to statistical significance (determined by the p-value and padj). The residual value can be used to test the absolute deviation of the null model from the expected value. In the event of an optimization error (numerical problem in matrix inversion or non-convergence) in null or alternative model fitting, the feature-condition pair will be declared non-testable, with corresponding error message stored in the message field.

The read count model requires additional hyperparameters, to be stored in the csDEXDataSet object, which need to be precomputed by package methods estimateSizeFactors, estimateDispersions, estimateGeneCPM, in that order. The PSI model can be run on a constructed csDEXDataSet object directly.

The time complexity of GLM fitting is cubic in the number of parameters, which equal the sum of number of features and conditions. A model approximation based on Wald test of parameter significance can be used by setting the alpha.wald argument to a significance threshold in (0, 1), e.g. 0.05. The internal method waldTest is invoked for model approximation. This leads to faster execution, without affecting the probability of Type I error (i.e. false-positive rate), but affecting the Type II error (i.e. false-negative rate). The returned values nrow, ncol and time can be used to compare the savings with respect to the full model. See references for a detailed description of the approximation.

Value

featureID

Feature identifier.

groupID

Gene identifier.

condition

Condition identifier.

y

The expression value of feature in the given experimental condition. A number in open interval (0, 1) for the PSI model or a non-negative integer for the count model.

cpm

Gene count per million reads (CPM), used by the count model.

pvalue

Statistical significance of the interaction (probability of type 1 error).

padj

Bonferroni-adjusted p-value on a gene-level.

residual

Model residual. The difference between actual value y and the fitted value of the null model.

testable

Whether the interaction complies to used defined threshold and model optimization converged without error.

loglik

Alternative model log-likelihood.

LR

Difference of deviance.

time

Time for alternative model fitting.

nrow

Number of rows in the model design matrix.

ncol

Number of columns in the model design matrix.

msg

Error/warning message, if any.

Note

See the vignette for examples.

Author(s)

Martin Stra<c5><be>ar


mstrazar/csDEX documentation built on May 23, 2019, 8:16 a.m.