estimate.dispersion | R Documentation |
Estimate NB dispersion by modeling it as a parametric function of preliminarily estimated log mean relative frequencies.
estimate.dispersion(nb.data, x, model = "NBQ", method = "MAPL", ...)
nb.data |
output from
|
x |
a design matrix specifying the mean structure of each row. |
model |
the name of the dispersion model, one of "NB2", "NBP", "NBQ" (default), "NBS" or "step". |
method |
a character string specifying the method for estimating the dispersion model, one of "ML" or "MAPL" (default). |
... |
(for future use). |
We use a negative binomial (NB) distribution to model the read frequency of gene i in sample j. The NB distribution uses a dispersion parameter φ_{ij} to model the extra-Poisson variation between biological replicates. Under the NB model, the mean μ_{ij} and variance σ_{ij}^2 of a single read count satisfy σ_{ij}^2 = μ_{ij} + φ_{ij} μ_{ij}^2. Because RNA-Seq experiments typically have small sample sizes, estimating the NB dispersion φ_{ij} for each gene i separately is not reliable. One can instead pool information across genes and biological samples by modeling φ_{ij} as a function of the mean frequencies and library sizes.
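The mean-variance relationship above can be checked numerically in base R. The following is a minimal sketch and not part of the package: it relies on the fact that `rnbinom()` called with `size = 1/phi` and `mu = mu` has variance mu + phi * mu^2. The values of `mu` and `phi` are arbitrary, chosen only for illustration.

```r
## Minimal sketch with illustrative values (not taken from the package)
set.seed(1)
mu  <- 100   # assumed mean read count
phi <- 0.1   # assumed NB dispersion

## With size = 1/phi, rnbinom() draws counts with variance mu + phi * mu^2
y <- rnbinom(1e5, size = 1 / phi, mu = mu)

## Theoretical NB variance versus the Poisson variance (which equals mu)
nb.var <- mu + phi * mu^2
c(poisson = mu, nb.theoretical = nb.var, nb.empirical = var(y))
```

The empirical variance should be close to the theoretical NB value and far above the Poisson variance, which is the extra-Poisson variation that φ captures.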
Under the NB2 model, the dispersion is a constant across all genes and samples.
Under the NBP model, the log dispersion is modeled as a linear function of the preliminary estimates of the log mean relative frequencies (pi.pre):
log(phi) = par[1] + par[2] * log(pi.pre/pi.offset),
where pi.offset is 1e-4.
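As an illustration, the NBP formula can be evaluated directly in R. This is a sketch only: `nbp.dispersion` is a hypothetical helper, not a function exported by the package, and the values of `par` and `pi.pre` are made up.

```r
## Hypothetical helper: evaluate the NBP dispersion function
## log(phi) = par[1] + par[2] * log(pi.pre / pi.offset)
nbp.dispersion <- function(pi.pre, par, pi.offset = 1e-4) {
  exp(par[1] + par[2] * log(pi.pre / pi.offset))
}

pi.pre <- c(1e-6, 1e-5, 1e-4, 1e-3)   # illustrative relative frequencies
par    <- c(log(0.1), -0.5)           # illustrative intercept and slope

phi <- nbp.dispersion(pi.pre, par)
phi
```

In this sketch, exp(par[1]) is the dispersion at pi.pre = pi.offset, and a negative par[2] makes the dispersion decrease as the relative frequency increases.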
Under the NBQ model, the log dispersion is modeled as a quadratic function of the preliminary estimates of the log mean relative frequencies (pi.pre):
log(phi) = par[1] + par[2] * z + par[3] * z^2,
where z = log(pi.pre/pi.offset). By default, pi.offset is the median of pi.pre[subset,].
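The NBQ formula can be sketched the same way. `nbq.dispersion` is a hypothetical helper, not part of the package, and the parameter values are made up. For simplicity the default offset here is the median of whatever `pi.pre` vector is supplied, which is an assumption standing in for the median of pi.pre[subset,].

```r
## Hypothetical helper: evaluate the NBQ dispersion function
## log(phi) = par[1] + par[2] * z + par[3] * z^2, z = log(pi.pre / pi.offset)
nbq.dispersion <- function(pi.pre, par, pi.offset = median(pi.pre)) {
  z <- log(pi.pre / pi.offset)
  exp(par[1] + par[2] * z + par[3] * z^2)
}

pi.pre <- c(1e-6, 1e-5, 1e-4, 1e-3, 1e-2)   # illustrative relative frequencies
par    <- c(log(0.1), -0.3, 0.05)           # illustrative coefficients

phi <- nbq.dispersion(pi.pre, par)
phi
```

Setting par[3] to 0 reduces this sketch to the linear (NBP) form with a different offset.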
Under the NBS model, the dispersion is modeled as a smooth function (a natural cubic spline function) of the preliminary estimates of the log mean relative frequencies (pi.pre).
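A natural cubic spline dispersion function can be sketched with `ns()` from the splines package that ships with R. This is an illustration under stated assumptions, not the package's implementation: the number of basis functions (`df = 4`), the coefficients in `par`, the offset, and the choice to put the spline on the log-dispersion scale (by analogy with NBP and NBQ) are all assumptions.

```r
library(splines)

set.seed(1)
pi.pre <- 10^runif(200, -6, -2)        # illustrative relative frequencies
z <- log(pi.pre / median(pi.pre))      # assumed offset: median of pi.pre

## Natural cubic spline basis in z; df = 4 gives four basis columns
B <- ns(z, df = 4)

## Illustrative coefficients: one intercept plus one per basis column
par <- c(log(0.1), -0.5, 0.2, -0.3, 0.1)

## Assumed form: log(phi) is a linear combination of the spline basis
log.phi <- drop(cbind(1, B) %*% par)
phi <- exp(log.phi)
summary(phi)
```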
Under the "step" model, the dispersion is modeled as a step (piecewise constant) function.
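A piecewise constant dispersion function can be sketched with `findInterval()`. `step.dispersion` is a hypothetical helper, and the break points and dispersion levels are made up. The sketch also assumes that the steps are defined over pi.pre, as in the other models, which the text above does not state explicitly.

```r
## Hypothetical helper: piecewise constant dispersion over pi.pre
## breaks: increasing cut points; phi.levels: one level per interval
step.dispersion <- function(pi.pre, breaks, phi.levels) {
  ## findInterval() returns 0 below the first break, so shift the index by 1
  phi.levels[findInterval(pi.pre, breaks) + 1]
}

breaks     <- c(1e-5, 1e-3)        # illustrative cut points
phi.levels <- c(0.5, 0.1, 0.02)    # illustrative dispersion on each interval
pi.pre     <- c(1e-6, 5e-5, 1e-4, 1e-2)

phi <- step.dispersion(pi.pre, breaks, phi.levels)
phi
```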
A list with the following components:
estimates |
dispersion estimates for each read count, a matrix of the same dimensions as the |
likelihood |
the likelihood of the fitted model. |
model |
details of the estimated dispersion model, NOT intended for use by end users. The name and contents of this component are subject to change in future versions. |
Currently, it is unclear whether a dispersion-modeling approach will outperform a more basic approach in which a regression model is fitted to each gene separately without considering the dispersion-mean dependence. Clarifying the power and robustness of the dispersion-modeling approach is an ongoing research topic.
## See the example for test.coefficient.