estimate.disp: Fit a parametric disperison model to thinned counts
In NBPSeq: Negative Binomial Models for RNA-Sequencing Data

estimate.disp

R Documentation

Fit a parametric disperison model to thinned counts

Description

Fit a parametric dispersion model to RNA-Seq counts data prepared by prepare.nbp. The model parameters are estimated from the pseudo counts: thinned/down-sampled counts that have the same effective library size.

Usage

estimate.disp(obj, model = "NBQ", print.level = 1, ...)

Arguments

`obj`	output from `prepare.nbp`.
`model`	a string, one of "NBQ" (default), "NBP" or "NB2".
`print.level`	a number, controls the amount of messages printed: 0 for suppressing all messages, 1 for basic progress messages, larger values for more detailed messages.
`...`	additional parameters controlling the estimation of the parameters.

Details

For each individual gene i, a negative binomial (NB) distribution uses a dispersion parameter φ_i to capture the extra-Poisson variation between biological replicates: the NB model imposes a mean-variance relationship σ_i^2 = μ_i + φ_i μ_i^2. In many RNA-Seq data sets, the dispersion parameter φ_i tends to vary with the mean μ_i. We proposed to capture the dispersion-mean dependence using parametric models.

With this function, estimate.disp, users can choose from three parametric models: NB2, NBP and NBQ (default).

Under the NB2 model, the dispersion parameter is a constant and does not vary with the mean expression levels.

Under the NBP model, the log dispersion is modeled as a linear function of preliminarily estimated log mean relative frequencies (pi.pre):

log(phi) = par[1] + par[2] * log(pi.pre/pi.offset),

Under the NBQ model, the log dispersion is modeled as a quadratic function of preliminarily estimated log mean relative frequencies (pi.pre):

log(phi) = par[1] + par[2] * log(pi.pre/pi.offset) + par[3] * (log(pi.pre/pi.offset))^2;

The NBQ model is more flexible than the NBP and NB2 models, and is the current default option.

In the NBP and NBQ models, pi.offset is fixed to be 1e-4, so par[1] corresponds to the dispersion level when the relative mean frequency is 100 reads per million (RPM).

The dispersion parameters are estimated from the pseudo counts (counts adjusted to have the same effective library sizes). The parameters are estimated by maximizing the log conditional likelihood of the model parameters given the row sums. The log conditional likelihood is computed for each gene in each treatment group and then summed over genes and treatment groups.

Value

The list obj from the input with some added components summarizing the fitted dispersion model. Users can print and plot the output to see brief summaries of the fitted dispersion model. The output is otherwise not intended for use by end users directly.

Note

Users should call prepare.nbp before calling this function. The function prepare.nbp will normalize the counts and adjust the counts so that the effective library sizes are approximately the same (computing the conditional likelihood requires the library sizes to be the same).

References

Di Y, Schafer DW, Cumbie JS, and Chang JH (2011): "The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq", Statistical Applications in Genetics and Molecular Biology, 10 (1).