This function performs a default analysis through the following steps:
1. estimation of size factors;
2. estimation of dispersion;
3. Negative Binomial GLM fitting and Wald statistics.
For complete details on each step, see the manual pages of the respective
functions. After the DESeq function returns a DESeqDataSet object,
results tables (log2 fold changes and p-values) can be generated
using the results function. See the manual page of
results for information on independent filtering and
p-value adjustment for multiple test correction.
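Step 1 can be illustrated with a minimal sketch of the median-of-ratios idea from Anders and Huber (2010). For each sample, the size factor is the median, across genes, of the ratio between that sample's count and the gene's geometric mean over all samples. This is a plain-Python illustration under that assumption, not DESeq2's implementation. The function name size_factors is hypothetical, and genes with a zero count in any sample are simply skipped.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not DESeq2 code).

    counts: one list per gene, holding that gene's raw counts per sample.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Genes with any zero count have a geometric mean of 0, so skip them.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every gene has at least one zero count")
    # Log of each gene's geometric mean across samples: the pseudo-reference.
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of sample j to the pseudo-reference, for every usable gene.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Sample 2 was sequenced twice as deeply as sample 1; the third gene has a zero.
counts = [[10, 20], [5, 10], [0, 3], [100, 200]]
print(size_factors(counts))  # approximately [0.7071, 1.4142]
```

Taking the median makes the factor robust to a minority of genes that really are differentially expressed.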
a DESeqDataSet object; see the manual pages of its constructor functions
either "Wald" or "LRT", which will then use either
Wald significance tests (as defined by nbinomWaldTest) or the likelihood ratio test (as defined by nbinomLRT)
either "parametric", "local", or "mean"
for the type of fitting of dispersions to the mean intensity.
whether or not to put a zero-mean normal prior on
the non-intercept coefficients
the full model formula; this should be the design formula stored in the DESeqDataSet object
a reduced formula to compare against, e.g. the full model with one or more terms of interest removed; only used by the likelihood ratio test
whether to print messages at each step
the minimum number of replicates required
in order to use automatic outlier replacement on a sample (see the details on Cook's distance below)
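either "standard" or "expanded", which describe
how the model matrix X of the GLM formula is formed.
"standard" is the model matrix built from the design formula in the usual way (an intercept plus indicator columns for the non-reference levels of each factor); "expanded" includes an indicator column for every level of each factor, in addition to the intercept.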
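if FALSE, no parallelization; if TRUE, parallel execution is used, as configured by the next argument
an optional parameter object passed internally to control parallel execution, used only when parallelization is enabled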
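When the likelihood ratio test is chosen, the full and reduced formulas are compared as follows. The sketch assumes the maximized log-likelihoods of both fits are already available. The statistic is twice their difference. Under the null hypothesis it is approximately chi-squared, with degrees of freedom equal to the number of coefficients removed. The helper names lrt_pvalue and chi2_sf are hypothetical. The chi-squared tail uses the standard series and continued-fraction expansions of the regularized incomplete gamma function, as in Numerical Recipes, so that only the standard library is needed.

```python
import math


def chi2_sf(x, df, eps=1e-14, max_iter=10_000):
    """Upper tail P(X > x) of a chi-squared distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1:
        # Series for the regularized lower incomplete gamma P(a, z); return 1 - P.
        term = total = 1.0 / a
        for n in range(1, max_iter):
            term *= z / (a + n)
            total += term
            if abs(term) < abs(total) * eps:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction (modified Lentz method) for the upper incomplete gamma Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h


def lrt_pvalue(loglik_full, loglik_reduced, df):
    """Likelihood ratio test; df = (#coefficients in full) - (#coefficients in reduced)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


stat, p = lrt_pvalue(-120.0, -123.0, df=1)
print(stat, p)  # 6.0 and roughly 0.0143
```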
The differential expression analysis uses a generalized linear model of the form:
$$
\begin{aligned}
K_{ij} &\sim \mathrm{NB}(\mu_{ij}, \alpha_i) \\
\mu_{ij} &= s_j \, q_{ij} \\
\log_2(q_{ij}) &= x_{j\cdot} \, \beta_i
\end{aligned}
$$
where counts K_ij for gene i, sample j are modeled using
a Negative Binomial distribution with fitted mean mu_ij
and a gene-specific dispersion parameter alpha_i.
The fitted mean is composed of a sample-specific size factor
s_j and a parameter q_ij proportional to the
expected true concentration of fragments for sample j.
The coefficients beta_i give the log2 fold changes for gene i for each
column of the model matrix X.
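The model can be made concrete with a small calculation. The sketch computes the mean mu_ij for one gene and sample from three inputs: a size factor, one row of the model matrix, and log2-scale coefficients. It also computes the Negative Binomial log-probability in mean/dispersion form, where Var(K_ij) = mu_ij + alpha_i mu_ij^2. Because the coefficients act on the log2 scale, a coefficient of 1 doubles the expected count. The function names nb_mean and nb_logpmf are hypothetical, and this is not DESeq2's fitting code.

```python
import math


def nb_mean(size_factor, x_row, beta):
    """mu_ij = s_j * q_ij, where log2(q_ij) = x_j. beta_i (a dot product)."""
    eta = sum(x * b for x, b in zip(x_row, beta))  # linear predictor on the log2 scale
    return size_factor * 2.0 ** eta


def nb_logpmf(k, mu, alpha):
    """log P(K = k) for a Negative Binomial with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha  # "size" parameter, so that Var(K) = mu + alpha * mu**2
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


# Intercept 3 and a treatment coefficient of 1 (a two-fold change).
beta = [3.0, 1.0]
mu_control = nb_mean(1.0, [1, 0], beta)  # 1.0 * 2**3 = 8.0
mu_treated = nb_mean(1.0, [1, 1], beta)  # 1.0 * 2**4 = 16.0
print(mu_control, mu_treated, math.exp(nb_logpmf(10, mu_treated, alpha=0.1)))
```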
The sample-specific size factors can be replaced by
gene-specific normalization factors for each sample using
normalizationFactors. For details on the fitting of the log2
fold changes and the calculation of p-values, see nbinomWaldTest if using
the Wald test, or nbinomLRT if using the likelihood ratio test.
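For the Wald test, each estimated coefficient is divided by its standard error. The resulting z statistic is compared with a standard normal distribution to give a two-sided p-value. The minimal sketch below assumes the estimate and its standard error are already known; the function name wald_test is hypothetical.

```python
import math


def wald_test(beta, se):
    """Two-sided Wald test of H0: beta = 0 against a standard normal reference."""
    z = beta / se
    # P(|Z| > |z|) for Z ~ N(0, 1) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# A log2 fold change of 1.0 with standard error 0.5 gives z = 2.
print(wald_test(1.0, 0.5))  # (2.0, 0.0455...)
```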
Experiments without replicates do not allow for estimation of the dispersion
of counts around the expected value for each group, which is critical for
differential expression analysis. If an experimental design is
supplied which does not contain the necessary degrees of freedom for differential
analysis, DESeq will provide a message to the user and follow
the strategy outlined in Anders and Huber (2010)
under the section 'Working without replicates', wherein all the samples
are considered as replicates of a single group for the estimation of dispersion.
As noted in the reference above: "Some overestimation of the variance
may be expected, which will make that approach conservative."
Furthermore, "while one may not want to draw strong conclusions from such an analysis,
it may still be useful for exploration and hypothesis generation."
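The conservative behaviour quoted above can be seen with a simple method-of-moments dispersion estimate. Assume Var(K_ij) = mu_ij + alpha_i mu_ij^2, as in the model described here. On size-factor-normalized counts, the variance is then about mean(1/s_j) * mu + alpha * mu^2, which can be solved for alpha. This is not DESeq2's dispersion estimator, which is more involved (see Love et al. 2014); moments_dispersion is a hypothetical helper. Pooling two groups whose means differ adds the between-group spread to the variance, so the pooled estimate comes out larger.

```python
from statistics import mean, variance


def moments_dispersion(counts, size_factors):
    """Method-of-moments dispersion for one gene, floored at zero.

    Uses E[K_j / s_j] = mu and Var(K_j / s_j) = mu / s_j + alpha * mu**2,
    averaged over samples j.
    """
    norm = [k / s for k, s in zip(counts, size_factors)]
    m = mean(norm)
    v = variance(norm)  # sample variance (n - 1 denominator)
    xim = mean([1.0 / s for s in size_factors])
    return max((v - xim * m) / m ** 2, 0.0)


sf = [1.0] * 6
group_a, group_b = [98, 101, 103], [195, 204, 199]
within = [moments_dispersion(g, sf[:3]) for g in (group_a, group_b)]
pooled = moments_dispersion(group_a + group_b, sf)  # all samples as one group
print(within, pooled)
```

Here both within-group estimates are floored at zero, while the pooled estimate is clearly positive.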
minReplicatesForReplace is used to decide which samples
are eligible for automatic replacement in the case of extreme Cook's distance.
By default, DESeq will replace outliers if the Cook's distance is
large for a sample which has 7 or more replicates (including itself).
This replacement is performed by the replaceOutliers
function. This default behavior helps to prevent filtering genes
based on Cook's distance when there are many degrees of freedom.
See results for more information about filtering using
Cook's distance, and the 'Dealing with outliers' section of the vignette.
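The eligibility rule can be sketched as below under two simplifying assumptions. First, samples are grouped by a single label, whereas DESeq2 determines groups from the experimental design. Second, the per-sample Cook's distances for a gene and an outlier cutoff are supplied by the caller. The function names and the cutoff argument are hypothetical; only the default of 7 replicates comes from the text above.

```python
from collections import Counter


def replaceable_samples(groups, min_replicates=7):
    """Indices of samples whose group, counting the sample itself,
    has at least min_replicates members."""
    size = Counter(groups)
    return [j for j, g in enumerate(groups) if size[g] >= min_replicates]


def outliers_to_replace(cooks, groups, cutoff, min_replicates=7):
    """Eligible samples whose Cook's distance for this gene exceeds cutoff."""
    eligible = replaceable_samples(groups, min_replicates)
    return [j for j in eligible if cooks[j] > cutoff]


groups = ["treated"] * 7 + ["control"] * 3
cooks = [0.1, 0.2, 9.0, 0.1, 0.3, 0.2, 0.1, 0.1, 12.0, 0.2]
print(outliers_to_replace(cooks, groups, cutoff=1.0))  # [2]
```

Sample 8 also has a large Cook's distance, but its group has only three replicates. It is therefore not replaced and is left to the Cook's distance filtering done by results.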
Unlike the behavior of
replaceOutliers, here the original counts are
kept in the matrix returned by
counts, the original Cook's
distances are kept in
assays(dds)[["cooks"]], and the replacement
counts used for fitting are kept in a separate assay of the object.
Note that if a log2 fold change prior is used (betaPrior=TRUE),
then expanded model matrices will be used in fitting. These are
described in nbinomWaldTest and in the vignette. The
contrast argument of
results should be used for
generating results tables.
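For a single factor, the difference between the "standard" and "expanded" model matrices can be sketched as below, assuming treatment (reference-level) coding for the standard matrix. In the expanded matrix, the level indicators add up to the intercept column. Its columns are therefore linearly dependent, and the coefficients are not identifiable on their own without extra information such as the zero-mean prior. A comparison between two levels is then the difference of their coefficients, which is what a contrast expresses. The functions model_matrix and contrast are hypothetical illustrations.

```python
def model_matrix(sample_levels, levels, expanded=False):
    """Rows of X for one factor.

    standard: intercept + one indicator per level except the first (reference).
    expanded: intercept + one indicator for every level.
    """
    columns = levels if expanded else levels[1:]
    return [[1] + [1 if lvl == c else 0 for c in columns] for lvl in sample_levels]


def contrast(beta_expanded, levels, numerator, denominator):
    """log2 fold change of numerator vs denominator from expanded-matrix coefficients."""
    i = 1 + levels.index(numerator)    # skip the intercept column
    j = 1 + levels.index(denominator)
    return beta_expanded[i] - beta_expanded[j]


samples = ["A", "B", "C"]
standard = model_matrix(samples, ["A", "B", "C"])
expanded = model_matrix(samples, ["A", "B", "C"], expanded=True)
print(standard)  # [[1, 0, 0], [1, 1, 0], [1, 0, 1]]
print(expanded)  # [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]
print(contrast([5.0, -0.5, 0.5, 0.0], ["A", "B", "C"], "B", "A"))  # 1.0
```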
A DESeqDataSet object with results stored as
metadata columns. These results should be accessed by calling the
results function. By default this will return the log2 fold changes and p-values for the last
variable in the design formula. See
results for how to access results
for other variables.
Michael I Love, Wolfgang Huber, Simon Anders: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 2014, 15:550. http://dx.doi.org/10.1186/s13059-014-0550-8
Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:106. http://dx.doi.org/10.1186/gb-2010-11-10-r106