DESeq: Differential expression analysis based on the Negative...
In aghozlane/DESeq2shaman: Differential gene expression analysis based on the negative binomial distribution

Description Usage Arguments Details Value Author(s) References See Also Examples

This function performs a default analysis through the steps:

estimation of size factors: estimateSizeFactors
estimation of dispersion: estimateDispersions
Negative Binomial GLM fitting and Wald statistics: nbinomWaldTest

For complete details on each step, see the manual pages of the respective functions. After the DESeq function returns a DESeqDataSet object, results tables (log2 fold changes and p-values) can be generated using the results function. See the manual page for results for information on independent filtering and p-value adjustment for multiple test correction.

DESeq(object, test = c("Wald", "LRT"), fitType = c("parametric", "local",
  "mean"), betaPrior, full = design(object), reduced, quiet = FALSE,
  minReplicatesForReplace = 7, modelMatrixType, parallel = FALSE,
  BPPARAM = bpparam())

`object`	a DESeqDataSet object, see the constructor functions `DESeqDataSet`, `DESeqDataSetFromMatrix`, `DESeqDataSetFromHTSeqCount`.
`test`	either "Wald" or "LRT", which will then use either Wald significance tests (defined by `nbinomWaldTest`), or the likelihood ratio test on the difference in deviance between a full and reduced model formula (defined by `nbinomLRT`)
`fitType`	either "parametric", "local", or "mean" for the type of fitting of dispersions to the mean intensity. See `estimateDispersions` for description.
`betaPrior`	whether or not to put a zero-mean normal prior on the non-intercept coefficients See `nbinomWaldTest` for description of the calculation of the beta prior. By default, the beta prior is used only for the Wald test, but can also be specified for the likelihood ratio test.
`full`	the full model formula, this should be the formula in `design(object)`, only used by the likelihood ratio test
`reduced`	a reduced formula to compare against, e.g. the full model with a term or terms of interest removed, only used by the likelihood ratio test
`quiet`	whether to print messages at each step
`minReplicatesForReplace`	the minimum number of replicates required in order to use `replaceOutliers` on a sample. If there are samples with so many replicates, the model will be refit after these replacing outliers, flagged by Cook's distance. Set to `Inf` in order to never replace outliers.
`modelMatrixType`	either "standard" or "expanded", which describe how the model matrix, X of the GLM formula is formed. "standard" is as created by `model.matrix` using the design formula. "expanded" includes an indicator variable for each level of factors in addition to an intercept. for more information see the Description of `nbinomWaldTest`. betaPrior must be set to TRUE in order for expanded model matrices to be fit.
`parallel`	if FALSE, no parallelization. if TRUE, parallel execution using `BiocParallel`, see next argument `BPPARAM`
`BPPARAM`	an optional parameter object passed internally to `bplapply` when `parallel=TRUE`. If not specified, the parameters last registered with `register` will be used.

The differential expression analysis uses a generalized linear model of the form:

K_ij ~ NB(mu_ij, alpha_i)

mu_ij = s_j q_ij

log2(q_ij) = x_j. beta_i

where counts K_ij for gene i, sample j are modeled using a Negative Binomial distribution with fitted mean mu_ij and a gene-specific dispersion parameter alpha_i. The fitted mean is composed of a sample-specific size factor s_j and a parameter q_ij proportional to the expected true concentration of fragments for sample j. The coefficients beta_i give the log2 fold changes for gene i for each column of the model matrix X. The sample-specific size factors can be replaced by gene-specific normalization factors for each sample using normalizationFactors. For details on the fitting of the log2 fold changes and calculation of p-values see nbinomWaldTest (or nbinomLRT if using test="LRT").

Experiments without replicates do not allow for estimation of the dispersion of counts around the expected value for each group, which is critical for differential expression analysis. If an experimental design is supplied which does not contain the necessary degrees of freedom for differential analysis, DESeq will provide a message to the user and follow the strategy outlined in Anders and Huber (2010) under the section 'Working without replicates', wherein all the samples are considered as replicates of a single group for the estimation of dispersion. As noted in the reference above: "Some overestimation of the variance may be expected, which will make that approach conservative." Furthermore, "while one may not want to draw strong conclusions from such an analysis, it may still be useful for exploration and hypothesis generation."

The argument minReplicatesForReplace is used to decide which samples are eligible for automatic replacement in the case of extreme Cook's distance. By default, DESeq will replace outliers if the Cook's distance is large for a sample which has 7 or more replicates (including itself). This replacement is performed by the replaceOutliers function. This default behavior helps to prevent filtering genes based on Cook's distance when there are many degrees of freedom. See results for more information about filtering using Cook's distance, and the 'Dealing with outliers' section of the vignette. Unlike the behavior of replaceOutliers, here original counts are kept in the matrix returned by counts, original Cook's distances are kept in assays(dds)[["cooks"]], and the replacement counts used for fitting are kept in assays(object)[["replaceCounts"]].

Note that if a log2 fold change prior is used (betaPrior=TRUE) then expanded model matrices will be used in fitting. These are described in nbinomWaldTest and in the vignette. The contrast argument of results should be used for generating results tables.

a DESeqDataSet object with results stored as metadata columns. These results should accessed by calling the results function. By default this will return the log2 fold changes and p-values for the last variable in the design formula. See results for how to access results for other variables.

Michael Love

DESeq2 reference:

Michael I Love, Wolfgang Huber, Simon Anders: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 2014, 15:550. http://dx.doi.org/10.1186/s13059-014-0550-8

DESeq reference:

Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:106. http://dx.doi.org/10.1186/gb-2010-11-10-r106

nbinomWaldTest, nbinomLRT

dds <- makeExampleDESeqDataSet(betaSD=1, n=100)
dds <- DESeq(dds)
res <- results(dds)
ddsLRT <- DESeq(dds, test="LRT", reduced= ~ 1)
resLRT <- results(ddsLRT)