This function estimates the size factors using the
"median ratio method" described by Equation 5 in Anders and Huber (2010).
The estimated size factors can be accessed using the accessor function sizeFactors.
Alternative library size estimators can also be supplied
using the assignment function sizeFactors<-.
object 
a DESeqDataSet 
type 
Method for estimation: either "ratio", "poscounts", or "iterate". "ratio" uses the standard median ratio method introduced in DESeq. The size factor is the median ratio of the sample over a "pseudosample": for each gene, the geometric mean of all samples. "poscounts" and "iterate" offer alternative estimators, which can be used even when every gene contains a zero in at least one sample (a problem for the default method, as the geometric mean becomes zero and the ratio undefined). The "poscounts" estimator handles a gene with some zeros by calculating a modified geometric mean: the n-th root of the product of the non-zero counts, where n is the number of samples. This evolved out of use cases with Paul McMurdie's phyloseq package for metagenomic samples. The "iterate" estimator alternates between estimating the dispersion with a design of ~1 and finding a size factor vector by numerically optimizing the likelihood of the ~1 model. 
locfunc 
a function to compute a location for a sample. By default, the
median is used. However, especially for low counts, the
shorth function from the genefilter package can give better results.
geoMeans 
by default this is not provided and the
geometric means of the counts are calculated within the function.
A vector of geometric means from another count matrix can be provided
for a "frozen" size factor calculation. The size factors will be
scaled to have a geometric mean of 1 when supplying geoMeans.
controlGenes 
optional, numeric or logical index vector specifying those genes to use for size factor estimation (e.g. housekeeping or spike-in genes) 
normMatrix 
optional, a matrix of normalization factors which do not yet
control for library size. Note that this argument should not be used (and
will be ignored) if the 
quiet 
whether to print messages 
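The "ratio" and "poscounts" estimators described under type can be sketched in base R. This is an illustrative sketch under stated assumptions, not the package's implementation: the function names median_ratio_sf and poscounts_sf are hypothetical, the toy counts are made up, and the final rescaling in poscounts_sf follows the geometric-mean-of-1 scaling described under geoMeans.

```r
# Toy count matrix: rows are genes, columns are samples (made-up values).
counts <- matrix(c( 10,  20,  40,
                   100, 200, 400,
                     5,  10,  20,
                     0,   8,  16),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# "ratio" sketch: per-gene geometric mean over all samples (the "pseudosample"),
# then the per-sample median of count / geometric mean.
# Genes with any zero have a geometric mean of zero and are skipped.
median_ratio_sf <- function(counts) {
  log_geo_means <- rowMeans(log(counts))  # -Inf for genes containing a zero
  apply(counts, 2, function(cnts) {
    keep <- is.finite(log_geo_means) & cnts > 0
    exp(median((log(cnts) - log_geo_means)[keep]))
  })
}

# "poscounts" sketch: modified geometric mean, i.e. the n-th root of the
# product of the non-zero counts, with n the number of samples.
poscounts_sf <- function(counts) {
  geo_means <- apply(counts, 1, function(x) {
    if (all(x == 0)) 0 else exp(sum(log(x[x > 0])) / length(x))
  })
  log_geo_means <- log(geo_means)
  sf <- apply(counts, 2, function(cnts) {
    keep <- is.finite(log_geo_means) & cnts > 0
    exp(median((log(cnts) - log_geo_means)[keep]))
  })
  sf / exp(mean(log(sf)))  # rescale to a geometric mean of 1
}

sf_ratio <- median_ratio_sf(counts)
sf_pos   <- poscounts_sf(counts)
```

In this toy matrix gene4 has a zero in the first sample, so the "ratio" sketch ignores it entirely, while the "poscounts" sketch still uses its non-zero entries.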
Typically, the function is called with the idiom:
dds <- estimateSizeFactors(dds)
See DESeq for a description of the use of size factors in the GLM.
One should call this function after DESeqDataSet
unless size factors are manually specified with sizeFactors.
Alternatively, gene-specific normalization factors for each sample can be provided using
normalizationFactors, which will always preempt sizeFactors in calculations.
Internally, the function calls estimateSizeFactorsForMatrix,
which provides more details on the calculation.
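As a rough illustration of how the geoMeans and controlGenes arguments change the calculation, the following base R sketch computes median-ratio size factors with optional "frozen" geometric means and an optional gene subset. The function size_factors_sketch and its argument names are hypothetical, the matrices are made-up toy data, and the code is an assumption-laden sketch rather than the body of estimateSizeFactorsForMatrix.

```r
# Hypothetical helper: median-ratio size factors with optional frozen
# geometric means (geo_means) and an optional gene subset (control_genes).
size_factors_sketch <- function(counts, geo_means = NULL, control_genes = NULL) {
  frozen <- !is.null(geo_means)
  log_geo_means <- if (frozen) log(geo_means) else rowMeans(log(counts))
  if (!is.null(control_genes)) {
    # Restrict both the counts and the reference values to the control genes.
    counts <- counts[control_genes, , drop = FALSE]
    log_geo_means <- log_geo_means[control_genes]
  }
  sf <- apply(counts, 2, function(cnts) {
    keep <- is.finite(log_geo_means) & cnts > 0
    exp(median((log(cnts) - log_geo_means)[keep]))
  })
  # With supplied geometric means, rescale to a geometric mean of 1.
  if (frozen) sf <- sf / exp(mean(log(sf)))
  sf
}

# Reference data set (toy values) from which geometric means are "frozen".
ref <- matrix(c( 10,  20,  40,
                100, 200, 400,
                  5,  10,  20), nrow = 3, byrow = TRUE)
ref_geo_means <- exp(rowMeans(log(ref)))

# New samples normalized against the frozen reference.
new_counts <- matrix(c( 20,  80,
                       200, 800,
                        10,  40), nrow = 3, byrow = TRUE)
frozen_sf <- size_factors_sketch(new_counts, geo_means = ref_geo_means)

# Size factors estimated from the first two genes only.
control_sf <- size_factors_sketch(ref, control_genes = 1:2)
```

Freezing the geometric means lets new samples be placed on the scale of an earlier data set, at the cost of the size factors no longer being centered automatically, which is why the sketch rescales them.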
The DESeqDataSet passed as a parameter, with the size factors filled in.
Simon Anders
Reference for the median ratio method:
Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:106. http://dx.doi.org/10.1186/gb-2010-11-10-r106
library(DESeq2)

# default median ratio estimation
dds <- makeExampleDESeqDataSet(n=1000, m=4)
dds <- estimateSizeFactors(dds)
sizeFactors(dds)

# use only the first 200 genes for estimation
dds <- estimateSizeFactors(dds, controlGenes=1:200)

# supply a matrix of normalization factors
m <- matrix(runif(1000 * 4, .5, 1.5), ncol=4)
dds <- estimateSizeFactors(dds, normMatrix=m)
normalizationFactors(dds)[1:3,]

# "frozen" calculation with pre-computed geometric means
geoMeans <- exp(rowMeans(log(counts(dds))))
dds <- estimateSizeFactors(dds, geoMeans=geoMeans)
sizeFactors(dds)

