squeezeVar: Squeeze Sample Variances
In limma: Linear Models for Microarray Data


Squeeze a set of sample variances together by computing empirical Bayes posterior means.

squeezeVar(var, df, covariate=NULL, robust=FALSE, winsor.tail.p=c(0.05,0.1))

`var`	numeric vector of independent sample variances.
`df`	numeric vector of degrees of freedom for the sample variances.
`covariate`	if non-`NULL`, `var.prior` will depend on this numeric covariate. Otherwise, `var.prior` is constant.
`robust`	logical, should the estimation of `df.prior` and `var.prior` be robustified against outlier sample variances?
`winsor.tail.p`	numeric vector of length 1 or 2, giving the left and right tail proportions of `var` to Winsorize. Used only when `robust=TRUE`.

This function implements an empirical Bayes algorithm proposed by Smyth (2004).

A conjugate Bayesian hierarchical model is assumed for a set of sample variances. The hyperparameters are estimated by fitting a scaled F-distribution to the sample variances. The function returns the posterior variances and the estimated hyperparameters.

Specifically, the sample variances var are assumed to follow scaled chi-squared distributions, conditional on the true variances, and a scaled inverse chi-squared prior is assumed for the true variances. The scale and degrees of freedom of this prior distribution are estimated from the values of var.
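In the notation of Smyth (2004), the model just described can be written as below. The symbols are borrowed from that paper rather than from this help page: s_g^2 is a sample variance (`var`) on d_g degrees of freedom (`df`), sigma_g^2 is the corresponding true variance, and s_0^2 and d_0 play the roles of `var.prior` and `df.prior`. The last expression is the squeezed variance returned as `var.post` when d_0 is finite.

```latex
s_g^2 \mid \sigma_g^2 \sim \frac{\sigma_g^2}{d_g}\,\chi^2_{d_g},
\qquad
\frac{1}{\sigma_g^2} \sim \frac{1}{d_0 s_0^2}\,\chi^2_{d_0},
\qquad
\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}
```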

The effect of this function is to squeeze the variances towards a common value, or towards a global trend if a covariate is provided. As estimates of the true variances, the squeezed variances have a smaller expected mean squared error than the sample variances themselves.
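The squeezing can be checked by hand. The sketch below assumes the usual conjugate-model result that each posterior variance is a weighted average of the prior value and the sample variance, with the degrees of freedom as weights. The simulation settings (`n`, `d`, `d0`, `s02`, the seed) are invented for illustration and are not part of limma.

```r
library(limma)

set.seed(1)                  # arbitrary seed, for reproducibility only
n   <- 1000                  # number of sample variances (illustrative)
d   <- 5                     # degrees of freedom of each sample variance
d0  <- 4                     # assumed "true" prior degrees of freedom
s02 <- 0.5                   # assumed "true" prior variance

# True variances from a scaled inverse chi-squared prior,
# then sample variances from scaled chi-squared distributions
sigma2 <- d0 * s02 / rchisq(n, df = d0)
s2     <- sigma2 * rchisq(n, df = d) / d

out <- squeezeVar(s2, df = d)

# Weighted average of prior and sample variances (valid when df.prior is finite)
manual <- (out$df.prior * out$var.prior + d * s2) / (out$df.prior + d)
all.equal(out$var.post, manual)

# The squeezed values are less spread out than the raw sample variances
range(s2)
range(out$var.post)
```

When `df.prior` is estimated to be infinite, the weighted average no longer applies and every posterior variance equals the prior value.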

If covariate is non-null, then the scale parameter of the prior distribution is assumed to depend on the covariate. If the covariate is average log-expression, then the effect is an intensity-dependent trend similar to that in Sartor et al (2006).
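As a rough illustration of the covariate option, the following sketch simulates sample variances whose expected value falls with a hypothetical average log-expression, then compares the constant and trended priors. The variable `aveexpr` and the trend itself are made up for the simulation.

```r
library(limma)

set.seed(2)                              # arbitrary seed
n <- 500                                 # number of sample variances (illustrative)
d <- 4                                   # degrees of freedom of each sample variance
aveexpr <- runif(n, min = 2, max = 14)   # hypothetical average log-expression
trend   <- exp(1 - 0.2 * aveexpr)        # assumed variance trend, invented for this example
s2      <- trend * rchisq(n, df = d) / d # sample variances that follow the trend

const   <- squeezeVar(s2, df = d)                       # squeeze towards a common value
trended <- squeezeVar(s2, df = d, covariate = aveexpr)  # squeeze towards a fitted trend

length(const$var.prior)     # a single prior variance
length(trended$var.prior)   # one prior variance per element of s2
```

Plotting `trended$var.prior` against `aveexpr` is a quick way to inspect the fitted trend.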

robust=TRUE implements the robust empirical Bayes procedure of Phipson et al (2016) which allows some of the var values to be outliers.
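A small simulated comparison may help show what the robust option changes. In this sketch a handful of sample variances are artificially inflated to act as outliers. All settings are invented for illustration, and some limma versions may also require the statmod package for robust estimation.

```r
library(limma)

set.seed(3)                               # arbitrary seed
n  <- 1000                                # number of sample variances (illustrative)
d  <- 5                                   # degrees of freedom of each sample variance
d0 <- 10                                  # assumed "true" prior degrees of freedom
sigma2 <- d0 / rchisq(n, df = d0)         # true variances, prior variance of 1
s2     <- sigma2 * rchisq(n, df = d) / d  # sample variances

outliers <- 1:20
s2[outliers] <- s2[outliers] * 50         # inflate a few values to mimic hypervariable cases

plain  <- squeezeVar(s2, df = d)
robust <- squeezeVar(s2, df = d, robust = TRUE)

plain$df.prior           # one value shared by all variances
range(robust$df.prior)   # may differ between outlying and typical variances
```

Comparing `plain$var.post[outliers]` with `robust$var.post[outliers]` shows how strongly each fit pulls the inflated values towards the prior.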

A list with components

`var.post`	numeric vector of posterior variances.
`var.prior`	location of prior distribution. A vector if `covariate` is non-`NULL`, otherwise a scalar.
`df.prior`	degrees of freedom of prior distribution. A vector if `robust=TRUE`, otherwise a scalar.

This function is called by eBayes, but beware a possible confusion with the output from that function. The values var.prior and var.post output by squeezeVar correspond to the quantities s2.prior and s2.post output by eBayes, whereas var.prior output by eBayes relates to a different parameter.
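The naming correspondence can be checked directly. The sketch below fits an intercept-only model to simulated data (the matrix `y` and its dimensions are invented for the example) and compares the eBayes output with a direct call to squeezeVar, assuming default settings for both.

```r
library(limma)

set.seed(4)                                    # arbitrary seed
ngenes  <- 200                                 # illustrative dimensions
narrays <- 6
sd_true <- sqrt(4 / rchisq(ngenes, df = 4))    # hypothetical gene-wise standard deviations

# Rows are genes; sd_true is recycled down each column, so row g has sd_true[g]
y <- matrix(rnorm(ngenes * narrays, sd = sd_true), ngenes, narrays)

fit <- eBayes(lmFit(y))                        # lmFit uses an intercept-only design by default
sv  <- squeezeVar(fit$sigma^2, fit$df.residual)

# s2.post and s2.prior from eBayes match var.post and var.prior from squeezeVar
all.equal(unname(fit$s2.post),  unname(sv$var.post))
all.equal(unname(fit$s2.prior), unname(sv$var.prior))

# eBayes also returns a component named var.prior, which is a different quantity
"var.prior" %in% names(fit)
```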

Gordon Smyth

Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963. http://projecteuclid.org/euclid.aoas/1469199900

Sartor, MA, Tomlinson, CR, Wesselkamper, SC, Sivaganesan, S, Leikauf, GD, and Medvedovic, M (2006). Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments. BMC Bioinformatics 7, 538.

Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3, Article 3. http://www.statsci.org/smyth/pubs/ebayes.pdf

This function is called by eBayes.

This function calls fitFDist.

An overview of linear model functions in limma is given by 06.LinearModels.

s2 <- rchisq(20,df=5)/5
squeezeVar(s2, df=5)

$df.prior
[1] 8.178805

$var.prior
[1] 0.7122891

$var.post
 [1] 0.9078940 0.6972887 1.0424227 0.8918925 0.8294069 0.6545469 0.6162900
 [8] 0.4681865 0.7838784 0.9595122 0.4977125 0.7675679 0.5474221 0.7765054
[15] 1.3722614 0.6335368 0.5159625 0.7066399 0.9466935 0.7658626