# squeezeVar: Squeeze Sample Variances (in limma: Linear Models for Microarray Data)

## Description

Squeeze a set of sample variances together by computing empirical Bayes posterior means.

## Usage

```
squeezeVar(var, df, covariate=NULL, robust=FALSE, winsor.tail.p=c(0.05,0.1))
```

## Arguments

`var` numeric vector of independent sample variances.

`df` numeric vector of degrees of freedom for the sample variances.

`covariate` if non-`NULL`, `var.prior` will depend on this numeric covariate. Otherwise, `var.prior` is constant.

`robust` logical, should the estimation of `df.prior` and `var.prior` be robustified against outlier sample variances?

`winsor.tail.p` numeric vector of length 1 or 2, giving left and right tail proportions of `x` to Winsorize. Used only when `robust=TRUE`.

## Details

This function implements an empirical Bayes algorithm proposed by Smyth (2004).

A conjugate Bayesian hierarchical model is assumed for a set of sample variances. The hyperparameters are estimated by fitting a scaled F-distribution to the sample variances. The function returns the posterior variances and the estimated hyperparameters.

Specifically, the sample variances `var` are assumed to follow scaled chi-squared distributions, conditional on the true variances, and a scaled inverse chi-squared prior is assumed for the true variances. The scale and degrees of freedom of this prior distribution are estimated from the values of `var`.
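The model described above can be written out as follows. The notation is borrowed from Smyth (2004) and does not appear elsewhere on this page: $s_g^2$ and $d_g$ stand for `var` and `df`, $s_0^2$ and $d_0$ for `var.prior` and `df.prior`, and $\tilde{s}_g^2$ for `var.post`.

$$
s_g^2 \mid \sigma_g^2 \sim \frac{\sigma_g^2}{d_g}\,\chi^2_{d_g},
\qquad
\frac{1}{\sigma_g^2} \sim \frac{1}{d_0 s_0^2}\,\chi^2_{d_0}
$$

Under these assumptions the sample variances follow a scaled F-distribution marginally, which is the distribution fitted to estimate the hyperparameters, and the posterior variance is a weighted average of the prior value and the sample variance:

$$
s_g^2 \sim s_0^2\, F_{d_g,\,d_0},
\qquad
\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}
$$

Strictly, $1/\tilde{s}_g^2$ is the posterior mean of $1/\sigma_g^2$ given $s_g^2$.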

The effect of this function is to squeeze the variances towards a common value, or towards a global trend if a `covariate` is provided. The squeezed variances have a smaller expected mean squared error, measured against the true variances, than do the sample variances themselves.
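The squeezing effect can be illustrated with a small base R simulation. This is a minimal sketch, not limma's implementation: `shrink_variances` is a hypothetical helper that applies the posterior formula of Smyth (2004) with the hyperparameters treated as known, whereas `squeezeVar` estimates them from the data. The sample size, degrees of freedom and seed are arbitrary choices.

```r
# Hypothetical helper (not part of limma): posterior variances for
# KNOWN hyperparameters. squeezeVar estimates var.prior and df.prior instead.
shrink_variances <- function(var, df, var.prior, df.prior) {
  if (is.infinite(df.prior)) {
    # Infinite prior df: every variance is replaced by the prior value
    return(rep_len(var.prior, length(var)))
  }
  # Weighted average of prior value and sample variance
  (df.prior * var.prior + df * var) / (df.prior + df)
}

set.seed(2020)
n    <- 10000   # number of variances (assumed)
d    <- 5       # residual degrees of freedom (assumed)
d0   <- 12      # prior degrees of freedom (assumed)
s0sq <- 1       # prior scale (assumed)

# True variances from a scaled inverse chi-squared prior
sigma2 <- d0 * s0sq / rchisq(n, df = d0)
# Sample variances: scaled chi-squared, conditional on the true variances
s2 <- sigma2 * rchisq(n, df = d) / d

s2.post <- shrink_variances(s2, df = d, var.prior = s0sq, df.prior = d0)

# Mean squared error against the true variances
mse.raw  <- mean((s2 - sigma2)^2)
mse.post <- mean((s2.post - sigma2)^2)
c(raw = mse.raw, squeezed = mse.post)
```

In this setup the squeezed variances should show the smaller mean squared error, because each one borrows information from the prior value.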

If `covariate` is non-null, then the scale parameter of the prior distribution is assumed to depend on the covariate. If the covariate is average log-expression, then the effect is an intensity-dependent trend similar to that in Sartor et al (2006).

`robust=TRUE` implements the robust empirical Bayes procedure of Phipson et al (2016) which allows some of the `var` values to be outliers.

## Value

A list with components

`var.post` numeric vector of posterior variances.

`var.prior` location of prior distribution. A vector if `covariate` is non-`NULL`, otherwise a scalar.

`df.prior` degrees of freedom of prior distribution. A vector if `robust=TRUE`, otherwise a scalar.

## Note

This function is called by `eBayes`, but beware a possible confusion with the output from that function. The values `var.prior` and `var.post` output by `squeezeVar` correspond to the quantities `s2.prior` and `s2.post` output by `eBayes`, whereas `var.prior` output by `eBayes` relates to a different parameter.

## Author(s)

Gordon Smyth

## References

Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963. http://projecteuclid.org/euclid.aoas/1469199900

Sartor, MA, Tomlinson, CR, Wesselkamper, SC, Sivaganesan, S, Leikauf, GD, and Medvedovic, M (2006). Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments. BMC Bioinformatics 7, 538.

Smyth, GK (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3, Article 3. http://www.statsci.org/smyth/pubs/ebayes.pdf

## See Also

This function is called by `eBayes`.

This function calls `fitFDist`.

An overview of linear model functions in limma is given by 06.LinearModels.

## Examples

```
s2 <- rchisq(20,df=5)/5
squeezeVar(s2, df=5)
```

### Example output

```
$df.prior
[1] 8.178805

$var.prior
[1] 0.7122891

$var.post
 [1] 0.9078940 0.6972887 1.0424227 0.8918925 0.8294069 0.6545469 0.6162900
 [8] 0.4681865 0.7838784 0.9595122 0.4977125 0.7675679 0.5474221 0.7765054
[15] 1.3722614 0.6335368 0.5159625 0.7066399 0.9466935 0.7658626
```
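One way to read this output is to ask how much weight the prior received. The sketch below assumes the posterior formula of Smyth (2004), `var.post = (df.prior*var.prior + df*var)/(df.prior + df)`, and uses only the printed values, so the results are approximate to the rounding shown. Because the example draws random numbers without a seed, a fresh run will give different values. The names `w` and `s2.implied` are illustrative.

```r
# Values copied from the printed example output
df.prior  <- 8.178805
var.prior <- 0.7122891
df        <- 5
var.post  <- c(0.9078940, 0.6972887, 1.0424227, 0.8918925, 0.8294069,
               0.6545469, 0.6162900, 0.4681865, 0.7838784, 0.9595122,
               0.4977125, 0.7675679, 0.5474221, 0.7765054, 1.3722614,
               0.6335368, 0.5159625, 0.7066399, 0.9466935, 0.7658626)

# Weight given to the prior value in each posterior variance
w <- df.prior / (df.prior + df)

# Invert the weighted average to recover the implied sample variances
s2.implied <- (var.post - w * var.prior) / (1 - w)

# The posterior variances are less spread out than the implied sample variances
range(s2.implied)
range(var.post)
```

Since the prior degrees of freedom exceed the residual degrees of freedom here, each posterior variance sits closer to `var.prior` than to the sample variance it was computed from.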

limma documentation built on Nov. 8, 2020, 8:28 p.m.