Description Usage Arguments Details Value Note Author(s) References See Also Examples

Given a linear model fit from `lmFit`

, compute moderated t-statistics, moderated F-statistic, and log-odds of differential expression by empirical Bayes moderation of the standard errors towards a global value.

1 2 3 |

`fit` |
an |

`proportion` |
numeric value between 0 and 1, assumed proportion of genes which are differentially expressed |

`stdev.coef.lim` |
numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes |

`trend` |
logical, should an intensity-trend be allowed for the prior variance? Default is that the prior variance is constant. |

`robust` |
logical, should the estimation of |

`winsor.tail.p` |
numeric vector of length 1 or 2, giving left and right tail proportions of |

`lfc` |
the minimum log2-fold-change that is considered scientifically meaningful |

These functions are used to rank genes in order of evidence for differential expression. They use an empirical Bayes method to squeeze the genewise-wise residual variances towards a common value (or towards a global trend) (Smyth, 2004; Phipson et al, 2016). The degrees of freedom for the individual variances are increased to reflect the extra information gained from the empirical Bayes moderation, resulting in increased statistical power to detect differential expression.

Theese functions accept as input an `MArrayLM`

fitted model object `fit`

produced by `lmFit`

.
The columns of `fit`

define a set of contrasts which are to be tested equal to zero.
The fitted model object may have been processed by `contrasts.fit`

before being passed to `eBayes`

to convert the coefficients of the original design matrix into an arbitrary number of contrasts.

The empirical Bayes moderated t-statistics test each individual contrast equal to zero. For each gene (row), the moderated F-statistic tests whether all the contrasts are zero. The F-statistic is an overall test computed from the set of t-statistics for that probe. This is exactly analogous the relationship between t-tests and F-statistics in conventional anova, except that the residual mean squares have been moderated between genes.

The estimates `s2.prior`

and `df.prior`

are computed by `fitFDist`

.
`s2.post`

is the weighted average of `s2.prior`

and `sigma^2`

with weights proportional to `df.prior`

and `df.residual`

respectively.
The log-odds of differential expression `lods`

was called the *B-statistic* by Loennstedt and Speed (2002).
The F-statistics `F`

are computed by `classifyTestsF`

with `fstat.only=TRUE`

.

`eBayes`

does not compute ordinary t-statistics because they always have worse performance than the moderated versions.
The ordinary (unmoderated) t-statistics can, however, can be easily extracted from the linear model output for comparison purposesâ€”see the example code below.

`treat`

computes empirical Bayes moderated-t p-values relative to a minimum meaningful fold-change threshold.
Instead of testing for genes that have true log-fold-changes different from zero, it tests whether the true log2-fold-change is greater than `lfc`

in absolute value (McCarthy and Smyth, 2009).
In other words, it uses an interval null hypothesis, where the interval is [-lfc,lfc].
When the number of DE genes is large, `treat`

is often useful for giving preference to larger fold-changes and for prioritizing genes that are biologically important.
`treat`

is concerned with p-values rather than posterior odds, so it does not compute the B-statistic `lods`

.
The idea of thresholding doesn't apply to F-statistics in a straightforward way, so moderated F-statistics are also not computed.
When `lfc=0`

, `treat`

is identical to `eBayes`

, except that F-statistics and B-statistics are not computed.
The `lfc`

threshold is usually chosen relatively small, because significantly DE genes must all have fold changes substantially greater than the testing threshold.
Typical values for `lfc`

are `log2(1.1)`

, `log2(1.2)`

or `log2(1.5)`

.
The top genes chosen by `treat`

can be examined using `topTreat`

.

Note that the `lfc`

testing threshold used by `treat`

to the define the null hypothesis is not the same as a log2-fold-change cutoff, as the observed log2-fold-change needs to substantially larger than `lfc`

for the gene to be called as significant.
In practice, modest values for `lfc`

such as `log2(1.1)`

, `log2(1.2)`

or `log2(1.5)`

are usually the most useful.
In practice, setting `lfc=log2(1.2)`

or `lfc=log2(1.5)`

will usually cause most differentially expressed genes to have estimated fold-changes of 2-fold or greater, depending on the sample size and precision of the experiment.

The use of `eBayes`

or `treat`

with `trend=TRUE`

is known as the *limma-trend* method (Law et al, 2014; Phipson et al, 2016).
With this option, an intensity-dependent trend is fitted to the prior variances `s2.prior`

.
Specifically, `squeezeVar`

is called with the `covariate`

equal to `Amean`

, the average log2-intensity for each gene.
The trend that is fitted can be examined by `plotSA`

.
limma-trend is useful for processing expression values that show a mean-variance relationship.
This is often useful for microarray data, and it can also be applied to RNA-seq counts that have been converted to log2-counts per million (logCPM) values (Law et al, 2014).
When applied to RNA-seq logCPM values, limma-trend give similar results to the `voom`

method.
The voom method incorporates the mean-variance trend into the precision weights, whereas limma-trend incorporates the trend into the empirical Bayes moderation.
limma-trend is somewhat simpler than `voom`

because it assumes that the sequencing depths (library sizes) are not wildly different between the samples and it applies the mean-variance trend on a genewise basis instead to individual observations.
limma-trend is recommended for RNA-seq analysis when the library sizes are reasonably consistent (less than 3-fold difference from smallest to largest) because of its simplicity and speed.

If `robust=TRUE`

then the robust empirical Bayes procedure of Phipson et al (2016) is used.
This is frequently useful to protect the empirical Bayes procedure against hyper-variable or hypo-variable genes, especially when analysing RNA-seq data.
See `squeezeVar`

for more details.

`eBayes`

produces an object of class `MArrayLM`

(see `MArrayLM-class`

) containing everything found in `fit`

plus the following added components:

`t` |
numeric matrix of moderated t-statistics. |

`p.value` |
numeric matrix of two-sided p-values corresponding to the t-statistics. |

`lods` |
numeric matrix giving the log-odds of differential expression (on the natural log scale). |

`s2.prior` |
estimated prior value for |

`df.prior` |
degrees of freedom associated with |

`df.total` |
row-wise numeric vector giving the total degrees of freedom associated with the t-statistics for each gene. Equal to |

`s2.post` |
row-wise numeric vector giving the posterior values for |

`var.prior` |
column-wise numeric vector giving estimated prior values for the variance of the log2-fold-changes for differentially expressed gene for each constrast. Used for evaluating |

`F` |
row-wise numeric vector of moderated F-statistics for testing all contrasts defined by the columns of |

`F.p.value` |
row-wise numeric vector giving p-values corresponding to |

The matrices `t`

, `p.value`

and `lods`

have the same dimensions as the input object `fit`

, with rows corresponding to genes and columns to coefficients or contrasts.
The vectors `s2.prior`

, `df.prior`

, `df.total`

, `F`

and `F.p.value`

correspond to rows, with length equal to the number of genes.
The vector `var.prior`

corresponds to columns, with length equal to the number of contrasts.
If `s2.prior`

or `df.prior`

have length 1, then the same value applies to all genes.

`s2.prior`

, `df.prior`

and `var.prior`

contain empirical Bayes hyperparameters used to obtain `df.total`

, `s2.post`

and `lods`

.

`treat`

a produces an `MArrayLM`

object similar to that from `eBayes`

but without `lods`

, `var.prior`

, `F`

or `F.p.value`

.

The algorithm used by `eBayes`

and `treat`

with `robust=TRUE`

was revised slightly in limma 3.27.6.
The minimum `df.prior`

returned may be slightly smaller than previously.

Gordon Smyth and Davis McCarthy

Law, CW, Chen, Y, Shi, W, Smyth, GK (2014).
Voom: precision weights unlock linear model analysis tools for RNA-seq read counts.
*Genome Biology* 15, R29.
http://genomebiology.com/2014/15/2/R29

Loennstedt, I., and Speed, T. P. (2002). Replicated microarray data. *Statistica Sinica* **12**, 31-46.

McCarthy, D. J., and Smyth, G. K. (2009).
Testing significance relative to a fold-change threshold is a TREAT.
*Bioinformatics* 25, 765-771.
http://bioinformatics.oxfordjournals.org/content/25/6/765

Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016).
Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression.
*Annals of Applied Statistics* 10, 946-963.
http://projecteuclid.org/euclid.aoas/1469199900

Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments.
*Statistical Applications in Genetics and Molecular Biology* 3, Article 3.
http://www.statsci.org/smyth/pubs/ebayes.pdf

`squeezeVar`

, `fitFDist`

, `tmixture.matrix`

, `plotSA`

.

An overview of linear model functions in limma is given by 06.LinearModels.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | ```
# See also lmFit examples
# Simulate gene expression data,
# 6 microarrays and 100 genes with one gene differentially expressed
set.seed(2016)
sigma2 <- 0.05 / rchisq(100, df=10) * 10
y <- matrix(rnorm(100*6,sd=sqrt(sigma2)),100,6)
design <- cbind(Intercept=1,Group=c(0,0,0,1,1,1))
y[1,4:6] <- y[1,4:6] + 1
fit <- lmFit(y,design)
# Moderated t-statistic
fit <- eBayes(fit)
topTable(fit,coef=2)
# Ordinary t-statistic
ordinary.t <- fit$coef[,2] / fit$stdev.unscaled[,2] / fit$sigma
# Treat relative to a 10% fold-change
tfit <- treat(fit,lfc=log2(1.1))
topTreat(tfit,coef=2)
``` |

```
logFC AveExpr t P.Value adj.P.Val B
1 1.0261201 0.613239029 5.133348 1.067577e-05 0.001067577 3.255898
65 -0.4142890 0.132351578 -2.023881 5.065654e-02 0.984178668 -4.656835
86 -0.3849375 -0.062785795 -1.993078 5.408190e-02 0.984178668 -4.712776
6 -0.3726952 -0.030427277 -1.854613 7.207864e-02 0.984178668 -4.955723
48 -0.3678583 0.124175287 -1.797670 8.084747e-02 0.984178668 -5.051492
72 0.3688488 -0.023410981 1.727803 9.282497e-02 0.984178668 -5.165605
31 -0.3376838 -0.142492004 -1.665522 1.047219e-01 0.984178668 -5.264117
3 0.3432493 0.006803296 1.651775 1.075114e-01 0.984178668 -5.285447
90 -0.3070586 -0.256501945 -1.566593 1.261933e-01 0.984178668 -5.414233
41 -0.3205739 -0.082905186 -1.563103 1.270119e-01 0.984178668 -5.419385
logFC AveExpr t P.Value adj.P.Val
1 1.0261201 0.613239029 4.4454621 4.281708e-05 0.004281708
65 -0.4142890 0.132351578 -1.3521499 9.785258e-02 0.995284171
86 -0.3849375 -0.062785795 -1.2811304 1.095148e-01 0.995284171
6 -0.3726952 -0.030427277 -1.1703653 1.327319e-01 0.995284171
48 -0.3678583 0.124175287 -1.1257102 1.432378e-01 0.995284171
72 0.3688488 -0.023410981 1.0836934 1.546077e-01 0.995284171
31 -0.3376838 -0.142492004 -0.9873279 1.775689e-01 0.995284171
3 0.3432493 0.006803296 0.9900846 1.778003e-01 0.995284171
41 -0.3205739 -0.082905186 -0.8926422 2.050634e-01 0.995284171
90 -0.3070586 -0.256501945 -0.8650592 2.112381e-01 0.995284171
```

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.