06linearmodels: Topic: Linear Models for Microarrays

Description Fitting Models Forming the Design Matrix Making Comparisons of Interest Assessing Differential Expression Summarizing Model Fits Model Selection Author(s) References See Also

Description

This page gives an overview of the LIMMA functions available to fit linear models and to interpret the results. This page covers models for two color arrays in terms of log-ratios or for single-channel arrays in terms of log-intensities. If you wish to fit models to the individual channel log-intensities from two colour arrays, see 07.SingleChannel.

The core of this package is the fitting of gene-wise linear models to microarray data. The basic idea is to estimate log-ratios between two or more target RNA samples simultaneously. See the LIMMA User's Guide for several case studies.

Fitting Models

The main function for model fitting is lmFit. This is recommended interface for most users. lmFit produces a fitted model object of class MArrayLM containing coefficients, standard errors and residual standard errors for each gene. lmFit calls one of the following three functions to do the actual computations:

lm.series

Straightforward least squares fitting of a linear model for each gene.

mrlm

An alternative to lm.series using robust regression as implemented by the rlm function in the MASS package.

gls.series

Generalized least squares taking into account correlations between duplicate spots (i.e., replicate spots on the same array) or related arrays. The function duplicateCorrelation is used to estimate the inter-duplicate or inter-block correlation before using gls.series.

All the functions which fit linear models use link{getEAW} to extract data from microarray data objects, and unwrapdups which provides an unified method for handling duplicate spots.

Forming the Design Matrix

lmFit has two main arguments, the expression data and the design matrix. The design matrix is essentially an indicator matrix which specifies which target RNA samples were applied to each channel on each array. There is considerable freedom in choosing the design matrix - there is always more than one choice which is correct provided it is interpreted correctly.

Design matrices for Affymetrix or single-color arrays can be created using the function model.matrix which is part of the R base package. The function modelMatrix is provided to assist with creation of an appropriate design matrix for two-color microarray experiments. For direct two-color designs, without a common reference, the design matrix often needs to be created by hand.

Making Comparisons of Interest

Once a linear model has been fit using an appropriate design matrix, the command makeContrasts may be used to form a contrast matrix to make comparisons of interest. The fit and the contrast matrix are used by contrasts.fit to compute fold changes and t-statistics for the contrasts of interest. This is a way to compute all possible pairwise comparisons between treatments for example in an experiment which compares many treatments to a common reference.

Assessing Differential Expression

After fitting a linear model, the standard errors are moderated using a simple empirical Bayes model using eBayes or treat. A moderated t-statistic and a log-odds of differential expression is computed for each contrast for each gene. treat tests whether log-fold-changes are greater than a threshold rather than merely different to zero.

eBayes and treat use internal functions squeezeVar, fitFDist, tmixture.matrix and tmixture.vector.

Summarizing Model Fits

After the above steps the results may be displayed or further processed using:

topTable

Presents a list of the genes most likely to be differentially expressed for a given contrast or set of contrasts.

topTableF

Presents a list of the genes most likely to be differentially expressed for a given set of contrasts. Equivalent to topTable with coef set to all the coefficients, coef=1:ncol(fit).

volcanoplot

Volcano plot of fold change versus the B-statistic for any fitted coefficient.

plotlines

Plots fitted coefficients or log-intensity values for time-course data.

genas

Estimates and plots biological correlation between two coefficients.

write.fit

Writes an MarrayLM object to a file. Note that if fit is an MArrayLM object, either write.fit or write.table can be used to write the results to a delimited text file.

For multiple testing functions which operate on linear model fits, see 08.Tests.

Model Selection

selectModel provides a means to choose between alternative linear models using AIC or BIC information criteria.

Author(s)

Gordon Smyth

References

Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963. http://projecteuclid.org/euclid.aoas/1469199900

Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3, No. 1, Article 3. http://www.statsci.org/smyth/pubs/ebayes.pdf

Smyth, G. K., Michaud, J., and Scott, H. (2005). The use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21(9), 2067-2075.

See Also

01.Introduction, 02.Classes, 03.ReadingData, 04.Background, 05.Normalization, 06.LinearModels, 07.SingleChannel, 08.Tests, 09.Diagnostics, 10.GeneSetTests, 11.RNAseq


hdeberg/limma documentation built on Dec. 20, 2021, 3:43 p.m.