Description Usage Arguments Details Value Author(s) See Also Examples

Fit linear model for each gene given a series of arrays

1 2 |

`object` |
A matrix-like data object containing log-ratios or log-expression values for a series of arrays, with rows corresponding to genes and columns to samples.
Any type of data object that can be processed by |

`design` |
the design matrix of the microarray experiment, with rows corresponding to arrays and columns to coefficients to be estimated. Defaults to the unit vector meaning that the arrays are treated as replicates. |

`ndups` |
positive integer giving the number of times each distinct probe is printed on each array. |

`spacing` |
positive integer giving the spacing between duplicate occurrences of the same probe, |

`block` |
vector or factor specifying a blocking variable on the arrays. Has length equal to the number of arrays. Must be |

`correlation` |
the inter-duplicate or inter-technical replicate correlation |

`weights` |
non-negative precision weights. Can be a numeric matrix of individual weights of same size as the object expression matrix, or a numeric vector of array weights with length equal to |

`method` |
fitting method; |

`...` |
other optional arguments to be passed to |

This function fits multiple linear models by weighted or generalized least squares.
It accepts data from a experiment involving a series of microarrays with the same set of probes.
A linear model is fitted to the expression data for each probe.
The expression data should be log-ratios for two-color array platforms or log-expression values for one-channel platforms.
(To fit linear models to the individual channels of two-color array data, see `lmscFit`

.)
The coefficients of the fitted models describe the differences between the RNA sources hybridized to the arrays.
The probe-wise fitted model results are stored in a compact form suitable for further processing by other functions in the limma package.

The function allows for missing values and accepts quantitative precision weights through the `weights`

argument.
It also supports two different correlation structures.
If `block`

is not `NULL`

then different arrays are assumed to be correlated.
If `block`

is `NULL`

and `ndups`

is greater than one then replicate spots on the same array are assumed to be correlated.
It is not possible at this time to fit models with both a block structure and a duplicate-spot correlation structure simultaneously.

If `object`

is a matrix then it should contain log-ratios or log-expression data with rows corresponding to probes and columns to arrays.
(A numeric vector is treated the same as a matrix with one column.)
For objects of other classes, a matrix of expression values is taken from the appropriate component or slot of the object.
If `object`

is of class `MAList`

or `marrayNorm`

, then the matrix of log-ratios (M-values) is extracted.
If `object`

is of class `ExpressionSet`

, then the expression matrix is extracted.
(This may contain log-expression or log-ratio values, depending on the platform.)
If `object`

is of class `PLMset`

then the matrix of chip coefficients `chip.coefs`

is extracted.

The arguments `design`

, `ndups`

, `spacing`

and `weights`

will be extracted from the data `object`

if available and do not normally need to set explicitly in the call.
On the other hand, if any of these are set in the function call then they will over-ride the slots or components in the data `object`

.
If `object`

is an `PLMset`

, then weights are computed as `1/pmax(object@se.chip.coefs, 1e-05)^2`

.
If `object`

is an `ExpressionSet`

object, then weights are not computed.

If the argument `block`

is used, then it is assumed that `ndups=1`

.

The `correlation`

argument has a default value of `0.75`

, but in normal use this default value should not be relied on and the correlation value should be estimated using the function `duplicateCorrelation`

.
The default value is likely to be too high in particular if used with the `block`

argument.

The actual linear model computations are done by passing the data to one the lower-level functions `lm.series`

, `gls.series`

or `mrlm`

.
The function `mrlm`

is used if `method="robust"`

.
If `method="ls"`

, then `gls.series`

is used if a correlation structure has been specified, i.e., if `ndups>1`

or `block`

is non-null and `correlation`

is different from zero.
If `method="ls"`

and there is no correlation structure, `lm.series`

is used.

An `MArrayLM`

object containing the result of the fits.

The rownames of `object`

are preserved in the fit object and can be retrieved by `rownames(fit)`

where `fit`

is output from `lmFit`

.
The column names of `design`

are preserved as column names and can be retrieved by `colnames(fit)`

.

Gordon Smyth

`lmFit`

uses `getEAWP`

to extract expression values, gene annotation and so from the data `object`

.

An overview of linear model functions in limma is given by 06.LinearModels.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | ```
# Simulate gene expression data for 100 probes and 6 microarrays
# Microarray are in two groups
# First two probes are differentially expressed in second group
# Std deviations vary between genes with prior df=4
sd <- 0.3*sqrt(4/rchisq(100,df=4))
y <- matrix(rnorm(100*6,sd=sd),100,6)
rownames(y) <- paste("Gene",1:100)
y[1:2,4:6] <- y[1:2,4:6] + 2
design <- cbind(Grp1=1,Grp2vs1=c(0,0,0,1,1,1))
options(digits=3)
# Ordinary fit
fit <- lmFit(y,design)
fit <- eBayes(fit)
topTable(fit,coef=2)
dim(fit)
colnames(fit)
rownames(fit)[1:10]
names(fit)
# Fold-change thresholding
fit2 <- treat(fit,lfc=0.1)
topTreat(fit2,coef=2)
# Volcano plot
volcanoplot(fit,coef=2,highlight=2)
# Mean-difference plot
plotMD(fit,column=2)
# Q-Q plot of moderated t-statistics
qqt(fit$t[,2],df=fit$df.residual+fit$df.prior)
abline(0,1)
# Various ways of writing results to file
## Not run: write.fit(fit,file="exampleresults.txt")
## Not run: write.table(fit,file="exampleresults2.txt")
# Fit with correlated arrays
# Suppose each pair of arrays is a block
block <- c(1,1,2,2,3,3)
dupcor <- duplicateCorrelation(y,design,block=block)
dupcor$consensus.correlation
fit3 <- lmFit(y,design,block=block,correlation=dupcor$consensus)
# Fit with duplicate probes
# Suppose two side-by-side duplicates of each gene
rownames(y) <- paste("Gene",rep(1:50,each=2))
dupcor <- duplicateCorrelation(y,design,ndups=2)
dupcor$consensus.correlation
fit4 <- lmFit(y,design,ndups=2,correlation=dupcor$consensus)
dim(fit4)
fit4 <- eBayes(fit4)
topTable(fit4,coef=2)
``` |

```
logFC AveExpr t P.Value adj.P.Val B
Gene 1 1.883 1.17223 9.21 1.49e-05 0.00149 3.64
Gene 2 2.480 1.03879 6.65 1.56e-04 0.00780 1.16
Gene 88 -0.923 0.04903 -3.37 9.67e-03 0.26983 -3.21
Gene 6 -0.824 0.07105 -3.30 1.08e-02 0.26983 -3.33
Gene 94 0.597 -0.27682 3.13 1.38e-02 0.27568 -3.58
Gene 49 0.570 0.05615 2.80 2.29e-02 0.38231 -4.10
Gene 72 -0.617 0.03080 -2.68 2.78e-02 0.39370 -4.30
Gene 29 -0.531 0.00353 -2.60 3.15e-02 0.39370 -4.43
Gene 95 -0.657 -0.10526 -2.46 3.94e-02 0.42703 -4.65
Gene 19 0.749 0.14357 2.40 4.27e-02 0.42703 -4.73
[1] 100 2
[1] "Grp1" "Grp2vs1"
[1] "Gene 1" "Gene 2" "Gene 3" "Gene 4" "Gene 5" "Gene 6" "Gene 7"
[8] "Gene 8" "Gene 9" "Gene 10"
[1] "coefficients" "rank" "assign" "qr"
[5] "df.residual" "sigma" "cov.coefficients" "stdev.unscaled"
[9] "pivot" "Amean" "method" "design"
[13] "df.prior" "s2.prior" "var.prior" "proportion"
[17] "s2.post" "t" "df.total" "p.value"
[21] "lods" "F" "F.p.value"
logFC AveExpr t P.Value adj.P.Val
Gene 1 1.883 1.17223 8.72 1.62e-05 0.00162
Gene 2 2.480 1.03879 6.38 1.63e-04 0.00814
Gene 88 -0.923 0.04903 -3.01 1.12e-02 0.32274
Gene 6 -0.824 0.07105 -2.90 1.29e-02 0.32274
Gene 94 0.597 -0.27682 2.61 1.86e-02 0.37255
Gene 49 0.570 0.05615 2.31 3.01e-02 0.48207
Gene 72 -0.617 0.03080 -2.25 3.45e-02 0.48207
Gene 29 -0.531 0.00353 -2.11 4.12e-02 0.48207
Gene 95 -0.657 -0.10526 -2.08 4.63e-02 0.48207
Gene 19 0.749 0.14357 2.08 4.82e-02 0.48207
Warning message:
In glmgam.fit(dx, dy, coef.start = start, tol = tol, maxit = maxit, :
Too much damping - convergence tolerance not achievable
[1] -0.164
[1] 0.00861
[1] 50 2
logFC AveExpr t P.Value adj.P.Val B
1 2.181 1.105506 9.49 1.26e-07 6.28e-06 7.89
47 0.672 -0.103120 3.02 8.79e-03 2.20e-01 -3.45
28 0.701 0.069647 2.75 1.51e-02 2.33e-01 -3.98
48 -0.457 -0.029010 -2.58 2.13e-02 2.33e-01 -4.31
44 -0.532 0.000888 -2.53 2.33e-02 2.33e-01 -4.39
33 -0.394 0.239874 -2.42 2.88e-02 2.40e-01 -4.59
19 -0.332 -0.077768 -2.35 3.36e-02 2.40e-01 -4.74
34 0.486 0.032657 2.15 4.87e-02 3.04e-01 -5.08
11 0.298 -0.097927 1.67 1.17e-01 6.48e-01 -5.86
10 0.352 -0.013118 1.52 1.49e-01 7.45e-01 -6.07
```

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.