
Estimate the correlation between duplicate spots (regularly spaced replicate spots on the same array) or between technical replicates from a series of arrays.

```
duplicateCorrelation(object, design=NULL, ndups=2, spacing=1, block=NULL,
                     trim=0.15, weights=NULL)
```

`object`
a numeric matrix of expression values, or any data object from which

`design`
the design matrix of the microarray experiment, with rows corresponding to arrays and columns to comparisons to be estimated. The number of rows must match the number of columns of

`ndups`
a positive integer giving the number of times each gene is printed on an array.

`spacing`
the spacing between the rows of

`block`
vector or factor specifying a blocking variable

`trim`
the fraction of observations to be trimmed from each end of

`weights`
an optional numeric matrix of the same dimension as

When `block=NULL`, this function estimates the correlation between duplicate spots (regularly spaced within-array replicate spots).
If `block` is not null, this function estimates the correlation between repeated observations on the blocking variable.
Typically the blocks are biological replicates and the repeated observations are technical replicates.
In either case, the correlation is estimated by fitting a mixed linear model by REML individually for each gene.
The function also returns a consensus correlation, a robust average of the individual correlations, which can be used as input for the functions `lmFit` or `gls.series`.

At this time it is not possible to estimate correlations between duplicate spots and between technical replicates simultaneously.
If `block` is not null, then the function will set `ndups=1`, which is equivalent to ignoring duplicate spots.

For this function to return statistically useful results, there must be at least two more arrays than the number of coefficients to be estimated, i.e., two more than the column rank of `design`.
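The array-count requirement above can be checked before calling the function. The following is a minimal sketch in base R, assuming the same two-group design used in the Examples; the variable names `n_arrays`, `n_coef` and `enough_arrays` are illustrative and not part of limma.

```r
# Two-group design for 6 arrays (same layout as in the Examples)
design <- cbind(Grp1 = 1, Grp2vs1 = c(0, 0, 0, 1, 1, 1))

n_arrays <- nrow(design)        # rows of design correspond to arrays
n_coef   <- qr(design)$rank     # column rank = number of estimable coefficients

# Illustrative check: at least two more arrays than coefficients
enough_arrays <- n_arrays >= n_coef + 2
enough_arrays
```

If the check fails, the per-gene correlation estimates are unlikely to be statistically useful, as stated above.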

The function may take a long time to execute because it fits a mixed linear model for each gene using an iterative algorithm. It is not uncommon for the function to return a small number of warning messages that correlation estimates cannot be computed for some individual genes. This is not a serious concern provided that there are only a few such warnings and the total number of genes is large. The consensus estimator computed by this function will not be materially affected by a small number of genes.

A list with components

`consensus.correlation`
the average estimated inter-duplicate correlation. The average is the trimmed mean of the individual correlations on the atanh-transformed scale.

`cor`
same as

`atanh.correlations`
numeric vector of length
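To make the definition of `consensus.correlation` concrete, here is a minimal base R sketch of a trimmed mean taken on the atanh-transformed scale and then transformed back. The correlation values are made up for illustration, and the code is a sketch of the calculation as described above, not a copy of limma's internals.

```r
# Hypothetical per-gene correlation estimates (illustrative values only)
gene_cor <- c(-0.20, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.95)

# Transform to the atanh (Fisher z) scale
atanh_cor <- atanh(gene_cor)

# Trimmed mean on the atanh scale, using the default trim fraction of 0.15,
# then back-transform to the correlation scale
trim <- 0.15
consensus <- tanh(mean(atanh_cor, trim = trim, na.rm = TRUE))
consensus
```

With 10 values and `trim = 0.15`, one value is dropped from each end, so the extreme estimates (-0.20 and 0.95 here) do not influence the result.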

Gordon Smyth

Smyth, G. K., Michaud, J., and Scott, H. (2005). The use of within-array replicate spots for assessing differential expression in microarray experiments. *Bioinformatics* 21(9), 2067-2075.
[http://bioinformatics.oxfordjournals.org/content/21/9/2067]
[Preprint with corrections: http://www.statsci.org/smyth/pubs/dupcor.pdf]

These functions use `mixedModel2Fit` from the statmod package.

An overview of linear model functions in limma is given by 06.LinearModels.

```
library(limma)

# Simulate gene expression data for 100 probes and 6 microarrays
# Microarrays are in two groups
# First two probes are more highly expressed in second group
# Std deviations vary between genes with prior df=4
sd <- 0.3*sqrt(4/rchisq(100,df=4))
y <- matrix(rnorm(100*6,sd=sd),100,6)
rownames(y) <- paste("Gene",1:100)
y[1:2,4:6] <- y[1:2,4:6] + 2
design <- cbind(Grp1=1,Grp2vs1=c(0,0,0,1,1,1))
options(digits=3)

# Fit with correlated arrays
# Suppose each pair of arrays is a block
block <- c(1,1,2,2,3,3)
dupcor <- duplicateCorrelation(y,design,block=block)
dupcor$consensus.correlation
fit1 <- lmFit(y,design,block=block,correlation=dupcor$consensus.correlation)
fit1 <- eBayes(fit1)
topTable(fit1,coef=2)

# Fit with duplicate probes
# Suppose two side-by-side duplicates of each gene
rownames(y) <- paste("Gene",rep(1:50,each=2))
dupcor <- duplicateCorrelation(y,design,ndups=2)
dupcor$consensus.correlation
fit2 <- lmFit(y,design,ndups=2,correlation=dupcor$consensus.correlation)
dim(fit2)
fit2 <- eBayes(fit2)
topTable(fit2,coef=2)
```

```
[1] -0.263
         logFC  AveExpr     t  P.Value adj.P.Val     B
Gene 1   2.215  1.18299 11.86 2.03e-06  0.000203  5.59
Gene 2   1.682  1.14910  9.95 7.79e-06  0.000390  4.17
Gene 37  0.565 -0.01096  3.64 6.43e-03  0.193192 -3.03
Gene 5   0.620 -0.01629  3.50 7.91e-03  0.193192 -3.25
Gene 74 -2.056 -0.62548 -3.36 9.66e-03  0.193192 -3.46
Gene 59  0.559 -0.01790  3.06 1.52e-02  0.217546 -3.93
Gene 25  0.536  0.00804  3.06 1.52e-02  0.217546 -3.93
Gene 91 -0.430  0.25850 -2.77 2.39e-02  0.298949 -4.40
Gene 54 -0.659  0.06814 -2.63 2.96e-02  0.307573 -4.61
Gene 87 -0.471  0.01058 -2.61 3.08e-02  0.307573 -4.65
[1] -0.0513
[1] 50  2
    logFC  AveExpr     t  P.Value adj.P.Val     B
1   1.929  1.16604 13.84 7.46e-10  3.73e-08 12.94
19  0.277 -0.02002  2.12 5.13e-02  8.33e-01 -5.50
44 -0.261  0.02744 -1.92 7.39e-02  8.33e-01 -5.83
49  0.314 -0.00262  1.92 7.49e-02  8.33e-01 -5.85
9  -0.209 -0.05778 -1.71 1.08e-01  8.33e-01 -6.17
21  0.242 -0.06537  1.59 1.33e-01  8.33e-01 -6.35
37 -0.884 -0.27590 -1.59 1.34e-01  8.33e-01 -6.35
30  0.307 -0.08700  1.50 1.56e-01  8.33e-01 -6.48
7   0.256 -0.04576  1.28 2.21e-01  8.33e-01 -6.77
26  0.158 -0.06626  1.24 2.36e-01  8.33e-01 -6.82
```
