fitmixture: Fit Mixture Model by Non-Linear Least Squares
In limma: Linear Models for Microarray Data


Fit Mixture Model by Non-Linear Least Squares

fitmixture(log2e, mixprop, niter = 4, trace = FALSE)

`log2e`	a numeric matrix containing log2 expression values. Rows correspond to probes for genes and columns to RNA samples.
`mixprop`	a vector of length `ncol(log2e)` giving the mixing proportion (between 0 and 1) for each sample.
`niter`	integer number of iterations.
`trace`	logical. If `TRUE`, summary working estimates are output from each iteration.

A mixture experiment is one in which two reference RNA sources are mixed in different proportions to create experimental samples. Mixture experiments have been used to evaluate genomic technologies and analysis methods (Holloway et al., 2006). This function uses all the data for each gene to estimate the expression level of the gene in each of the two pure reference samples.

The function fits a nonlinear mixture model to the log2 expression values for each gene. The expected values of log2e for each gene are assumed to be of the form log2( mixprop*Y1 + (1-mixprop)*Y2 ) where Y1 and Y2 are the expression levels of the gene in the two reference samples being mixed. The mixprop values are the same for each gene but Y1 and Y2 are specific to the gene. The function returns the estimated values A=0.5*log2(Y1*Y2) and M=log2(Y2/Y1) for each gene.
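The expected-value formula above can be sketched in a few lines of base R. This is only an illustration of the model, not code from limma: the helper name `mixture_mean` and the toy values of Y1, Y2 and mixprop are assumptions made for the example.

```r
# Hypothetical helper (not part of limma): expected log2 expression
# under the mixture model log2( mixprop*Y1 + (1-mixprop)*Y2 ).
# Y1, Y2:  unlogged expression levels in the two pure samples, one per gene.
# mixprop: mixing proportions in [0, 1], one per sample.
mixture_mean <- function(Y1, Y2, mixprop) {
  # outer() builds a genes-by-samples matrix of mixed expression levels
  log2(outer(Y1, mixprop) + outer(Y2, 1 - mixprop))
}

# Toy values chosen for illustration only
Y1 <- c(8, 1)
Y2 <- c(2, 4)
mixprop <- c(0, 0.5, 1)

mu <- mixture_mean(Y1, Y2, mixprop)  # 2 genes x 3 samples
A <- 0.5 * log2(Y1 * Y2)             # average log2 expression per gene
```

At mixprop = 0 the expected value reduces to log2(Y2) and at mixprop = 1 to log2(Y1), so the pure samples sit at the two ends of the titration series.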

The nonlinear estimation algorithm implemented in fitmixture uses a nested Gauss-Newton iteration (Smyth, 1996). It is fully vectorized so that the estimation is done for all genes simultaneously.
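To give a feel for Gauss-Newton iteration on this model, the sketch below fits a single gene in base R. It is a plain, unpartitioned Gauss-Newton loop and should not be read as the nested, vectorized algorithm that fitmixture implements. The function name `gauss_newton_mixture`, the parameterization in terms of log2(Y1) and log2(Y2), the starting values and the noise-free toy data are all assumptions made for the example.

```r
# Hypothetical helper (not limma's implementation): plain Gauss-Newton
# for one gene under the model z ~ log2( p*Y1 + (1-p)*Y2 ).
# z: observed log2 expression values for the gene, one per sample.
# p: mixing proportions, one per sample.
# Returns estimates of log2(Y1) and log2(Y2).
gauss_newton_mixture <- function(z, p, niter = 10) {
  u <- c(mean(z), mean(z))           # crude start: both sources at the mean
  for (i in seq_len(niter)) {
    y1 <- p * 2^u[1]                 # contribution of source 1 to each sample
    y2 <- (1 - p) * 2^u[2]           # contribution of source 2 to each sample
    mu <- log2(y1 + y2)              # fitted log2 expression
    J <- cbind(y1, y2) / (y1 + y2)   # Jacobian of mu w.r.t. (u[1], u[2])
    # Least-squares solution of J %*% delta = residuals gives the step
    delta <- qr.solve(J, z - mu)
    u <- u + unname(delta)
  }
  c(log2Y1 = u[[1]], log2Y2 = u[[2]])
}

# Noise-free toy data with Y1 = 8 and Y2 = 2 (illustrative values)
p <- c(0, 0.25, 0.75, 1)
z <- log2(p * 8 + (1 - p) * 2)
est <- gauss_newton_mixture(z, p)
```

Each step linearizes the mean function around the current estimate and solves an ordinary least-squares problem for the update. With noise-free data the estimates should approach the true values log2(8) and log2(2) after a few iterations.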

A list with three components:

`A`	numeric vector giving the estimated average log2 expression of the two reference samples for each gene
`M`	numeric vector giving the estimated log-ratio of expression between the two reference samples for each gene
`stdev`	standard deviation of the residual term in the mixture model for each gene

Gordon K Smyth

Holloway, A. J., Oshlack, A., Diyagama, D. S., Bowtell, D. D. L., and Smyth, G. K. (2006). Statistical analysis of an RNA titration series evaluates microarray precision and sensitivity on a whole-array basis. BMC Bioinformatics 7, Article 511. http://www.biomedcentral.com/1471-2105/7/511

Smyth, G. K. (1996). Partitioned algorithms for maximum likelihood and other nonlinear estimation. Statistics and Computing, 6, 201-216. http://www.statsci.org/smyth/pubs/partitio.pdf

ngenes <- 100
TrueY1 <- rexp(ngenes)
TrueY2 <- rexp(ngenes)
mixprop <- matrix(c(0,0.25,0.75,1),1,4)
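# Expected unlogged expression: mixprop*Y1 + (1-mixprop)*Y2, genes by samples
TrueExpr <- TrueY1 %*% mixprop + TrueY2 %*% (1-mixprop)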

log2e <- log2(TrueExpr) + matrix(rnorm(ngenes*4),ngenes,4)*0.1
out <- fitmixture(log2e,mixprop)

# Plot true vs estimated log-ratios
plot(log2(TrueY1/TrueY2), out$M)