Function for normalizing data, fitting a normal-uniform mixture and estimating probabilities of differential expression in the case where the two samples are being compared directly

Description

After a mean and variance normalization, a two-component mixture model is fitted to the data. The normal component represents the genes that are not differentially expressed, and the uniform component represents the genes that are differentially expressed. Posterior probabilities of differential expression are computed from the fitted model.
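As a rough illustration of how a posterior probability follows from a fitted normal-uniform mixture, the sketch below applies Bayes' rule to the two component densities. It is not the package's code: the helper name posterior_diff and all parameter values are made up for the example, and the spread of the normal component is passed as a standard deviation.

```r
# Hypothetical helper, not part of the nudge package: posterior probability
# of differential expression under a normal-uniform mixture.
posterior_diff <- function(x, mu, sd, mixprob, a, b) {
  # Weighted density of the normal component (not differentially expressed)
  f_null <- mixprob * dnorm(x, mean = mu, sd = sd)
  # Weighted density of the uniform component (differentially expressed)
  f_diff <- (1 - mixprob) * dunif(x, min = a, max = b)
  # Bayes' rule: share of the mixture density due to the uniform component
  f_diff / (f_null + f_diff)
}

# Made-up normalized log ratios and made-up parameter values
x <- c(-4, -0.1, 0, 0.2, 3.5)
p <- posterior_diff(x, mu = 0, sd = 0.5, mixprob = 0.9, a = -5, b = 5)
round(p, 3)
```

Values far from the centre of the normal component receive a posterior probability close to 1, while values near the centre receive a probability close to 0.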

Usage

nudge1(logratio, logintensity, dye.swap = FALSE, span1 = 0.6, span2 = 0.2,
       quant = 0.99, z = NULL, tol = 0.00001, iterlim = 500)

Arguments

logratio

A matrix or vector of log (base 2) ratios of intensity expressions in 2 samples, with rows indexing genes and columns (if necessary) indexing replicates.

logintensity

A matrix or vector of log (base 2) total intensities (total intensity being defined as the product of the intensity expressions in the 2 samples), with rows indexing genes and columns (if necessary) indexing replicates.

dye.swap

A logical value indicating whether or not the data is from a balanced dye-swap. Only used for multiple replicate experiments.

span1

Proportion of data used to fit the loess regression of the (average-across-replicates) log ratios on the (average-across-replicates) log total intensities for the mean normalization.

span2

Proportion of data used to fit the loess regression of the absolute (mean normalized) log ratios on the log total intensities for the variance normalization. Only used for single replicate experiments.

quant

Quantile to be used from the distribution of standard deviations of log ratios across replicates for all genes whose standard deviation was smaller than their absolute (mean normalized) average-across-replicates log ratio. Only used for multiple replicate experiments.

z

An optional 2-column matrix with each row giving a starting estimate for the probability of the gene (in the corresponding row of the log ratio matrix/vector) not being differentially expressed and a starting estimate for the probability of the gene being differentially expressed. Each row should add up to 1.

tol

A scalar tolerance for relative convergence of the log likelihood.

iterlim

The maximum number of iterations for which the EM algorithm is run.
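To show how z, tol and iterlim could interact, here is a minimal EM sketch for a normal-uniform mixture. It is an illustration under stated assumptions, not the nudge1 source: the function name em_normal_uniform is hypothetical, the uniform support is fixed at the range of the data, the default start (equal membership probabilities) is an assumption, and the input is simulated.

```r
# Hypothetical sketch, not the nudge1 implementation.
em_normal_uniform <- function(x, z = NULL, tol = 1e-5, iterlim = 500) {
  n <- length(x)
  a <- min(x)
  b <- max(x)
  if (is.null(z)) {
    # Assumed default start: equal membership probabilities for every gene
    z <- cbind(rep(0.5, n), rep(0.5, n))
  }
  # Column 1: not differentially expressed; column 2: differentially expressed
  stopifnot(nrow(z) == n, all(abs(rowSums(z) - 1) < 1e-8))
  loglike_old <- -Inf
  for (iter in seq_len(iterlim)) {
    # M-step: update the normal component and the mixing probability
    w <- z[, 1]
    mixprob <- mean(w)
    mu <- sum(w * x) / sum(w)
    sd <- sqrt(sum(w * (x - mu)^2) / sum(w))
    # E-step: update the membership probabilities
    f_null <- mixprob * dnorm(x, mu, sd)
    f_diff <- (1 - mixprob) * dunif(x, a, b)
    total <- f_null + f_diff
    z <- cbind(f_null / total, f_diff / total)
    loglike <- sum(log(total))
    # Stop when the relative change in the log likelihood is within tol
    if (is.finite(loglike_old) &&
        abs(loglike - loglike_old) <= tol * abs(loglike_old)) break
    loglike_old <- loglike
  }
  list(pdiff = z[, 2], mu = mu, sd = sd, mixprob = mixprob,
       a = a, b = b, loglike = loglike, iter = iter)
}

# Simulated data: 900 null genes and 100 differentially expressed genes
set.seed(1)
x <- c(rnorm(900, mean = 0, sd = 0.3), runif(100, min = -4, max = 4))
fit <- em_normal_uniform(x)
```

A user-supplied z would replace the assumed default start; each of its rows must sum to 1, as required for the z argument above.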

Details

A balanced dye swap is a design in which a certain number of replicates have a particular dye-to-sample assignment and the same number of other replicates have the reversed assignment. Note that in this case log ratios should be taken with the same sample in the numerator and the other sample in the denominator for all replicates, i.e. ratios should always be sample i/sample j rather than red dye/green dye.
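The orientation rule can be illustrated with a small synthetic example. Everything below is assumed for illustration: the intensities are made up, and the vector i_is_red is a hypothetical description of the design (sample i carries the red dye in replicates 1 and 2 and the green dye in replicates 3 and 4).

```r
# Made-up intensities for 3 genes and 4 replicates
sample_j <- matrix(c(100, 250, 400), nrow = 3, ncol = 4)
sample_i <- sample_j * c(2, 1, 0.5)   # gene-wise fold changes of 2, 1 and 0.5

# Hypothetical design: TRUE where sample i was labelled with the red dye
i_is_red <- c(TRUE, TRUE, FALSE, FALSE)

# Red and green channel intensities as they would come off the arrays
red <- sample_i
red[, !i_is_red] <- sample_j[, !i_is_red]
green <- sample_j
green[, !i_is_red] <- sample_i[, !i_is_red]

# Dye-oriented log ratios (red/green) change sign in the swapped replicates
lR_dye <- log2(red) - log2(green)

# Re-orient so that every column is log2(sample i / sample j)
sign_flip <- ifelse(i_is_red, 1, -1)
lR <- sweep(lR_dye, 2, sign_flip, "*")

# Log total intensity is unaffected by the dye assignment
lI <- log2(red) + log2(green)
```

After the sign flip, each gene has the same log ratio in every replicate, which is the orientation the logratio argument expects for a dye-swap design.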

Value

A list including the following components:

pdiff

A vector with the estimated posterior probabilities of being in the group of differentially expressed genes.

lRnorm

A vector with the normalized (average-across-replicates) log ratios.

mu

The estimated mean of the group of genes that are not differentially expressed.

sigma

The estimated variance of the group of genes that are not differentially expressed.

mixprob

The prior/mixing probability of a gene being in the group of genes that are not differentially expressed.

a

The minimum value of the normalized data.

b

The maximum value of the normalized data.

loglike

The log likelihood for the fitted mixture model.

iter

The number of iterations run by the EM algorithm until either convergence or iteration limit was reached.
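One possible way to work with the returned components is to select genes by their posterior probability. The sketch below uses a stand-in list with made-up values in place of a real nudge1 result, and the 0.5 cutoff is an assumption chosen for illustration, not a recommendation from the package.

```r
# Stand-in for the list returned by nudge1(); values are made up
result <- list(
  pdiff  = c(0.02, 0.97, 0.40, 0.88),
  lRnorm = c(0.1, 3.2, -0.9, -2.7)
)

cutoff <- 0.5  # assumed threshold on the posterior probability

# Row indices of genes called differentially expressed
de_genes <- which(result$pdiff > cutoff)

# Order the selected genes from most to least probable
ranked <- de_genes[order(result$pdiff[de_genes], decreasing = TRUE)]

summary_table <- data.frame(
  gene   = ranked,
  pdiff  = result$pdiff[ranked],
  lRnorm = result$lRnorm[ranked]
)
summary_table
```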

Author(s)

N. Dean and A. E. Raftery

References

N. Dean and A. E. Raftery (2005). Normal uniform mixture differential gene expression detection for cDNA microarrays. BMC Bioinformatics. 6, 173-186.

http://www.biomedcentral.com/1471-2105/6/173

S. Dudoit, Y. H. Yang, M. Callow and T. Speed (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Stat. Sin. 12, 111-139.

See Also

nudge2, norm1a, norm1b, norm1c, norm1d, norm2c, norm2d

Examples

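library(nudge)

# Single replicate: vectors of log ratios and log total intensities
data(like)
lR <- log(like[, 1], 2) - log(like[, 2], 2)
lI <- log(like[, 1], 2) + log(like[, 2], 2)

result <- nudge1(lR, lI)

# Four replicates from a balanced dye swap: matrices with one column per replicate
data(hiv)
lR <- log(hiv[, 1:4], 2) - log(hiv[, 5:8], 2)
lI <- log(hiv[, 1:4], 2) + log(hiv[, 5:8], 2)

result <- nudge1(lR, lI, dye.swap = TRUE)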