powfindgenes: Power computations for differential expression
In gaga: GaGa hierarchical model for high-throughput data analysis

Description

powfindgenes evaluates the posterior expected number of true positives (e.g. true gene discoveries) that would be obtained if one were to collect an additional batch of data. It uses either a GaGa or a normal-normal model fitted to a pilot data set.

Usage

powfindgenes(fit, x, groups, batchSize = 1, fdrmax = 0.05, genelimit, v0thre = 1, B = 1000, mc.cores = 1)

Arguments

`fit`	GaGa/MiGaGa or normal-normal model fitted to the pilot data `x`. It must be an object of class either `gagafit` (see `fitGG`) or `nnfit` (see `fitNN`).
`x`	`ExpressionSet`, `exprSet`, data frame or matrix containing the gene expression measurements used to fit the model.
`groups`	If `x` is of type `ExpressionSet` or `exprSet`, `groups` should be the name of the column in `pData(x)` containing the groups that one wishes to compare. If `x` is a matrix or a data frame, `groups` should be a vector indicating the group to which each column of `x` belongs.
`batchSize`	Number of additional samples to obtain per group.
`fdrmax`	Upper bound on the Bayesian false discovery rate (FDR) used when declaring genes differentially expressed.
`genelimit`	Only the `genelimit` genes with the lowest probability of being equally expressed across all groups will be simulated. Setting this limit can considerably speed up computation.
`v0thre`	Only genes with a posterior probability of being equally expressed below `v0thre` will be simulated. Setting this limit can considerably speed up computation.
`B`	Number of simulations from the posterior predictive distribution of the fitted model used to estimate the posterior expected number of true positives.
`mc.cores`	If the `multicore` package is available, `mc.cores` is the number of cores to use for parallel computing. Currently it is only used when `fit` is of class `nnfit`.
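The combined effect of `v0thre` and `genelimit` on which genes get simulated can be pictured with a few lines of base R. This is a minimal sketch, assuming a plain numeric vector `pp0` of posterior probabilities of equal expression; the helper `select_genes_to_simulate` is hypothetical (not a gaga function), and the order in which the two filters are applied is an assumption rather than something taken from the gaga source.

```r
# Hypothetical helper (not part of gaga).
# pp0: posterior probability that each gene is equally expressed across groups.
select_genes_to_simulate <- function(pp0, v0thre = 1, genelimit = length(pp0)) {
  # Filter 1: keep genes whose probability of equal expression is below v0thre
  keep <- which(pp0 < v0thre)
  # Filter 2: rank the survivors by pp0 (most likely DE first) and keep at most genelimit
  keep <- keep[order(pp0[keep])]
  head(keep, genelimit)
}

pp0 <- c(0.90, 0.01, 0.40, 0.99, 0.05)
select_genes_to_simulate(pp0, v0thre = 0.5, genelimit = 2)  # 2 5
```

Both filters shrink the set of genes for which data must be simulated, which is why either one can reduce computing time.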

Details

The routine simulates data from the posterior predictive distribution of a GaGa or normal-normal model. First, it draws parameter values (differential expression status, mean expression levels, etc.) from the posterior distribution. Next, it simulates data using these parameter values. Finally, the simulated data are used to determine the differential expression status of each gene, controlling the Bayesian FDR at level `fdrmax` as implemented in `findgenes`. Because the true differential expression status of every gene is known in the simulation, one can count the number of true discoveries in the reported gene list.
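The last step (selecting genes at a given Bayesian FDR and counting true discoveries) can be sketched in base R. This assumes the common Bayesian FDR rule of reporting the largest set of top-ranked genes whose average posterior probability of being equally expressed is at most `fdrmax`; `findgenes` may differ in its details. The helpers `bayes_fdr_select` and `count_true_positives` and the input vectors are hypothetical illustrations, not gaga functions or data.

```r
# Hypothetical helpers (not part of gaga).
# pp:    posterior probability that each gene is differentially expressed (DE)
# delta: true 0/1 DE status drawn from the posterior in the simulation
bayes_fdr_select <- function(pp, fdrmax = 0.05) {
  o <- order(pp, decreasing = TRUE)             # most likely DE genes first
  # Estimated Bayesian FDR of the top-k list: average of (1 - pp) over those k genes.
  # Since pp is sorted decreasingly, this running mean never decreases with k.
  fdr <- cumsum(1 - pp[o]) / seq_along(o)
  k <- max(c(0, which(fdr <= fdrmax)))          # largest list meeting the bound
  o[seq_len(k)]
}

count_true_positives <- function(pp, delta, fdrmax = 0.05) {
  sel <- bayes_fdr_select(pp, fdrmax)
  sum(delta[sel] == 1)                          # reported genes that are truly DE
}

pp    <- c(0.99, 0.20, 0.97, 0.60, 0.995)
delta <- c(1,    0,    1,    1,    0)
bayes_fdr_select(pp, fdrmax = 0.05)   # 5 1 3
count_true_positives(pp, delta)       # 2
```

Repeating this count over many posterior predictive draws and averaging is what yields the expected number of true positives.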

To improve speed, hyperparameters are not re-estimated when computing posterior probabilities for the posterior predictive simulated data.

Value

`m`	Posterior expected number of true positives, estimated by the sample mean over the `B` simulations.
`s`	Monte Carlo standard error of `m`, i.e. the standard deviation of the `B` simulated values divided by sqrt(B).
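The relationship between `m`, `s` and `B` can be illustrated in base R. The vector `tp` below holds made-up true-positive counts standing in for `B = 5` simulations, and `mc_summary` is a hypothetical helper, not a gaga function.

```r
# Hypothetical helper (not part of gaga).
# tp: number of true positives obtained in each of the B simulations.
mc_summary <- function(tp) {
  B <- length(tp)
  list(
    m = mean(tp),            # Monte Carlo estimate of the expected number of TPs
    s = sd(tp) / sqrt(B)     # its standard error
  )
}

tp <- c(10, 12, 11, 13, 14)  # made-up values for illustration only
mc_summary(tp)               # m is 12, s is sqrt(0.5), about 0.707
```

Since `s` shrinks like 1/sqrt(B), quadrupling `B` roughly halves the Monte Carlo error of `m`.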

Author(s)

David Rossell

References

Rossell D. GaGa: a simple and flexible hierarchical model for microarray data analysis. http://rosselldavid.googlepages.com.

See Also

`findgenes`, `fitGG`, `fitNN`, `parest`. See `powclasspred` for power calculations for sample classification.

Examples

library(gaga)

# Simulate data and fit a GaGa model
set.seed(1)
x <- simGG(n=20,m=2,p.de=.5,a0=3,nu=.5,balpha=.5,nualpha=25)
gg1 <- fitGG(x,groups=1:2,method='EM')
gg1 <- parest(gg1,x=x,groups=1:2)

# Expected number of true positives for 1 more sample per group
powfindgenes(gg1,x=x,groups=1:2,batchSize=1,fdrmax=.05)$m

# Expected number of true positives for 10 more samples per group
powfindgenes(gg1,x=x,groups=1:2,batchSize=10,fdrmax=.05)$m