Forward simulation for differential expression.
Description
Forward simulation allows to evaluate the expected utility for sequential designs. Here the utility is the expected number of true discoveries minus a sampling cost. The routine simulates future data either from the prior predictive or using a set of pilot data and a GaGa or normalnormal model fit. At each future time point, it computes a summary statistic that will be used to determine when to stop the experiment.
Usage
1 2  forwsimDiffExpr(fit, x, groups, ngenes, maxBatch, batchSize, fdrmax = 0.05, genelimit, v0thre = 1, B = 100,
Bsummary = 100, trace = TRUE, randomSeed, mc.cores=1)

Arguments
fit 
Either GaGa or MiGaGa fit (object of type 
x 

groups 
If 
ngenes 
Number of genes to simulate data for. If 
maxBatch 
Maximum number of batches, i.e. the routine simulates

batchSize 
Batch size, i.e. number of observations per group to
simulate at each time point. Defaults to 
fdrmax 
Upper bound on FDR. 
genelimit 
Only the 
v0thre 
Only genes with posterior probability of being equally
expressed < 
B 
Number of forward simulations. 
Bsummary 
Number of simulations for estimating the summary statistic. 
trace 
For 
randomSeed 
Integer value used to set random number generator
seed. Defaults to 
mc.cores 
If 
Details
To improve computational speed hyperparameters are not reestimated as new data is simulated.
Value
A data.frame
with the following columns:
simid 
Simulation number. 
j 
Time (sample size). 
u 
Expected number of true positives if we were to stop experimentation at this time. 
fdr 
Expected FDR if we were to stop experimentation at this time. 
fnr 
Expected FNR if we were to stop experimentation at this time. 
power 
Expected power (as estimated by E(TP)/E(positives)) if we were to stop experimentation at this time. 
summary 
Summary statistic: increase in expected true positives if we were to obtain one more data batch. 
Author(s)
David Rossell.
References
Rossell D., Mueller P. Sequential sample sizes for highthroughput hypothesis testing experiments. http://sites.google.com/site/rosselldavid/home.
Rossell D. GaGa: a simple and flexible hierarchical model for microarray data analysis. Annals of Applied Statistics, 2009, 3, 10351051.
See Also
plotForwSim
to plot the simulated trajectories,
fitGG
for fitting a GaGa model,
fitNN
for fitting a normalnormal model,
seqBoundariesGrid
for finding the optimal design based
on the forwards simulation output.
powfindgenes
for fixed sample size calculations.
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26  #Simulate data and fit GaGa model
set.seed(1)
x < simGG(n=20,m=2,p.de=.5,a0=3,nu=.5,balpha=.5,nualpha=25)
gg1 < fitGG(x,groups=1:2,method='EM')
gg1 < parest(gg1,x=x,groups=1:2)
#Run forward simulation
fs1 < forwsimDiffExpr(gg1, x=x, groups=1:2,
maxBatch=2,batchSize=1,fdrmax=0.05, B=100, Bsummary=100, randomSeed=1)
#Expected number of true positives for each sample size
tapply(fs1$u,fs1$time,'mean')
#Expected utility for each sample size
samplingCost < 0.01
tapply(fs1$u,fs1$time,'mean')  samplingCost*(0:2)
#Optimal sequential design
b0seq < seq(0,20,length=200); b1seq < seq(0,40,length=200)
bopt <seqBoundariesGrid(b0=b0seq,b1=b1seq,forwsim=fs1,samplingCost=samplingCost,powmin=0)
bopt < bopt$opt
plot(fs1$time,fs1$summary,xlab='Additional batches',ylab='E(newly discovered DE genes)')
abline(bopt['b0'],bopt['b1'])
text(.2,bopt['b0'],'Continue',pos=3)
text(.2,bopt['b0'],'Stop',pos=1)
