sim.dat: Simulating a Microarray Data Set
In pi0: Estimating the Proportion of True Null Hypotheses for FDR


This function simulates a two-group comparison microarray data set according to a hierarchical model in which the standardized effect sizes of all genes are independently and identically distributed. Their common distribution is a two-component mixture: an effect size is zero with probability pi0 and is drawn from another distribution with probability 1-pi0. The observed values are simulated independently, conditional on the standardized effect sizes.
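
In symbols, the model can be sketched as follows. The notation ($\delta_g$ for the standardized effect size of gene $g$, $F$ for the nonzero-effect distribution, $x_{gj}$ for the observed value of gene $g$ in sample $j$, $\varepsilon_{gj}$ for the error) is introduced here for illustration only. The additive form of the second line is an assumption, chosen to agree with the function adding effect sizes to the group 1 columns of an error matrix.

```latex
\delta_g \;\overset{\text{iid}}{\sim}\;
\begin{cases}
  0 & \text{with probability } \pi_0, \\
  F & \text{with probability } 1 - \pi_0,
\end{cases}
\qquad g = 1, \dots, G,

x_{gj} \;=\; \delta_g \,\mathbf{1}\{ j \le n_1 \} + \varepsilon_{gj},
\qquad j = 1, \dots, n_1 + n_2 .
```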

sim.dat(G = 10000, pi0 = 0.75, gamma2 = 1, n1 = 5, n2 = n1, 
        errdist = rnorm, effdist = function(g, gamma2) 
        rnorm(g, , sqrt(gamma2)), ErrArgs, EffArgs)

`G`	a positive integer, the number of genes.
`pi0`	a numeric value between 0 and 1, the proportion of non-differentially expressed genes.
`gamma2`	a positive value, which is always the second argument passed to `effdist`. If the nonzero standardized effect sizes follow a zero-mean normal distribution, this is the variance of that distribution. The larger it is, the larger the mean absolute effect size.
`n1`	a positive integer, the sample size in treatment group 1.
`n2`	a positive integer, the sample size in treatment group 2.
`errdist`	a function that simulates `K` random errors, where `K` is the first argument of `errdist`. If `ErrArgs` is not missing, it is always passed as the second argument.
`effdist`	a function that simulates `G1` standardized effect sizes, where `G1` is the first argument of `effdist`. The second argument is always `gamma2`. If `EffArgs` is not missing, it is always passed as the third argument.
`ErrArgs`	a list of additional arguments used by `errdist`.
`EffArgs`	a list of additional arguments used by `effdist`.
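
As a sketch of how user-supplied generators could be written, assuming (as the argument descriptions state) that `ErrArgs` and `EffArgs` are each handed to the function as a single list in the stated position. The function names `t_errors` and `t_effects` and the list element `df` are hypothetical, and the commented `sim.dat` call at the end is an assumed usage that has not been run here.

```r
# Hypothetical error generator: K draws from a t distribution,
# with the degrees of freedom read from the ErrArgs list.
t_errors <- function(K, ErrArgs) rt(K, df = ErrArgs$df)

# Hypothetical effect-size generator: G1 draws from a t distribution,
# rescaled by sqrt(gamma2); degrees of freedom come from the EffArgs list.
t_effects <- function(G1, gamma2, EffArgs) sqrt(gamma2) * rt(G1, df = EffArgs$df)

# Direct calls, mirroring the documented argument positions.
set.seed(1)
e <- t_errors(6, list(df = 5))
d <- t_effects(4, 2, list(df = 10))
length(e)  # 6
length(d)  # 4

# Assumed usage with the package loaded (not run here):
# dat <- sim.dat(G = 100, errdist = t_errors, effdist = t_effects,
#                ErrArgs = list(df = 5), EffArgs = list(df = 10))
```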

The function simulates G*N errors according to errdist, where N=n1+n2, and arranges them into a G-by-N matrix. It then simulates G1 standardized effect sizes according to effdist, controlled by the parameter gamma2, where `G1=round(G*pi0)`. Finally, the simulated effect sizes are added to each column of the upper-left G1-by-n1 submatrix.
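
The three steps can be illustrated with a minimal base-R sketch. This is not the package source: the name `sim_dat_sketch` is hypothetical, the `ErrArgs`/`EffArgs` handling is omitted, and the number of genes receiving effects, `G1`, is passed in directly rather than derived from `pi0`.

```r
# Illustrative re-implementation of the described steps (not the package code).
sim_dat_sketch <- function(G, G1, gamma2 = 1, n1 = 5, n2 = n1,
                           errdist = rnorm,
                           effdist = function(g, gamma2) rnorm(g, sd = sqrt(gamma2))) {
  N <- n1 + n2
  # Step 1: simulate G*N errors and arrange them as a G-by-N matrix.
  dat <- matrix(errdist(G * N), nrow = G, ncol = N)
  # Step 2: simulate G1 standardized effect sizes.
  eff <- effdist(G1, gamma2)
  # Step 3: add the effects to every column of the upper-left G1-by-n1 block.
  # A length-G1 vector is recycled down each column of that block.
  if (G1 > 0) {
    rows <- seq_len(G1)
    cols <- seq_len(n1)
    dat[rows, cols] <- dat[rows, cols] + eff
  }
  dat
}

set.seed(1)
x <- sim_dat_sketch(G = 20, G1 = 5, n1 = 3, n2 = 3)
dim(x)  # 20 6
```

Passing deterministic generators, for example `errdist = function(k) rep(0, k)`, makes it easy to see which cells receive an effect.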

a G-by-(n1+n2) matrix.

Long Qu

Qu, L., Nettleton, D., Dekkers, J.C.M. Subsampling Based Bias Reduction in Estimating the Proportion of Differentially Expressed Genes from Microarray Data. Unpublished manuscript.

set.seed(54457704)
## an unusually small data set of 20 genes and 3 samples in each of the two treatment groups.
dat <- sim.dat(G = 20, n1 = 3, n2 = 3)

set.seed(9992722)
## this is how the 'simulatedDat' data set in this package was generated
simulatedDat <- sim.dat(G = 5000)