sim.dat: Simulating a Microarray Data Set

Description

This function simulates a two-group comparison microarray data set according to a hierarchical model in which the standardized effect sizes of all genes are independently and identically distributed. Their common distribution is a two-component mixture: an effect size is zero with probability pi0 and is drawn from another distribution with probability 1-pi0. Conditional on the standardized effect sizes, the observed values are simulated independently.
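
As a compact restatement of the mixture described above, the effect-size layer of the model can be written as follows. The symbols are introduced here only for illustration and are not taken from the package: \delta_g denotes the standardized effect size of gene g, D_0 the point mass at zero, and F the unspecified distribution of the nonzero effects.

```latex
\delta_g \;\overset{\text{iid}}{\sim}\; \pi_0 \, D_0 + (1 - \pi_0) \, F,
\qquad g = 1, \dots, G
```

With the default effdist shown under Usage, F would be a zero-mean normal distribution with variance gamma2.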

Usage

sim.dat(G = 10000, pi0 = 0.75, gamma2 = 1, n1 = 5, n2 = n1, 
        errdist = rnorm, effdist = function(g, gamma2) 
        rnorm(g, , sqrt(gamma2)), ErrArgs, EffArgs)

Arguments

G

a positive integer, the number of genes.

pi0

a numeric value between 0 and 1, the proportion of non-differentially expressed genes.

gamma2

a positive value, which is always passed as the second argument to effdist. If the nonzero standardized effect sizes follow a zero-mean normal distribution, gamma2 is the variance of that distribution. The larger it is, the larger the mean absolute effect.

n1

a positive integer, the sample size in treatment group 1.

n2

a positive integer, the sample size in treatment group 2.

errdist

a function that simulates K random errors, where K is the first argument of errdist. If ErrArgs is not missing, it is always passed as the second argument.

effdist

a function that simulates G1 standardized effect sizes, where G1 is the first argument of effdist. The second argument is always gamma2. If EffArgs is not missing, it is always passed as the third argument.

ErrArgs

a list of additional arguments used by errdist.

EffArgs

a list of additional arguments used by effdist.
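
To illustrate the calling convention described for errdist and effdist, here is a minimal sketch of two user-supplied generators. The names t_err and t_eff and the df list entry are hypothetical, and the sketch assumes the ErrArgs and EffArgs lists are handed over as a single second or third argument, which is how the argument descriptions read; if the package instead splices the list entries into the call, the signatures would need to change.

```r
# Hypothetical error generator: K heavy-tailed errors from a t distribution.
# The degrees of freedom are read from the list given as the second argument.
t_err <- function(K, args) rt(K, df = args$df)

# Hypothetical effect-size generator: G1 scaled t effects.
# gamma2 is used as a squared scale here, mirroring its role as a variance
# in the default normal effdist.
t_eff <- function(G1, gamma2, args) sqrt(gamma2) * rt(G1, df = args$df)

# Call the generators directly, the way the descriptions say they are called.
set.seed(123)
e <- t_err(12, list(df = 5))      # 12 random errors
d <- t_eff(4, 2, list(df = 5))    # 4 standardized effect sizes

# Under the stated assumption they might be supplied as (untested):
# sim.dat(G = 20, n1 = 3, errdist = t_err, ErrArgs = list(df = 5),
#         effdist = t_eff, EffArgs = list(df = 5))
```

Calling the generators directly first is a cheap way to confirm they return vectors of the requested length before passing them on.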

Details

The function simulates G*N errors according to errdist, where N=n1+n2, and organizes them into a G-by-N matrix. G1 standardized effect sizes are then simulated according to effdist, controlled by the parameter gamma2, where G1=round(G*pi0). Finally, the simulated effect sizes are added to each column of the upper-left G1-by-n1 submatrix.
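
The three steps above can be sketched in a few lines of base R. This is an illustrative re-implementation under stated assumptions, not the package source: sketch_sim is a hypothetical name, errors and effects are fixed to the default normal distributions, and the number of genes with nonzero effects, G1, is taken as an explicit input rather than derived from pi0.

```r
# Hypothetical sketch of the simulation steps; not the code of sim.dat itself.
sketch_sim <- function(G, G1, n1, n2 = n1, gamma2 = 1) {
  N <- n1 + n2
  # Step 1: simulate G*N errors and arrange them as a G-by-N matrix.
  dat <- matrix(rnorm(G * N), nrow = G, ncol = N)
  # Step 2: simulate G1 standardized effect sizes (zero-mean normal,
  # variance gamma2, as in the default effdist).
  eff <- rnorm(G1, mean = 0, sd = sqrt(gamma2))
  # Step 3: add the effects to every column of the upper-left G1-by-n1 block.
  # Column-major recycling adds eff[i] to row i in each of the n1 columns.
  if (G1 > 0) {
    rows <- seq_len(G1)
    cols <- seq_len(n1)
    dat[rows, cols] <- dat[rows, cols] + eff
  }
  dat
}

set.seed(1)
x <- sketch_sim(G = 20, G1 = 5, n1 = 3)
dim(x)  # a 20-by-6 matrix
```

Only the first G1 rows and the first n1 columns are shifted, so the remaining rows contain pure error in both groups.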

Value

a G-by-(n1+n2) matrix.

Author(s)

Long Qu

References

Qu, L., Nettleton, D., Dekkers, J.C.M. Subsampling Based Bias Reduction in Estimating the Proportion of Differentially Expressed Genes from Microarray Data. Unpublished manuscript.

Examples

set.seed(54457704)
## an unusually small data set of 20 genes and 3 samples in each of the two treatment groups
dat = sim.dat(G = 20, n1 = 3, n2 = 3)

set.seed(9992722)
## this is how the 'simulatedDat' data set in this package was generated
simulatedDat=sim.dat(G=5000)

pi0 documentation built on July 9, 2017, 9:01 a.m.