Description Usage Arguments Details Value References See Also Examples
These functions simulate two-sample microarray data from various different models.
1 2 3 4 5 6 7 8 9 |
ng |
number of genes |
n, n1, n2 |
number of samples per group; by default balanced, except for |
p0 |
proportion of differentially expressed genes |
D |
effect size for differentially expressed genes, in units of the gene-specific standard deviation ( |
sigma |
standard deviation, constant for all genes |
d0, s2_0, v0 |
prior parameters for effect size and variability across genes in Smyth's model, see Details. |
xdat, grp |
expression data and grouping variable for an existing microarray data set, as specified in |
replace |
logical switch indicating whether to sub-sample ( |
MAsim
simulates normal data with constant standard deviation sigma
across genes and fixed effect size D
; the sign of the effect is equally and randomly split between up- and down-regulation, and effects are added to the second group. MAsim.var
does the same, but instead of relying on a fixed variance across genes, it simulates gene-specific variances from a standard exponential distribution.
MAsim.smyth
simulates from the model suggested in Smyth (2004), using a normal error distribution. The variances are assumed to follow an inverse chisquared distribution with d0
degrees of freedom and are scaled by s2_0
; consequently, large values of d0
lead to similar gene-wise variances across genes, whereas small values lead to very different variances between genes. The effect sizes for differentially expressed genes are assumed to follow a normal distribution with mean zero and variance v0
times the previously simulated gene-specific variance; consequently, large values of v0
lead to large effects in the model.
MAsim.real
finally uses existing real or simulated existing data sets to generate simulated data with fixed effect sizes: for each group, the specified number of samples is sampled either with or without replacement from the columns of xdat
; for each gene, the group means are subtracted from the resampled data, so that the groupwise and overall mean for each gene is zero. Then, noise from an appropriate t-distribution is added to each group to break the sum-to-zero constraint in a consistent manner. The specified effect (evenly split between up- and down-regulation) for the differentially expressed genes is again added to the second group.
The functions all return a matrix with ng
rows and n1+n2
columns, except for MAsim.real
, where the default is to return a matrix of the same dimensions as xdat
. The group membership of each column is given by its column name. The matrix has additionally the attribute DE
, which is a logical vector specifying for each gene whether or not it was assumed to be differentially expressed in the simulation.
Smyth G (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3, No. 1, Article 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | # Small examples only
sim1 = MAsim(ng=1000, n=10, p0=0.8)
sim2 = MAsim.var(ng=1000, n1=15, n2=5, p0=0.8)
sim3 = MAsim.smyth(ng=1000, n=10, p0=0.8)
# Assess FDR
eoc1 = EOC(sim1, colnames(sim1), plot=FALSE)
eoc2 = EOC(sim2, colnames(sim2), plot=FALSE)
eoc3 = EOC(sim3, colnames(sim3), plot=FALSE)
# Show
par(mfrow=c(2,2))
plot(eoc1)
plot(eoc2)
plot(eoc3)
OCshow(eoc1, eoc2, eoc3)
# The truth will make you fret
table(eoc1$FDR<0.1, attr(sim1, "DE"))
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.