Simulate gene expression data.
Description
simulateGEdata returns simulated noisy gene expression values of specified size
and their underlying gene-gene correlation.
Usage
simulateGEdata(n, m, k, size.alpha, corr.strength, g = NULL,
  Sigma.eps = 0.1, nc, ne, intercept = TRUE, check.input = FALSE)

Arguments
n 
An integer setting the number of genes. 
m 
An integer setting the number of arrays. 
k 
An integer setting the number of dimensions of the noise term; controls the dimensions of W and α. 
size.alpha 
A numeric scalar giving the maximal and minimal absolute value of α. 
g 
An integer value between [1, min( 
corr.strength 
An integer controlling the dimension of X and β. 
Sigma.eps 
A numeric scalar setting the amount of random variation in
ε; 
nc 
An integer setting the number of negative controls. 
ne 
An integer setting the number of strongly expressed genes. 
intercept 
A logical value indicating whether the systematic noise has an intercept. 
check.input 
A logical scalar; if 
Details
This function generates log2-transformed expression values of n genes in m arrays. The expression values consist of true expression and noise:
Y=Xβ+Wα+ε
The dimensions of the matrices X and β are used to control the size of the correlation between the genes. It is possible to simulate three different classes of genes:
correlated genes expressed with true log2-transformed values from 0 to 16
correlated genes expressed with true log2-transformed values with mean 0
uncorrelated genes with true log2-transformed expression equal to 0 (negative controls)
The negative controls are always the last nc genes in the data, whereas the strongly expressed genes are always the first ne genes in the data.
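The decomposition above can be illustrated with a small base-R sketch. It is not the code used by simulateGEdata; it only builds matrices with the shapes implied by Y=Xβ+Wα+ε. The layout (arrays in rows, genes in columns), the normal and uniform draws, the reading of Sigma.eps as a variance, and the tiny sizes are all assumptions made for illustration.

```r
set.seed(1)

# Illustrative sizes (hypothetical, far smaller than in the Examples section)
n <- 20              # genes
m <- 15              # arrays
k <- 3               # dimensions of the systematic noise (W and alpha)
corr.strength <- 4   # dimensions of the signal (X and beta)
nc <- 5              # negative controls, placed last
size.alpha <- 2
Sigma.eps <- 0.1

# Assumed layout: arrays in rows, genes in columns
X    <- matrix(rnorm(m * corr.strength), nrow = m)
beta <- matrix(rnorm(corr.strength * n), nrow = corr.strength)
beta[, (n - nc + 1):n] <- 0   # negative controls: true expression is 0

W <- matrix(rnorm(m * k), nrow = m)
W[, 1] <- 1                   # intercept modelled as one dimension of W
# Assumption: alpha drawn uniformly within +/- size.alpha
alpha <- matrix(runif(k * n, -size.alpha, size.alpha), nrow = k)

# Assumption: Sigma.eps treated as the variance of the random error
eps <- matrix(rnorm(m * n, sd = sqrt(Sigma.eps)), nrow = m)

Truth <- X %*% beta           # true expression
Noise <- W %*% alpha          # systematic noise
Y     <- Truth + Noise + eps  # observed expression
```

In this sketch the last nc columns of Truth are exactly zero, so any variation in those columns of Y comes from Wα and ε alone, which is what makes such genes usable as negative controls.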
The parameter intercept controls whether the systematic noise has an
offset or not. Note that the intercept is one dimension of W.
It is possible to either simulate data where W and X are independent by
setting g to NULL, or increasing correlation bWX between
W and X by increasing g.
Value
simulateGEdata returns output of the class simulateGEdata.
An object of class simulateGEdata is a list with the
following components:
Truth 
A matrix containing the values of Xβ. 
Y 
A matrix containing the values in Y. 
Noise 
A matrix containing the values in Wα. 
Sigma 
A matrix containing the true gene-gene correlations, as defined by Xβ. 
Info 
A matrix containing some of the general information about the simulation. 
Author(s)
Saskia Freytag, Johann Gagnon-Bartsch
References
Jacob L., Gagnon-Bartsch J., Speed T. Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed. Berkeley Technical Reports (2012).
Examples
# W and X independent (g = NULL)
Y <- simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1,
  250, 100, intercept=TRUE, check.input=TRUE)
Y
# W and X correlated (g = 3)
Y <- simulateGEdata(500, 500, 10, 2, 5, g=3, Sigma.eps=0.1,
  250, 100, intercept=TRUE, check.input=TRUE)
Y
