simulateGEdata: Simulate gene expression data.
In RUVcorr: Removal of unwanted variation for gene-gene correlations and related analysis


simulateGEdata returns simulated noisy gene expression values of a specified size, together with their underlying gene-gene correlations.

simulateGEdata(
  n,
  m,
  k,
  size.alpha,
  corr.strength,
  g = NULL,
  Sigma.eps = 0.1,
  nc,
  ne,
  intercept = TRUE,
  check.input = FALSE
)

`n`	An integer setting the number of genes.
`m`	An integer setting the number of arrays.
`k`	An integer setting the number of dimensions of the noise term; controls the dimensions of W and α.
`size.alpha`	A numeric scalar giving the maximal and minimal absolute value of α.
`corr.strength`	An integer controlling the dimension of X and β.
`g`	An integer in the interval [1, min(`k`, `corr.strength`)) controlling the correlation between X and W, or `NULL` for independence.
`Sigma.eps`	A numeric scalar setting the amount of random variation in ε; `Sigma.eps` >0.
`nc`	An integer setting the number of negative controls.
`ne`	An integer setting the number of strongly expressed genes.
`intercept`	A logical value indicating whether the systematic noise has an intercept.
`check.input`	A logical scalar; if `TRUE` all input is checked (not advisable for large simulations).

This function generates log2-transformed expression values of n genes in m arrays. The expression values consist of true expression and noise:

Y=Xβ+Wα+ε
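
The decomposition above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the orientation (arrays in rows, genes in columns), the distributions used for each matrix, and the treatment of `Sigma.eps` as a variance are all assumptions made for illustration, and the variable `p` is a hypothetical stand-in for the role played by `corr.strength`.

```r
set.seed(1)
m <- 6   # arrays (rows; orientation is an assumption)
n <- 8   # genes (columns)
p <- 2   # columns of X, standing in for corr.strength (hypothetical name)
k <- 3   # columns of W

X     <- matrix(rnorm(m * p), m, p)            # factor of interest
beta  <- matrix(rnorm(p * n), p, n)            # gene loadings on X
W     <- matrix(rnorm(m * k), m, k)            # unwanted factors
alpha <- matrix(runif(k * n, -2, 2), k, n)     # |alpha| bounded by 2, like size.alpha = 2
eps   <- matrix(rnorm(m * n, sd = sqrt(0.1)),  # assumes Sigma.eps = 0.1 is a variance
                m, n)

truth <- X %*% beta         # the part reported as Truth
noise <- W %*% alpha        # the part reported as Noise
Y     <- truth + noise + eps
dim(Y)
```

The point of the sketch is only that Y is the elementwise sum of three matrices of equal dimension, and that the inner dimensions of X, β and W, α are free parameters.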

The dimensions of the matrices X and β are used to control the size of the correlation between the genes. It is possible to simulate three different classes of genes:

correlated genes expressed with true log2-transformed values from 0 to 16
correlated genes expressed with true log2-transformed values with mean 0
uncorrelated genes with true log2-transformed expression equal to 0 (negative controls)

The negative controls are always the last nc genes in the data, whereas the strongly expressed genes are always the first ne genes in the data. The parameter intercept controls whether the systematic noise has an offset or not. Note that the intercept is one dimension of W. It is possible either to simulate data where W and X are independent by setting g to NULL, or to increase the correlation bWX between W and X by increasing g.
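
The gene ordering described above can be turned into index vectors, which is handy when subsetting the simulated matrices. The following is a small base-R sketch using the sizes from the Examples section; the variable names are hypothetical, and treating the genes between the two blocks as the mean-0 correlated class is an inference from the three classes listed above rather than something the package documents explicitly.

```r
n  <- 500   # total number of genes
ne <- 100   # strongly expressed genes
nc <- 250   # negative controls

strong_idx  <- seq_len(ne)            # first ne genes
control_idx <- seq(n - nc + 1, n)     # last nc genes
other_idx   <- setdiff(seq_len(n),    # genes in between (assumed mean-0 class)
                       c(strong_idx, control_idx))

gene_class <- character(n)
gene_class[strong_idx]  <- "strongly expressed"
gene_class[control_idx] <- "negative control"
gene_class[other_idx]   <- "correlated, mean 0"
table(gene_class)
```

Assuming genes are stored in columns, something like `sim$Y[, control_idx]` would then select the negative controls from a simulated object `sim`.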

simulateGEdata returns output of the class simulateGEdata. An object of class simulateGEdata is a list with the following components:

Truth A matrix containing the values of Xβ.
Y A matrix containing the values in Y.
Noise A matrix containing the values in Wα.
Sigma A matrix containing the true gene-gene correlations, as defined by Xβ.
Info A matrix containing some of the general information about the simulation.
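
Because Truth holds Xβ and Noise holds Wα, subtracting both from Y should leave the random term ε. The sketch below shows this with a hypothetical helper, `residual_part`, applied to a stand-in list that only mimics the component names; it is not real package output, and the helper is not part of RUVcorr.

```r
# Hypothetical helper: what remains of Y after removing Truth and Noise.
residual_part <- function(sim) {
  stopifnot(all(c("Truth", "Y", "Noise") %in% names(sim)))
  sim$Y - sim$Truth - sim$Noise
}

# Stand-in object with the same component names (not real package output).
set.seed(42)
truth <- matrix(rnorm(20, mean = 8), 4, 5)
noise <- matrix(rnorm(20), 4, 5)
eps   <- matrix(rnorm(20, sd = 0.3), 4, 5)
mock  <- list(Truth = truth, Y = truth + noise + eps, Noise = noise)

res <- residual_part(mock)
all.equal(res, eps)   # residual matches eps up to floating-point tolerance
```

Applied to an object returned by simulateGEdata, the spread of the residual gives a rough check on the amount of random variation that was requested.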

Saskia Freytag, Johann Gagnon-Bartsch

Jacob L., Gagnon-Bartsch J., Speed T. Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed. Berkeley Technical Reports (2012).

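library(RUVcorr)

# W and X independent (g = NULL)
Y <- simulateGEdata(500, 500, 10, 2, 5, g = NULL, Sigma.eps = 0.1,
                    nc = 250, ne = 100, intercept = TRUE, check.input = TRUE)
Y
# W and X correlated (g = 3)
Y <- simulateGEdata(500, 500, 10, 2, 5, g = 3, Sigma.eps = 0.1,
                    nc = 250, ne = 100, intercept = TRUE, check.input = TRUE)
Y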

Simulated Data:
Number of samples: [1] 500

Number of genes: [1] 500

Info:      [,1]               [,2]     
[1,] "k"                "10"     
[2,] "Mean correlation" "0.37241"
[3,] "Size alpha"       "2"      
[4,] "Intercept"        "1"      


 Truth
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 7.871197 4.889551 5.677200 8.544979 12.02702
[2,] 8.212780 5.367559 5.669174 9.108905 11.57716
[3,] 8.545702 4.081319 6.501640 9.271822 11.65449
[4,] 7.704116 3.734744 5.953544 8.907272 12.25981
[5,] 7.068190 3.529673 6.223159 8.028994 13.22916


 Y
          [,1]     [,2]     [,3]      [,4]      [,5]
[1,]  8.522564 7.425374 9.302611  6.528299 10.475548
[2,] 10.632893 5.722616 8.347414  7.651381 13.067656
[3,]  4.406915 3.453922 5.230739 12.213992  8.632391
[4,]  6.230722 1.322824 3.316999  8.467973 11.096278
[5,]  4.863746 3.936759 7.424650  9.727948 11.577667


 Noise
           [,1]       [,2]      [,3]       [,4]       [,5]
[1,]  0.5250249  2.5660457  3.525673 -1.9150044 -1.6165503
[2,]  2.3459019  0.2418962  2.673213 -1.5466805  1.5653859
[3,] -4.0252557 -0.5981621 -1.388288  2.8660311 -3.0897664
[4,] -1.3681399 -2.4450126 -2.722507 -0.3141522 -0.9582918
[5,] -2.2009785  0.4936488  1.128588  1.7434271 -1.6770490


 Sigma
           [,1]       [,2]        [,3]       [,4]        [,5]
[1,]  1.0000000  0.1211910  0.42832666  0.7373949 -0.84067291
[2,]  0.1211910  1.0000000 -0.50707709  0.2125079 -0.42150937
[3,]  0.4283267 -0.5070771  1.00000000 -0.1501479  0.06967052
[4,]  0.7373949  0.2125079 -0.15014794  1.0000000 -0.85346754
[5,] -0.8406729 -0.4215094  0.06967052 -0.8534675  1.00000000
[1] "Need to make positive semi-definite!"
Simulated Data:
Number of samples: [1] 500

Number of genes: [1] 500

Info:      [,1]               [,2]     
[1,] "k"                "10"     
[2,] "Mean correlation" "0.37363"
[3,] "bWX"              "0.21706"
[4,] "Size alpha"       "2"      
[5,] "Intercept"        "1"      


 Truth
         [,1]     [,2]      [,3]       [,4]     [,5]
[1,] 7.685289 14.32489 1.7482706 -0.1788149 7.108335
[2,] 9.691697 12.03717 0.5474211  2.4437398 5.457974
[3,] 8.523230 10.37530 1.5113572  1.7656840 5.304194
[4,] 5.681972 10.83979 3.9880589  1.5487672 7.946191
[5,] 7.254459 12.34322 1.9604205  1.4191544 7.213939


 Y
          [,1]      [,2]      [,3]       [,4]      [,5]
[1,] 12.641294 14.928314 3.3322198 -0.3742960  9.575714
[2,]  9.256612 12.843538 0.6538541  1.1828698 11.065875
[3,]  8.809616  8.549482 0.1069418  6.1059918  8.268045
[4,]  7.490182  8.596770 0.1676217  0.4330974  2.571545
[5,]  3.371889  9.783125 1.9372984  4.4762190  9.824218


 Noise
           [,1]       [,2]        [,3]       [,4]      [,5]
[1,]  5.0258996  0.6143320  1.55337867 -0.2104603  2.486353
[2,] -0.5486304  0.9534514  0.15485342 -1.2987612  5.353293
[3,]  0.4418108 -1.8178774 -1.44664290  4.3072942  2.834994
[4,]  1.8797320 -2.2346331 -3.76903458 -1.1045356 -5.444094
[5,] -3.9478810 -2.4716343  0.04311142  3.0328758  2.608623


 Sigma
           [,1]       [,2]       [,3]        [,4]        [,5]
[1,]  1.0000000 -0.4120831 -0.9218970  0.66794810 -0.53410328
[2,] -0.4120831  1.0000000  0.2529908 -0.46955407  0.52630296
[3,] -0.9218970  0.2529908  1.0000000 -0.77364355  0.20632227
[4,]  0.6679481 -0.4695541 -0.7736436  1.00000000  0.08998119
[5,] -0.5341033  0.5263030  0.2063223  0.08998119  1.00000000