simulateGEdata: Simulate gene expression data.

Description

simulateGEdata returns simulated noisy gene expression values of a specified size, together with their underlying gene-gene correlation.

Usage

simulateGEdata(n, m, k, size.alpha, corr.strength, g = NULL,
  Sigma.eps = 0.1, nc, ne, intercept = TRUE, check.input = FALSE)
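
Because most arguments are passed by position, a call can be hard to read. The sketch below spells out every argument by name, using only the names from the signature above and the values from the first documented example. The `requireNamespace` guard and the object name `sim` are additions for illustration, so that the snippet does nothing when RUVcorr is not installed.

```r
# Runs only if RUVcorr is installed; values mirror the documented example.
if (requireNamespace("RUVcorr", quietly = TRUE)) {
  sim <- RUVcorr::simulateGEdata(
    n = 500, m = 500, k = 10,
    size.alpha = 2, corr.strength = 5, g = NULL,
    Sigma.eps = 0.1, nc = 250, ne = 100,
    intercept = TRUE, check.input = TRUE
  )
  class(sim)  # inspect the class of the returned object
}
```

Naming the arguments also guards against mixing up nc and ne, which sit next to each other in the signature.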

Arguments

n

An integer setting the number of genes.

m

An integer setting the number of arrays.

k

An integer setting the number of dimensions of the noise term; it controls the dimensions of W and α.

size.alpha

A numeric scalar giving the maximal and minimal absolute value of α.

g

An integer in [1, min(k, corr.strength)) controlling the correlation between X and W, or NULL for independence.

corr.strength

An integer controlling the dimension of X and β.

Sigma.eps

A numeric scalar setting the amount of random variation in ε; must satisfy Sigma.eps > 0.

nc

An integer setting the number of negative controls.

ne

An integer setting the number of strongly expressed genes.

intercept

A logical value indicating whether the systematic noise has an intercept.

check.input

A logical scalar; if TRUE all input is checked (not advisable for large simulations).
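
The constraints stated in the argument descriptions can be checked by hand before launching a large simulation with check.input = FALSE. The helper below is a minimal sketch in base R. `check_sim_args` is a hypothetical name, not a RUVcorr function, and it tests only the conditions written out above (the range of g, the sign of Sigma.eps, and the type of intercept). Whatever check.input = TRUE verifies internally may differ.

```r
# Hypothetical helper, not part of RUVcorr: checks only the constraints
# stated in the argument descriptions.
check_sim_args <- function(k, corr.strength, g = NULL, Sigma.eps = 0.1,
                           intercept = TRUE) {
  stopifnot(
    is.numeric(Sigma.eps), length(Sigma.eps) == 1, Sigma.eps > 0,
    is.logical(intercept), length(intercept) == 1, !is.na(intercept)
  )
  if (!is.null(g)) {
    upper <- min(k, corr.strength)
    # g must be a whole number with 1 <= g < min(k, corr.strength)
    stopifnot(length(g) == 1, g == round(g), g >= 1, g < upper)
  }
  invisible(TRUE)
}

check_sim_args(k = 10, corr.strength = 5, g = 3)  # passes silently

# g = 5 lies outside [1, 5), so the check fails and `ok` becomes FALSE
ok <- tryCatch(check_sim_args(k = 10, corr.strength = 5, g = 5),
               error = function(e) FALSE)
```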

Details

This function generates log2-transformed expression values of n genes in m arrays. The expression values consist of true expression and noise:

Y=Xβ+Wα+ε

The dimensions of the matrices X and β are used to control the size of the correlation between the genes. It is possible to simulate three different classes of genes:

The negative controls are always the last nc genes in the data, whereas the strongly expressed genes are always the first ne genes. The parameter intercept controls whether the systematic noise has an offset. Note that the intercept is one dimension of W. Data can be simulated either with W and X independent, by setting g to NULL, or with a correlation bWX between W and X that increases with g.
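
The additive structure Y = Xβ + Wα + ε can be illustrated in a few lines of base R. This is a sketch under stated assumptions, not the package's implementation: arrays are placed in rows and genes in columns, all factors are drawn from standard normal distributions, the entries of α are drawn uniformly with absolute value at most size.alpha, Sigma.eps is treated as a standard deviation, and the intercept is represented by a constant first column of W. Only the independent case (g = NULL) is shown; the gene classes and the construction of bWX are not reproduced. All variable names are illustrative.

```r
set.seed(1)            # reproducible illustration

m <- 6; n <- 8         # arrays (rows) and genes (columns); layout is an assumption
p <- 3; k <- 2         # dimensions of the signal term (X, beta) and noise term (W, alpha)
size_alpha <- 2        # bound on |alpha| (one reading of size.alpha)
sigma_eps  <- 0.1      # treated here as a standard deviation (assumption)

X     <- matrix(rnorm(m * p), m, p)
beta  <- matrix(rnorm(p * n), p, n)
W     <- matrix(rnorm(m * k), m, k)   # drawn independently of X, as with g = NULL
W[, 1] <- 1                           # intercept occupies one dimension of W
alpha <- matrix(runif(k * n, -size_alpha, size_alpha), k, n)
eps   <- matrix(rnorm(m * n, sd = sigma_eps), m, n)

signal   <- X %*% beta                # true expression
unwanted <- W %*% alpha + eps         # systematic plus random noise
Y        <- signal + unwanted

dim(Y)
```

Raising p relative to n changes how strongly the columns of `signal` are correlated, which is the role the text assigns to the dimensions of X and β.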

Value

simulateGEdata returns output of the class simulateGEdata. An object of class simulateGEdata is a list with the following components:

Author(s)

Saskia Freytag, Johann Gagnon-Bartsch

References

Jacob L., Gagnon-Bartsch J., Speed T. Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed. Berkeley Technical Reports (2012).

Examples

# W and X independent (g = NULL)
Y <- simulateGEdata(500, 500, 10, 2, 5, g = NULL, Sigma.eps = 0.1,
  250, 100, intercept = TRUE, check.input = TRUE)
Y
# W and X correlated (g = 3)
Y <- simulateGEdata(500, 500, 10, 2, 5, g = 3, Sigma.eps = 0.1,
  250, 100, intercept = TRUE, check.input = TRUE)
Y

Example output

Simulated Data:
Number of samples: [1] 500

Number of genes: [1] 500

Info:      [,1]               [,2]     
[1,] "k"                "10"     
[2,] "Mean correlation" "0.37499"
[3,] "Size alpha"       "2"      
[4,] "Intercept"        "1"      


 Truth
         [,1]     [,2]      [,3]      [,4]     [,5]
[1,] 3.596738 4.351550 11.004275 0.5466981 6.542468
[2,] 4.298099 4.094545 11.052006 0.8955395 7.334076
[3,] 4.443350 4.418265  9.998545 1.5739355 7.303750
[4,] 5.215774 3.631747 10.010910 1.6639748 7.802749
[5,] 3.836629 3.771450 10.768719 0.7645992 6.317064


 Y
          [,1]     [,2]     [,3]       [,4]     [,5]
[1,] 3.5231803 5.179990 9.204113 -3.1767918 7.902070
[2,] 6.1828126 8.214550 6.713868 -1.4375077 5.512597
[3,] 0.5298071 5.928962 6.408643 -0.1349181 6.425938
[4,] 7.3690851 2.237995 5.645383  4.1437731 8.462192
[5,] 7.9851463 4.746616 8.647306 -0.1213989 8.086546


 Noise
          [,1]       [,2]      [,3]       [,4]       [,5]
[1,] -0.127383  0.8269778 -1.794249 -3.7409284  1.2833435
[2,]  1.617846  4.1016028 -4.408982 -2.3602179 -1.7422442
[3,] -3.803051  1.4916220 -3.477907 -1.8216459 -0.8208796
[4,]  2.212492 -1.1836375 -4.380967  2.4938473  0.5860440
[5,]  4.056727  1.0001829 -2.315479 -0.8291611  1.8782218


 Sigma
            [,1]       [,2]        [,3]       [,4]        [,5]
[1,]  1.00000000 -0.5595883 -0.07298354  0.6322515  0.84847310
[2,] -0.55958832  1.0000000 -0.44865210 -0.4115904 -0.18730012
[3,] -0.07298354 -0.4486521  1.00000000 -0.3441086  0.05155539
[4,]  0.63225154 -0.4115904 -0.34410856  1.0000000  0.31123430
[5,]  0.84847310 -0.1873001  0.05155539  0.3112343  1.00000000
[1] "Need to make positive semi-definite!"
Simulated Data:
Number of samples: [1] 500

Number of genes: [1] 500

Info:      [,1]               [,2]     
[1,] "k"                "10"     
[2,] "Mean correlation" "0.37338"
[3,] "bWX"              "0.21217"
[4,] "Size alpha"       "2"      
[5,] "Intercept"        "1"      


 Truth
          [,1]      [,2]     [,3]     [,4]     [,5]
[1,] 2.2582291  6.619300 11.64836 7.306462 11.07899
[2,] 1.2507872  9.253436 12.89902 7.633731 12.19159
[3,] 1.8330933  8.985840 11.36333 7.040545 13.08401
[4,] 1.7006120  7.822357 13.15265 7.065259 10.66434
[5,] 0.8343513 10.536103 11.22436 5.457484 14.95629


 Y
          [,1]      [,2]     [,3]      [,4]      [,5]
[1,] 2.2263695  6.746727 12.29742  6.534019  9.667914
[2,] 3.1422112  7.906641 12.79606  7.391286  6.886018
[3,] 4.5375957  8.513955 12.02932 10.324045 10.036745
[4,] 2.7761689  4.894085 12.79016  6.752082 10.554699
[5,] 0.1543607 10.715974 14.26230  8.618280 18.083851


 Noise
           [,1]        [,2]       [,3]       [,4]        [,5]
[1,]  0.0755982  0.09930061  0.5105444 -0.6883459 -1.46154600
[2,]  1.8059627 -1.17788858 -0.1663120 -0.3911466 -5.18504405
[3,]  2.5355101 -0.60999502  0.5226616  3.1667417 -3.05178840
[4,]  1.1571317 -2.91538519 -0.4047063 -0.3526600 -0.05957667
[5,] -0.6660395  0.16885499  3.0141241  3.1310185  3.15525373


 Sigma
           [,1]        [,2]        [,3]       [,4]       [,5]
[1,]  1.0000000 -0.11951360 -0.45387329 -0.5051564 -0.3511236
[2,] -0.1195136  1.00000000  0.03998117 -0.2087242  0.6125461
[3,] -0.4538733  0.03998117  1.00000000  0.3589248 -0.4463913
[4,] -0.5051564 -0.20872421  0.35892481  1.0000000 -0.1491543
[5,] -0.3511236  0.61254611 -0.44639129 -0.1491543  1.0000000

RUVcorr documentation built on Nov. 17, 2017, 11:05 a.m.