generateData: Simulator for gene expression data

View source: R/generateData.R

Description

Simulates gene expression data. The values are drawn from a normal distribution with zero mean, and their covariances are given by a configurable block-diagonal matrix. By default, half of the samples contain differential gene expression values (see parameter diffsamples).

Usage

generateData(samples=50, genes=10000, diffgenes=200, blocksize=50, cov1=0.2, cov2=0, diff=0.6, diffsamples)

Arguments

samples

number of samples

genes

number of gene expression values per sample

diffgenes

number of differentially expressed genes in class 1 (label 1)

blocksize

size of each block in the block-diagonal correlation matrix

cov1

covariance within the blocks in the correlation matrix

cov2

covariance between the blocks in the correlation matrix

diff

difference between the random gene expression values and the differential gene expression values

diffsamples

number of samples containing differential gene expression values compared to the rest (if missing, this parameter is set to half of the total number of samples)

Details

The simulator generates two labeled classes:
label 1: samples with differentially expressed genes.
label -1: samples without differentially expressed genes.
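
The model described above can be illustrated with a short base-R sketch. It is not the RDRToolbox source: the function name simulate_block_data, the unit variances on the diagonal, the choice to shift the first diffgenes genes of the first diffsamples samples, the ordering of the labels, and the rounding of the default diffsamples are all assumptions made for illustration.

```r
# Illustrative sketch only: NOT the RDRToolbox implementation.
# simulate_block_data is a hypothetical helper written for this example.
simulate_block_data <- function(samples = 50, genes = 1000, diffgenes = 20,
                                blocksize = 50, cov1 = 0.2, cov2 = 0,
                                diff = 0.6, diffsamples = samples %/% 2) {
  stopifnot(genes %% blocksize == 0, diffgenes <= genes, diffsamples <= samples)

  # Covariance matrix: cov2 between blocks, cov1 within each block,
  # and 1 on the diagonal (unit variance is an assumption).
  block_id <- rep(seq_len(genes / blocksize), each = blocksize)
  sigma <- matrix(cov2, genes, genes)
  sigma[outer(block_id, block_id, "==")] <- cov1
  diag(sigma) <- 1

  # Draw zero-mean normal rows with covariance sigma via the Cholesky factor.
  # chol() fails if sigma is not positive definite.
  z <- matrix(rnorm(samples * genes), nrow = samples)
  x <- z %*% chol(sigma)

  # Assumption: the differential values are the first diffgenes genes of the
  # first diffsamples samples, shifted by diff.
  idx <- seq_len(diffsamples)
  x[idx, seq_len(diffgenes)] <- x[idx, seq_len(diffgenes)] + diff

  # Label 1 for samples with differential genes, -1 for the rest.
  labels <- rep(c(1, -1), times = c(diffsamples, samples - diffsamples))
  list(data = x, labels = labels)
}

set.seed(1)
sim <- simulate_block_data(samples = 20, genes = 100, diffgenes = 10, blocksize = 10)
dim(sim$data)      # rows are samples, columns are genes
table(sim$labels)  # number of samples per label
```

The sketch builds the full genes x genes covariance matrix, which is only practical for small values of genes; sampling each block separately would scale better when cov2 is 0.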

Value

'generateData' returns a list containing:

data

a (samples x genes) matrix with the simulated gene expression values

labels

a vector with labels (1,-1) for the two classes

Author(s)

Christoph Bartenhagen

Examples

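## load the package that provides generateData
library(RDRToolbox)

## generate a dataset with 20 samples and 1000 gene expression values
d = generateData(samples=20, genes=1000, diffgenes=100, blocksize=10)
data = d[[1]]    # matrix of simulated expression values
labels = d[[2]]  # class labels (1, -1)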
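
A quick way to check the returned list against the Value section is to inspect its dimensions and label counts. This is a minimal sketch that assumes RDRToolbox is installed; it uses only positional access to the list elements and base R functions.

```r
library(RDRToolbox)

d = generateData(samples=20, genes=1000, diffgenes=100, blocksize=10)

dim(d[[1]])    # rows are samples, columns are genes
table(d[[2]])  # number of samples per label
```

Since diffsamples is not given here, the documented default (half of the samples) determines how the labels are split.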

RDRToolbox documentation built on Nov. 8, 2020, 11:10 p.m.