simData: Simulation of microarray data
In optBiomarker: Estimation of Optimal Number of Biomarkers for Two-Group Microarray Based Classifications at a Given Error Tolerance Level for Various Classification Rules

View source: R/simData.R

The function simulates microarray data for two-group comparison with user-supplied parameters such as the number of biomarkers (genes or proteins), sample size, biological and experimental (technical) variation, replication, differential expression, and correlation between biomarkers.

simData(nTrain=100,
        nGr1=floor(nTrain/2),
        nBiom=50,nRep=3,
        sdW=1.0,
        sdB=1.0,
        rhoMax=NULL, rhoMin=NULL, nBlock=NULL,bsMin=3, bSizes=NULL, gamma=NULL,
        sigma=0.1,diffExpr=TRUE,
        foldMin=2,
        orderBiom=TRUE,
        baseExpr=NULL)

`nTrain`	Training set size, i.e., the total number of biological samples in group 1 (`nGr1`) and group 2.
`nGr1`	Size of group 1. Defaults to `floor(nTrain/2)`.
`nBiom`	Number of biomarkers (genes, probes or proteins).
`nRep`	Number of technical replications.
`sdW`	Experimental (technical) variation (σ_e) of data in log (base 2) scale.
`sdB`	Biological variation (σ_b) of data in log (base 2) scale.
`rhoMax`	Maximum Pearson's correlation coefficient between biomarkers. To ensure positive definiteness, allowed values are restricted between 0 and 0.95 inclusive. If `NULL`, set to `runif(1,min=0.6,max=0.8)`.
`rhoMin`	Minimum Pearson's correlation coefficient between biomarkers. To ensure positive definiteness, allowed values are restricted between 0 and 0.95 inclusive. If `NULL`, set to `runif(1,min=0.2,max=0.4)`.
`nBlock`	Number of blocks in the block diagonal (Hub-Toeplitz) correlation matrix. If `NULL`, set to 1 for `nBiom<5` and randomly selected from `c(1:floor(nBiom/bsMin))` for `nBiom>=5`.
`bsMin`	Minimum block size. `bsMin=3` by default.
`bSizes`	A vector of length `nBlock` representing the block sizes (should sum to `nBiom`). If `NULL`, set to `c(bs+mod,rep(bs,nBlock-1))`, where `bs` is the integer part of `nBiom/nBlock` and `mod` is the remainder after integer division.
`gamma`	Specifies a correlation structure. If `NULL`, assumes independence. `gamma=0` indicates a single-block exchangeable correlation matrix with constant correlation `rho=0.5*(rhoMin+rhoMax)`. A value greater than zero indicates a block diagonal (Hub-Toeplitz) correlation matrix with decline rate determined by the value of `gamma`. Decline rate is linear for `gamma=1`.
`sigma`	Standard deviation of the normal distribution (before truncation) from which fold changes are generated. See details.
`diffExpr`	Logical. Should systematic difference be introduced between the data of the two groups?
`foldMin`	Minimum value of fold changes. See details.
`orderBiom`	Logical. Should columns (biomarkers) be arranged in order of differential expression?
`baseExpr`	A vector of length `nBiom` to be used as base expressions μ. See `realBiomarker` for details.
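
The defaults described for `bSizes` and for `gamma=0` can be sketched in base R. This is only an illustration of the arithmetic stated in the argument descriptions, not the package's internal code; the helper names `default_block_sizes` and `exchangeable_corr` are hypothetical and do not exist in optBiomarker.

```r
# Hypothetical helper (not part of optBiomarker): default block sizes
# following the rule c(bs + mod, rep(bs, nBlock - 1)).
default_block_sizes <- function(nBiom, nBlock) {
  bs  <- nBiom %/% nBlock   # integer part of nBiom / nBlock
  mod <- nBiom %% nBlock    # remainder after integer division
  c(bs + mod, rep(bs, nBlock - 1))
}

# Hypothetical helper (not part of optBiomarker): single-block exchangeable
# correlation matrix with constant correlation rho = 0.5 * (rhoMin + rhoMax).
exchangeable_corr <- function(nBiom, rhoMin, rhoMax) {
  rho <- 0.5 * (rhoMin + rhoMax)
  R <- matrix(rho, nrow = nBiom, ncol = nBiom)
  diag(R) <- 1
  R
}

sizes <- default_block_sizes(nBiom = 50, nBlock = 4)
print(sizes)        # 14 12 12 12
print(sum(sizes))   # 50, i.e. the block sizes sum to nBiom

R <- exchangeable_corr(nBiom = 5, rhoMin = 0.3, rhoMax = 0.7)
print(R[1, 2])      # constant off-diagonal correlation
```

The first block absorbs the remainder, so the sizes always add up to `nBiom` regardless of whether `nBlock` divides it evenly.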

Differential expression is introduced by adding zδ to the data of group 2, where the δ values are generated from a truncated normal distribution and z is randomly selected from {-1, 1} to characterise up- or down-regulation.

Assuming that Y ~ N(μ, σ^2) and A = [a_1, a_2] is a subset of -Inf < y < Inf, the conditional distribution of Y given A is called the truncated normal distribution:

f(y, μ, σ) = (1/σ) φ((y-μ)/σ) / (Φ((a_2-μ)/σ) - Φ((a_1-μ)/σ))

for a_1 <= y <= a_2, and 0 otherwise,

where μ is the mean of the original normal distribution before truncation, σ is the corresponding standard deviation, a_2 is the upper truncation point, a_1 is the lower truncation point, φ(x) is the density of the standard normal distribution, and Φ(x) is the distribution function of the standard normal distribution. For the `simData` function, we consider a_1 = log_2(`foldMin`) and a_2 = Inf. This ensures that the biomarkers are differentially expressed by a fold change of `foldMin` or more.
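
As a rough illustration of the truncated normal described above, the following base R sketch evaluates the density and draws values by inverse-CDF sampling with a_1 = log_2(`foldMin`) and a_2 = Inf. The helpers `dtrunc_sketch` and `rtrunc_sketch` are hypothetical names, and setting μ equal to a_1 is an assumption made for the example only; the source does not state which mean `simData` uses internally.

```r
# Hypothetical helper: truncated normal density on [a1, a2].
dtrunc_sketch <- function(y, mu, sigma, a1 = -Inf, a2 = Inf) {
  denom <- pnorm((a2 - mu) / sigma) - pnorm((a1 - mu) / sigma)
  ifelse(y >= a1 & y <= a2, dnorm((y - mu) / sigma) / sigma / denom, 0)
}

# Hypothetical helper: inverse-CDF sampling from the same distribution.
rtrunc_sketch <- function(n, mu, sigma, a1 = -Inf, a2 = Inf) {
  p_lo <- pnorm(a1, mean = mu, sd = sigma)
  p_hi <- pnorm(a2, mean = mu, sd = sigma)
  u <- runif(n, min = p_lo, max = p_hi)   # uniform on the retained probability mass
  qnorm(u, mean = mu, sd = sigma)
}

set.seed(1)
foldMin <- 2
sigma   <- 0.1
a1      <- log2(foldMin)   # lower truncation point on the log2 scale
mu      <- a1              # ASSUMPTION for illustration only

# The density should integrate to (approximately) one over its support;
# a1 + 1 is ten standard deviations above mu, so the tail beyond is negligible.
area <- integrate(dtrunc_sketch, lower = a1, upper = a1 + 1,
                  mu = mu, sigma = sigma, a1 = a1)$value

delta <- rtrunc_sketch(20, mu = mu, sigma = sigma, a1 = a1)
z     <- sample(c(-1, 1), size = 20, replace = TRUE)   # up- or down-regulation
shift <- z * delta                                     # amount added to group 2

print(area)
print(range(2^abs(shift)))   # fold changes on the original scale
```

Because every δ is at least log_2(`foldMin`), each simulated fold change 2^|zδ| is at least `foldMin`, in either direction.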

The function returns a data frame of dimension `nTrain` by `nBiom+1`. The first column is a factor (class) representing the group memberships of the samples.

Mizanur Khondoker, Till Bachmann, Peter Ghazal
Maintainer: Mizanur Khondoker mizanur.khondoker@gmail.com.

Khondoker, M. R., Bachmann, T. T., Mewissen, M., Dickinson, P. et al. (2010). Multi-factorial analysis of class prediction error: estimating optimal number of biomarkers for various classification rules. Journal of Bioinformatics and Computational Biology, 8, 945-965.

classificationError

library(optBiomarker)
simData(nTrain=10, nBiom=3)
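
A slightly longer call, using only the documented arguments, might look like the sketch below. It assumes optBiomarker is installed and skips the simulation otherwise; the checks on the result simply restate the description of the return value (dimension `nTrain` by `nBiom+1`, first column a factor) and are not taken from the package's own examples.

```r
nTrain <- 20
nBiom  <- 5
expected_dim <- c(nTrain, nBiom + 1)   # per the documented return value

if (requireNamespace("optBiomarker", quietly = TRUE)) {
  set.seed(123)
  dat <- optBiomarker::simData(nTrain = nTrain, nBiom = nBiom, nRep = 3,
                               sdW = 1, sdB = 1,
                               diffExpr = TRUE, foldMin = 2)
  print(dim(dat))              # compare with expected_dim
  print(is.factor(dat[[1]]))   # first column holds the group labels
  print(table(dat[[1]]))       # group sizes
} else {
  message("optBiomarker is not installed; skipping the simulation.")
}
```

Leaving `gamma` at its default of `NULL` keeps the biomarkers independent, which is the simplest setting to inspect first.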

optBiomarker documentation built on Jan. 19, 2021, 1:06 a.m.
