simDataSet: simDataSet - simulation of exemplary dataset


Description

A very simple interface to simulate a dataset containing ngen genes and nsam samples. Two sample groups are defined, whose values are drawn from normal distributions with different parameters.

Usage

simDataSet(nsam = 30, ngen = 100, mu1a = 1.2, mu1b = -0.5, mu2a = -1.2, mu2b = -1.4, 
			sigma = 1, plot = FALSE)

Arguments

nsam

Integer. Number of samples.

ngen

Integer. Number of genes.

mu1a

Double. Mean value of first subgroup of genes in the first sample group.

mu1b

Double. Mean value of second subgroup of genes in the first sample group.

mu2a

Double. Mean value of first subgroup of genes in the second sample group.

mu2b

Double. Mean value of second subgroup of genes in the second sample group.

sigma

Positive double. Common standard deviation for the informative genes.

plot

Logical. If TRUE, show a heatmap of the sampled data.
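The argument types listed above can be expressed as a small validation step. This is a minimal sketch only: check_sim_args() is a hypothetical helper, not part of bootfs, and it encodes nothing beyond the constraints stated in the argument list (whole-number counts, single numeric means, a positive sigma, a logical plot).

```r
## Hypothetical helper (not part of bootfs): checks the argument types
## documented for simDataSet() and stops with a message on the first violation.
check_sim_args <- function(nsam = 30, ngen = 100, mu1a = 1.2, mu1b = -0.5,
                           mu2a = -1.2, mu2b = -1.4, sigma = 1, plot = FALSE) {
  is_scalar_num <- function(x) is.numeric(x) && length(x) == 1 && !is.na(x)
  is_count <- function(x) is_scalar_num(x) && x >= 1 && x == round(x)

  if (!is_count(nsam)) stop("nsam must be a positive integer")
  if (!is_count(ngen)) stop("ngen must be a positive integer")
  for (m in list(mu1a, mu1b, mu2a, mu2b)) {
    if (!is_scalar_num(m)) stop("mu1a, mu1b, mu2a and mu2b must be single numbers")
  }
  if (!(is_scalar_num(sigma) && sigma > 0)) stop("sigma must be a positive number")
  if (!(is.logical(plot) && length(plot) == 1 && !is.na(plot))) {
    stop("plot must be TRUE or FALSE")
  }
  invisible(TRUE)
}

## Usage: the defaults pass, a negative sigma is rejected.
check_sim_args()
try(check_sim_args(sigma = -1))
```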

Details

Defines two sample groups to be classified. One third of the genes contains the information to classify sample group 1, another third the information to classify sample group 2. Within each of these gene groups, two subgroups with differing intensity profiles are defined; these complementary subgroups together characterize the respective sample group.
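The design described above can be sketched in plain R. This is a minimal sketch under stated assumptions, not the bootfs implementation: sim_sketch() is a hypothetical function; samples are split into two (nearly) equal-sized groups; within each informative third of the genes, the first half of the genes uses the "a" mean and the second half the "b" mean; and every other entry (the remaining genes, and the informative genes in the other sample group) is assumed to be standard normal background noise.

```r
## Hypothetical re-creation of the design in "Details" (not the bootfs code).
## Assumptions: (nearly) equal-sized sample groups; the gene subgroups are the
## two halves of each informative third; all other entries are N(0, 1) noise.
sim_sketch <- function(nsam = 30, ngen = 100, mu1a = 1.2, mu1b = -0.5,
                       mu2a = -1.2, mu2b = -1.4, sigma = 1) {
  grx <- rep(c(1, 2), times = c(ceiling(nsam / 2), floor(nsam / 2)))
  n3 <- floor(ngen / 3)                  # genes per informative third
  genes1 <- seq_len(n3)                  # informative for sample group 1
  genes2 <- n3 + seq_len(n3)             # informative for sample group 2
  X <- matrix(rnorm(nsam * ngen), nrow = nsam, ncol = ngen)  # background

  ## Overwrite one sample group's entries for one gene group with two
  ## gene subgroups of different means and common sd = sigma.
  fill <- function(X, rows, genes, mu_a, mu_b) {
    half <- floor(length(genes) / 2)
    ga <- genes[seq_len(half)]           # first gene subgroup
    gb <- setdiff(genes, ga)             # second gene subgroup
    X[rows, ga] <- rnorm(length(rows) * length(ga), mean = mu_a, sd = sigma)
    X[rows, gb] <- rnorm(length(rows) * length(gb), mean = mu_b, sd = sigma)
    X
  }
  X <- fill(X, which(grx == 1), genes1, mu1a, mu1b)
  X <- fill(X, which(grx == 2), genes2, mu2a, mu2b)

  ## Same shape as the documented return value: samples in rows.
  list(logX = X, groupings = list(grx = grx))
}

set.seed(1234)
sim <- sim_sketch()
dim(sim$logX)             # 30 100
table(sim$groupings$grx)  # 15 samples in each group
```

Because fill() is defined inside sim_sketch(), it picks up sigma through R's lexical scoping rather than taking it as an extra argument.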

Value

A list with two elements:

logX

Log intensity values, samples in rows, features in columns.

groupings

List containing one element named grx, which holds the sample group assignments.
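The two list elements are meant to be used together: rows of logX are indexed by the labels in groupings$grx. As a minimal sketch of a downstream use, the hypothetical helper rank_features() below (not part of bootfs) ranks features by the absolute Welch t statistic between the two groups. It is exercised on a small mock object built to the documented shape, so it runs without bootfs installed.

```r
## Hypothetical helper: ranks the columns (features) of logX by the absolute
## Welch t statistic between the two sample groups in groupings$grx.
rank_features <- function(res) {
  X <- res$logX
  grx <- res$groupings$grx
  lv <- unique(grx)
  stopifnot(length(lv) == 2, nrow(X) == length(grx))
  tstat <- apply(X, 2, function(x) {
    unname(t.test(x[grx == lv[1]], x[grx == lv[2]])$statistic)
  })
  order(abs(tstat), decreasing = TRUE)  # column indices, most separating first
}

## Mock object with the documented structure: samples in rows.
set.seed(42)
grx <- rep(c(1, 2), each = 10)
logX <- matrix(rnorm(20 * 5), nrow = 20)     # 20 samples x 5 features
logX[grx == 1, 3] <- logX[grx == 1, 3] + 3   # make feature 3 informative
res <- list(logX = logX, groupings = list(grx = grx))
rank_features(res)
```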

Author(s)

Christian Bender.

Examples

	## Not run: 
		library(bootfs)
		my.seed <- 1234
		set.seed(my.seed) ## make the simulation reproducible
		data <- simDataSet(ngen=100, nsam=30, plot=TRUE)

		## alternative way to sample data
		library(penalizedSVM) ## provides sim.data()
		my.seed <- runif(n=1, min=1, max=99999999)
		nsam <- 30 ## number of samples
		ngen <- 100 ## number of features
		nsig <- floor(ngen * .33) ## number of significant features

		## use sim.data from the penalizedSVM package
		## Variant: 6 blocks of 5 genes each, with only one significant gene
		## per block; all genes in a block are correlated with the constant
		## correlation factor corr.factor=0.8
		#train <- sim.data(n = nsam, ng = ngen, nsg = nsig, corr=TRUE, 
		#	corr.factor=0.8, blocks=TRUE, n.blocks=6, nsg.block=1, ng.block=5, seed=my.seed)

		train <- sim.data(n = nsam, ng = ngen, nsg = nsig, corr=FALSE,  
						seed=my.seed, p.n.ratio=0.8) 

		logX <- t(train$x) ## transpose: samples in rows, features in columns
		groupings <- list(grx=train$y)

		drawheat(logX, groups=groupings[[1]])

## End(Not run)

bootfs documentation built on May 2, 2019, 5:50 p.m.