SimulatedDataGenerator: Simulation dataset generator
In BANFF: Bayesian Network Feature Finder


Description

Generates a simulated dataset (network, class labels, and test statistics). See the simulation studies for details.

Usage

SimulatedDataGenerator(net=NULL,nnode=NULL,maxpernull=0.7,class.label=NULL,
missloc=NULL,missing=c(FALSE,TRUE),missrate=0.1,nonmiss.hub.maxedges=7,
maxnsteps.merge.communties=1000,dist=c("norm","gamma","lognorm"),
plot=c(TRUE,FALSE),nbin=c(20,20,20),rng=1024)

Arguments

`net`	Adjacency matrix, with 1 indicating "connected" and 0 indicating "not directly connected". If not given, a scale-free graph is generated according to the Barabasi-Albert model, using the BA algorithm in the igraph package.
`nnode`	Integer. Total number of gene nodes in the network.
`maxpernull`	Numeric. Maximum proportion of null genes in the network. Used only when class.label is not given and has to be generated during the merging process. Default=0.7
`class.label`	Vector with one entry per node, assigning a class indicator (-1, 0, or 1) to each gene node. If not given, class labels are derived from the fast.greedy community detection algorithm: communities are merged sequentially, based on the number of between-community edges, until three remain, with the most highly connected communities merged first. The largest of the three is then assigned class indicator 0 (null genes), and the up- and down-regulated classes are assigned randomly.
`missloc`	Vector. Default NULL. If given, the locations of the test statistics that are not observed.
`missing`	Logical. Default FALSE. If TRUE, the missing locations are generated based on missrate.
`missrate`	A number in (0,1). The missing rate, defined as the proportion of gene nodes without observed test statistics. Rates above 20% are not recommended, based on biological knowledge.
`nonmiss.hub.maxedges`	Integer. Based on biological knowledge, hub genes (genes with many neighboring edges) are less likely to be missing. Only genes with fewer than nonmiss.hub.maxedges neighbors can be assigned as missing genes. Default=7
`maxnsteps.merge.communties`	Integer. The maximum number of steps used to merge the small communities down to three. Default=1000
`dist`	Character. The distribution of the test statistics of the DE genes; one of "norm", "gamma", or "lognorm". See the simulation design table for details.
`plot`	Logical. Default=TRUE. Whether to plot histograms of the generated test statistics.
`nbin`	Vector of length 3. Default=c(20,20,20). The number of bins used to plot the histogram for each class.
`rng`	Random seed. Default=1024
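How missing, missrate, and nonmiss.hub.maxedges interact can be illustrated with a short base-R sketch. It is an assumption about how missing locations might be drawn, not the BANFF implementation: the helper sample_missloc is hypothetical, only genes with fewer than nonmiss.hub.maxedges neighbors are eligible, and round(missrate * nnode) of them are sampled uniformly at random.

```r
# Hypothetical sketch (not BANFF source): draw missing gene nodes among
# non-hub genes. Assumes eligibility = degree < nonmiss.hub.maxedges and
# uniform sampling of round(missrate * nnode) eligible genes.
sample_missloc <- function(net, missrate = 0.1, nonmiss.hub.maxedges = 7,
                           rng = 1024) {
  stopifnot(missrate > 0, missrate < 1)
  set.seed(rng)
  degree <- rowSums(net)                          # neighbors per gene
  eligible <- which(degree < nonmiss.hub.maxedges)
  nmiss <- round(missrate * nrow(net))
  if (nmiss > length(eligible)) {
    stop("not enough non-hub genes to reach the requested missing rate")
  }
  # sample.int avoids sample()'s surprise when only one index is eligible
  sort(eligible[sample.int(length(eligible), nmiss)])
}

# Toy network: gene 1 is a hub linked to all 19 other genes.
net <- matrix(0L, 20, 20)
net[1, -1] <- 1L
net[-1, 1] <- 1L
missloc <- sample_missloc(net, missrate = 0.1)
missloc   # two non-hub genes; the hub (degree 19) is never eligible
```

Excluding hubs before sampling keeps highly connected genes observed, matching the rationale given for nonmiss.hub.maxedges.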

Details

The function simulates test statistics as follows:

The network is either given or generated by the Barabasi-Albert algorithm in the igraph package.
Class indicators are either given or generated with the fast.greedy community detection algorithm.
Test statistics are generated under one of three simulation scenarios: "norm", "gamma", or "lognorm".
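The three steps above can be sketched in base R under loudly stated simplifications. This is not BANFF's code: simulate_ba_net, simulate_class_label, and simulate_teststats are hypothetical helpers; the network step is a simplified preferential attachment (one edge per new node, weight degree + 1) rather than igraph's generator; class labels are assigned at random instead of by fast.greedy community merging; and only the "norm" scenario is shown, with an assumed mean shift mu = 2.

```r
# Hypothetical sketch of the three simulation steps (not BANFF source).

# Step 1: simplified Barabasi-Albert preferential attachment.
simulate_ba_net <- function(nnode, rng = 1024) {
  stopifnot(nnode >= 2)
  set.seed(rng)
  net <- matrix(0L, nnode, nnode)
  deg <- numeric(nnode)
  for (i in 2:nnode) {
    # Attach node i to one earlier node, favoring high-degree nodes.
    j <- sample.int(i - 1, 1, prob = deg[seq_len(i - 1)] + 1)
    net[i, j] <- 1L
    net[j, i] <- 1L
    deg[c(i, j)] <- deg[c(i, j)] + 1
  }
  net
}

# Step 2: stand-in class labels with at most `maxpernull` null genes
# (random assignment, NOT fast.greedy community merging).
simulate_class_label <- function(nnode, maxpernull = 0.7, rng = 1024) {
  set.seed(rng)
  nnull <- floor(maxpernull * nnode)
  label <- rep(0, nnode)
  de <- sample.int(nnode, nnode - nnull)          # indices of DE genes
  label[de] <- sample(c(-1, 1), length(de), replace = TRUE)
  label
}

# Step 3: class-wise normal statistics, N(-mu,sd), N(0,sd), N(mu,sd).
simulate_teststats <- function(class.label, mu = 2, sd = 1, rng = 1024) {
  set.seed(rng)
  z <- rnorm(length(class.label), mean = 0, sd = sd)   # null genes
  up <- class.label == 1
  down <- class.label == -1
  z[up] <- rnorm(sum(up), mean = mu, sd = sd)
  z[down] <- rnorm(sum(down), mean = -mu, sd = sd)
  z
}

net <- simulate_ba_net(100)
class.label <- simulate_class_label(100)
testcov.fullobs <- simulate_teststats(class.label)
tapply(testcov.fullobs, class.label, mean)   # per-class means
```

The gamma and lognorm scenarios would replace the normal draws in step 3; their parameters are given in the simulation design table and are not assumed here.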

Value

A list with components:

`testcov`	test statistics; missing observations, if any, are coded as NA
`testcov.fullobs`	test statistics with all observations fully observed
`class.label`	class indicators (z values) for each gene
`net`	simulated network: binary adjacency matrix, 1 if connected and 0 otherwise
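The documented relationship between testcov and testcov.fullobs (missing observations coded as NA) can be checked with a small sketch. The helper mask_teststats and the toy list are hypothetical and built by hand; they reuse the component names listed above but are not output of SimulatedDataGenerator.

```r
# Hypothetical sketch (not BANFF source): derive `testcov` from
# `testcov.fullobs` by coding the missing locations as NA.
mask_teststats <- function(testcov.fullobs, missloc = NULL) {
  testcov <- testcov.fullobs
  if (length(missloc) > 0) testcov[missloc] <- NA
  testcov
}

# Toy stand-in for the returned list (made-up values for illustration).
set.seed(1)
fullobs <- rnorm(6)
simdata_toy <- list(
  testcov         = mask_teststats(fullobs, missloc = c(2, 5)),
  testcov.fullobs = fullobs,
  class.label     = c(0, 1, -1, 0, 0, 1),
  net             = matrix(0L, 6, 6)   # placeholder adjacency matrix
)

observed <- !is.na(simdata_toy$testcov)
# Observed entries agree with the fully observed statistics.
stopifnot(identical(simdata_toy$testcov[observed],
                    simdata_toy$testcov.fullobs[observed]))
which(!observed)   # 2 5
```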

Examples

## Not run: 
## Simulation settings based on a real gene network (takes time).
library(BANFF)
data(net)
data(class.label)
data(missloc)
simdata=SimulatedDataGenerator(net=net,class.label=class.label,missloc=missloc,
dist="norm",plot=TRUE,nbin=c(20,20,20),rng=1024)
str(simdata)
## A toy example
simdata=SimulatedDataGenerator(nnode=100,missing=TRUE,missrate=0.1,dist="norm",
plot=TRUE,nbin=c(20,20,10),rng=1024)
str(simdata)

## End(Not run)