M3D_Simulations: Make Simulated Data
In tallulandrews/M3D: Michaelis-Menten Modelling of Dropouts in single-cell RNASeq


Generates simulated single-cell expression data from a negative binomial distribution, with additional zeros (dropouts) introduced according to the Michaelis-Menten equation.

	bg__MakeSimData(dispersion_fun=bg__default_mean2disp, n_cells=300, dispersion_factor=1, base_means=10^rnorm(25000, 1, 1), K=10.3)
	bg__MakeSimDE(dispersion_fun=bg__default_mean2disp, fold_change=10, frac_change=0.1, n_cells=300, sub_pop=0.5, dispersion_factor=1, base_means=10^rnorm(25000, 1, 1), K=10.3)
	bg__MakeSimDVar(dispersion_fun=bg__default_mean2disp, fold_change=10, frac_change=0.1, n_cells=300, sub_pop=0.5, dispersion_factor=1, base_means=10^rnorm(25000, 1, 1), K=10.3)
	bg__MakeSimHVar(dispersion_fun=bg__default_mean2disp, fold_change=10, frac_change=0.1, n_cells=300, dispersion_factor=1, base_means=10^rnorm(25000, 1, 1), K=10.3)

`dispersion_fun`	a function that takes mean expression and returns the dispersion parameter of the negative binomial distribution.
`n_cells`	total number of cells (columns) in the simulated dataset.
`sub_pop`	proportion of cells with changed expression.
`frac_change`	proportion of genes with changed expression.
`fold_change`	fold change applied to the mean expression (bg__MakeSimDE) or to the dispersion (bg__MakeSimDVar, bg__MakeSimHVar) of the changed genes.
`dispersion_factor`	a factor that multiplies the calculated mean-specific dispersion for all genes.
`base_means`	a vector of background mean expression values.
`K`	the constant K of the Michaelis-Menten function used to add dropouts.
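The `dispersion_fun` argument expects a function mapping mean expression to the negative binomial dispersion. The sketch below uses a made-up function, `my_mean2disp`, and a hypothetical helper, `nb_size`, to show how such a function might be written and how, assuming `dispersion_factor` simply multiplies its output, the resulting dispersion converts to R's `size` parameter. It does not reproduce `bg__default_mean2disp`.

```r
# Hypothetical dispersion function (NOT bg__default_mean2disp): dispersion is
# high for lowly expressed genes and levels off at 0.5 for highly expressed ones.
# It accepts a vector of means and returns one dispersion per mean.
my_mean2disp <- function(mu) {
  0.5 + 2 / mu
}

# Hypothetical helper: the rnbinom() size parameter for a given mean, assuming
# dispersion_factor multiplies the mean-specific dispersion and size = 1/dispersion.
nb_size <- function(mu, dispersion_fun = my_mean2disp, dispersion_factor = 1) {
  1 / (dispersion_factor * dispersion_fun(mu))
}

sizes <- nb_size(c(1, 10, 100))
print(sizes)  # size grows (overdispersion shrinks) as the mean rises

# With the package installed, the function would be passed as, e.g.:
# bg__MakeSimData(dispersion_fun = my_mean2disp, base_means = c(1, 10, 100))
```

Keeping the function vectorized lets it be applied to a whole vector of `base_means` at once.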

Generates simulated single-cell gene expression data using a zero-inflated negative binomial distribution. A user-supplied function relates the dispersion parameter (1/size in R's parameterization of the negative binomial distribution) to the mean expression of each gene. Additional zeros (dropouts) are then added according to a Michaelis-Menten function with constant K.
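A minimal base-R sketch of the generative process for one homogeneous population. `sim_zinb` and `toy_mean2disp` are hypothetical names, and the dropout rule (each count is zeroed with probability 1 - mu/(K + mu), i.e. K/(K + mu)) is an assumed reading of "zeros added based on a Michaelis-Menten function". The package's own code may differ, for instance by accounting for the zeros the negative binomial already produces.

```r
set.seed(1)

# Hypothetical dispersion function standing in for bg__default_mean2disp.
toy_mean2disp <- function(mu) 0.5 + 2 / mu

# Hypothetical zero-inflated negative binomial simulator (not the package code).
sim_zinb <- function(base_means, n_cells, K, dispersion_fun = toy_mean2disp,
                     dispersion_factor = 1) {
  n_genes <- length(base_means)
  counts <- matrix(0, nrow = n_genes, ncol = n_cells)  # genes x cells
  for (g in seq_len(n_genes)) {
    mu <- base_means[g]
    # dispersion = 1/size; rnbinom(size, mu) has variance mu + mu^2/size
    size <- 1 / (dispersion_factor * dispersion_fun(mu))
    nb <- rnbinom(n_cells, size = size, mu = mu)
    # Assumed Michaelis-Menten dropout: keep a count with probability mu/(K + mu)
    keep <- rbinom(n_cells, size = 1, prob = mu / (K + mu))
    counts[g, ] <- nb * keep
  }
  counts
}

sim <- sim_zinb(base_means = c(1, 10, 100, 1000), n_cells = 500, K = 10.3)
dim(sim)            # 4 genes (rows) x 500 cells (columns)
rowMeans(sim == 0)  # fraction of zeros per gene falls as mean expression rises
```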

Default values of base_means, K, and dispersion_fun were fit to the Buettner et al. 2015 data [1].

bg__MakeSimData generates simulated single-cell data for a single homogeneous population.

bg__MakeSimDE generates simulated single-cell data for two populations in which a proportion (frac_change) of genes differ in mean expression by fold_change in population "2", which makes up a fraction sub_pop of the cells.
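The two-population design can be sketched as follows. `sim_de_sketch` is a hypothetical function that mirrors the arguments and the three-item return value documented here; for brevity it assumes a fixed negative binomial size and adds no Michaelis-Menten dropouts, so it is an illustration of the design, not the package's implementation.

```r
set.seed(2)

# Hypothetical sketch of the two-population DE design (not the package code).
sim_de_sketch <- function(base_means, n_cells = 300, sub_pop = 0.5,
                          frac_change = 0.1, fold_change = 10, size = 2) {
  n_genes <- length(base_means)
  n_changed <- round(n_cells * sub_pop)
  # Label cells: 1 = unchanged population, 2 = changed population
  cell_labels <- c(rep(1, n_cells - n_changed), rep(2, n_changed))
  # Row IDs of the genes whose mean changes (true positives)
  TP <- sample(n_genes, round(n_genes * frac_change))
  # Gene-by-cell matrix of means; TP genes are scaled by fold_change in population 2
  mu <- matrix(base_means, nrow = n_genes, ncol = n_cells)
  mu[TP, cell_labels == 2] <- mu[TP, cell_labels == 2] * fold_change
  # rnbinom() reads the mu matrix column by column, matching matrix() below
  data <- matrix(rnbinom(length(mu), size = size, mu = mu), nrow = n_genes)
  list(data = data, cell_labels = cell_labels, TP = TP)
}

sim <- sim_de_sketch(base_means = 10^rnorm(1000, 1, 1))
str(sim)
```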

bg__MakeSimDVar generates simulated single-cell data for two populations in which a proportion (frac_change) of genes differ in dispersion by fold_change in population "2", which makes up a fraction sub_pop of the cells.
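A comparable sketch for the differential-dispersion design: means stay the same in both populations, while the dispersion (1/size) of the changed genes is multiplied by `fold_change` in population "2". `sim_dvar_sketch` is hypothetical and assumes a constant baseline dispersion with no dropouts.

```r
set.seed(3)

# Hypothetical sketch of the differential-dispersion design (not the package code).
sim_dvar_sketch <- function(base_means, n_cells = 300, sub_pop = 0.5,
                            frac_change = 0.1, fold_change = 10, dispersion = 0.5) {
  n_genes <- length(base_means)
  n_changed <- round(n_cells * sub_pop)
  cell_labels <- c(rep(1, n_cells - n_changed), rep(2, n_changed))
  TP <- sample(n_genes, round(n_genes * frac_change))
  # Gene-by-cell dispersions; TP genes are more dispersed in population 2
  disp <- matrix(dispersion, nrow = n_genes, ncol = n_cells)
  disp[TP, cell_labels == 2] <- dispersion * fold_change
  # Means are identical in both populations
  mu <- matrix(base_means, nrow = n_genes, ncol = n_cells)
  data <- matrix(rnbinom(length(mu), size = 1 / disp, mu = mu), nrow = n_genes)
  list(data = data, cell_labels = cell_labels, TP = TP)
}

sim <- sim_dvar_sketch(base_means = rep(50, 200))
```

With a mean of 50, the expected variance of a TP gene rises from 50 + 0.5 * 50^2 = 1300 in population "1" to 50 + 5 * 50^2 = 12550 in population "2", while its mean is unchanged.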

bg__MakeSimHVar generates simulated single-cell data for a single homogeneous population in which a proportion (frac_change) of genes have their dispersion increased by fold_change over the value expected from their mean expression.
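For the highly-variable-gene design, the expected dispersion comes from the mean-dispersion function and only the changed genes exceed it. In the hypothetical sketch below, `toy_mean2disp` stands in for `dispersion_fun`, no dropouts are added, and only `data` and `TP` are returned; how the package fills `cell_labels` for this single-population case is not shown here.

```r
set.seed(4)

# Hypothetical stand-in for the expected mean-dispersion relationship.
toy_mean2disp <- function(mu) 0.5 + 2 / mu

# Hypothetical sketch of the highly-variable-gene design (not the package code).
sim_hvar_sketch <- function(base_means, n_cells = 300, frac_change = 0.1,
                            fold_change = 10, dispersion_fun = toy_mean2disp) {
  n_genes <- length(base_means)
  TP <- sample(n_genes, round(n_genes * frac_change))
  # Expected dispersion for each gene's mean, inflated for TP genes only
  disp <- dispersion_fun(base_means)
  disp[TP] <- disp[TP] * fold_change
  # One row of n_cells counts per gene; t() gives genes as rows
  data <- t(sapply(seq_len(n_genes), function(g)
    rnbinom(n_cells, size = 1 / disp[g], mu = base_means[g])))
  list(data = data, TP = TP)
}

sim <- sim_hvar_sketch(base_means = rep(100, 100), fold_change = 3)
```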

bg__MakeSimData: a gene expression matrix in which rows are genes and columns are cells.

bg__MakeSimDE, bg__MakeSimDVar, bg__MakeSimHVar: a list of three named items:
`data`	the gene expression matrix, with genes as rows and cells as columns.
`cell_labels`	a vector of 1s and 2s indicating whether each cell belongs to the unchanged ("1") or changed ("2") population.
`TP`	a vector of row IDs of the genes that change (true positives).
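Because the returned list records the true positives, it can be used to benchmark a gene-ranking method. The sketch below uses a hypothetical helper, `recall_at_n`, and a tiny hand-made list with the same three items; with real output one would pass, e.g., `sim_DE$TP` and scores from the method under test.

```r
# Hypothetical helper: fraction of true positives (TP row IDs) recovered
# among the n highest-scoring genes.
recall_at_n <- function(scores, TP, n = length(TP)) {
  top <- order(scores, decreasing = TRUE)[seq_len(n)]
  length(intersect(top, TP)) / length(TP)
}

# Toy object with the same structure as the simulators' return value.
sim <- list(
  data = matrix(c(0, 1, 9, 11,
                  4, 6, 5, 5,
                  8, 10, 1, 1), nrow = 3, byrow = TRUE),
  cell_labels = c(1, 1, 2, 2),
  TP = c(1, 3)
)

# Score each gene by the absolute difference in mean expression between groups.
g1 <- sim$cell_labels == 1
scores <- abs(rowMeans(sim$data[, g1, drop = FALSE]) -
              rowMeans(sim$data[, !g1, drop = FALSE]))
recall_at_n(scores, sim$TP)         # both changed genes rank in the top 2
recall_at_n(scores, sim$TP, n = 1)  # only one of the two in the top 1
```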

[1] Buettner et al. (2015) Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nature Biotechnology 33: 155-160.

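library(M3D)
# Twelve genes spanning a wide range of mean expression
means <- c(1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000)
# Two homogeneous populations; the second has doubled means and halved dispersion
population1 <- bg__MakeSimData(n_cells = 10, base_means = means)
population2 <- bg__MakeSimData(n_cells = 10, base_means = means * 2, dispersion_factor = 0.5)
# Two-population datasets with a change in mean (DE) or dispersion (DVar)
sim_DE <- bg__MakeSimDE(n_cells = 100, base_means = means)
sim_DVar <- bg__MakeSimDVar(n_cells = 100, sub_pop = 0.25, base_means = means)
# One population whose changed genes have 3-fold excess dispersion
sim_HVar <- bg__MakeSimHVar(base_means = means, fold_change = 3)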