estimateNBParam: Estimate simulation parameters
In bvieth/powsim: Power Simulations


This function estimates and returns parameters needed for power simulations assuming a negative binomial read count distribution.

estimateNBParam(countData, cData=NULL, design=NULL,
                RNAseq=c("bulk", "singlecell"),
                estFramework=c('edgeR', 'DESeq2', 'MatchMoments'),
                sigma=1.96)

`countData`	A count `matrix`. Rows correspond to genes, columns to samples.
`cData`	A `data.frame` with at least a single column. Rows of `cData` must correspond to columns of `countData`. Default is `NULL`, i.e. the dispersion estimation is 'blind' to sample information, see `varianceStabilizingTransformation`.
`design`	A `formula` which expresses how the counts for each gene depend on the variables in `cData`. Designs with multiple covariates and/or interactions are supported. Default is `NULL`, i.e. no design information is considered.
`RNAseq`	A character value: "bulk" or "singlecell".
`estFramework`	A character value: "edgeR", "DESeq2" or "MatchMoments". "edgeR" and "DESeq2" employ the edgeR- or DESeq2-style mean and dispersion estimation, respectively. For details, please consult `estimateDisp` and `estimateDispersions`. "MatchMoments" employs a moment-matching technique for mean, dispersion and dropout estimation.
`sigma`	The variability band width for the mean-dispersion loess fit, defining the prediction interval for read count simulation. Default is 1.96, i.e. a 95% interval. For more information see `loess.sd`.
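
To make the idea behind moment matching concrete, here is a minimal base-R sketch of a method-of-moments estimator for negative binomial parameters. It is an illustration only and should not be read as the code behind the "MatchMoments" option: the helper `mom_nb` is hypothetical, and the sketch ignores library size normalization, which the real function presumably handles.

```r
# Hypothetical helper, NOT part of powsim: method-of-moments NB estimates per gene.
mom_nb <- function(counts) {
  m <- rowMeans(counts)                      # per-gene mean
  v <- apply(counts, 1, var)                 # per-gene sample variance
  disp <- (v - m) / m^2                      # solve Var = mu + disp * mu^2 for disp
  disp[!is.finite(disp) | disp <= 0] <- NA   # no usable overdispersion estimate
  p0 <- rowMeans(counts == 0)                # observed fraction of zeros per gene
  data.frame(means = m, dispersion = disp, size = 1 / disp, p0 = p0)
}

# Toy data: 200 genes, 500 samples, true mean 50 and true dispersion 1/2
set.seed(1)
toy <- matrix(rnbinom(200 * 500, mu = 50, size = 2), nrow = 200)
est <- mom_nb(toy)
median(est$dispersion, na.rm = TRUE)         # should lie close to 0.5
```

Genes whose sample variance does not exceed the mean get `NA` here, since the negative binomial model cannot represent them with a positive dispersion.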

A list with the following elements:

`seqDepth`	Library size, i.e. total number of reads per library
`means`	Mean normalized read counts per gene.
`dispersion`	Dispersion estimate per gene.
`common.dispersion`	The common dispersion estimate over all genes.
`size`	Size parameter of the negative binomial distribution, i.e. 1/dispersion.
`p0`	Probability that the count will be zero per gene.
`meansizefit`	A loess fit relating log2 mean to log2 size for use in simulating new data (`loess.sd`).
`meandispfit`	A fit relating log2 mean to log2 dispersion used for visualizing mean-variance dependency (`loess.sd`).
`p0.cut`	The knee point of meanp0fit. Log2 mean values above that value have virtually no dropouts.
`grand.dropout`	The percentage of empty entries in the count matrix.
`sf`	The estimated library size factor per sample.
`totalS,totalG`	Number of samples and genes provided.
`estS,estG`	Number of samples and genes for which parameters can be estimated.
`RNAseq`	The type of RNAseq: bulk or single cell.
`estFramework`	The estimation framework for NB parameters.
`sigma`	The width of the variability band.
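
The relation between `dispersion`, `size` and the zero probability can be checked in base R. The sketch below uses made-up values for `mu` and `dispersion`. It shows the model-based zero probability under a negative binomial distribution. Whether the returned `p0` is this quantity or the observed zero fraction is not stated on this page.

```r
mu <- 10              # illustrative mean, not taken from any data set
dispersion <- 0.5     # illustrative dispersion
size <- 1 / dispersion

# Zero probability under NB(mu, size): (size / (size + mu))^size
p0_formula <- (size / (size + mu))^size
p0_dnbinom <- dnbinom(0, mu = mu, size = size)
all.equal(p0_formula, p0_dnbinom)

# Variance implied by the mean-dispersion parametrization
nb_var <- mu + dispersion * mu^2
```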

Beate

## Not run: 
## simulating single cell RNA-seq experiment
ngenes <- 10000
ncells <- 100
true.means <- 2^runif(ngenes, 3, 6)
true.dispersions <- 3 + 100/true.means
sf.values <- 2^rnorm(ncells, sd=0.5)
sf.means <- outer(true.means, sf.values, '*')
cnts <- matrix(rnbinom(ngenes*ncells,
                       mu=sf.means, size=1/true.dispersions),
               ncol=ncells)
## estimating negative binomial parameters
estparam <- estimateNBParam(cnts, RNAseq = 'singlecell',
                            estFramework = 'MatchMoments', sigma=1.96)
plotNBParam(estparam)

## simulating bulk RNA-seq experiment
ngenes <- 10000
nsamples <- 10
true.means <- 2^rnorm(ngenes, mean=8, sd=2)
true.dispersions <- rgamma(ngenes, 2, 6)
sf.values <- rnorm(nsamples, mean=1, sd=0.1)
sf.means <- outer(true.means, sf.values, '*')
cnts <- matrix(rnbinom(ngenes*nsamples,
                       mu=sf.means, size=1/true.dispersions),
               ncol=nsamples)
## estimating negative binomial parameters
estparam <- estimateNBParam(cnts, RNAseq = 'bulk',
                            estFramework = 'MatchMoments', sigma=1.96)
plotNBParam(estparam)

## End(Not run)