Simulating RNA-seq read counts

Description

This function takes a fitted DESeq2 data object as an input and returns a simulated data object with the same sample size factors, total counts and dispersions for each gene as in real data, but without the effect of predictor variables.

Usage

1
simulateCounts(deseq.object)

Arguments

deseq.object

DESeq2 data object, with estimated size factors and dispersions (output of DESeq() function).

Details

For each gene, the total counts are randomly resampled into different samples. The estimated per-gene dispersions are understood as the suare of the coefficient of variation and used to simulate random deviations in per-sample assignment probability. The probabilities of per-sample assignment are also weighted by the empirical sample size factors.

Value

DESeq2 data object

Author(s)

Mikhail V. Matz

References

R. M. Wright, G. V. Aglyamova, E. Meyer and M. V. Matz (2015) Local and systemic gene expression responses to a white-syndrome-like disease in a reef-bulding coral, Acropora hyacinthus.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
dds = makeExampleDESeqDataSet(betaSD=1, n=100)
dds = DESeq(dds)
sims = simulateCounts(dds)
sims = DESeq(sims)
res = results(dds)
sim.res=results(sims)

# how similar is the simulation to real data? 
plot(sizeFactors(sims)~sizeFactors(dds))
plot(log(dispersions(sims),10)~log(dispersions(dds),10))

# computing and plotting empirical FDR
fdrt = fdrTable(res$pvalue,sim.res$pvalue)
fdrBiCurve(fdrt,maxLogP=4,main="DEG discovery rates")
efdr = empiricalFDR(fdrt,plot=TRUE,main="False discovery rate")

# how many genes pass empirical 0.1 FDR cutoff?
table(res$pvalue<efdr)

# how many genes pass multiplicity-corrected 0.1 FDR cutoff?
table(res$padj<0.1)