Sample Size Calculations for Two-Sample RNA-seq Experiments with Differing Mean and Dispersion Among Genes

Share:

Description

This function calculates appropriate sample sizes for two-sample RNA-seq experiments for a desired power in which mean and dispersion vary among genes. Sample size calculations are performed at controlled false discovery rates, user-specified proportions of non-differentially expressed genes, mean counts in control group, dispersion, and fold change. A plot of power versus sample size is generated.

Usage

1
2
3
ssizeRNA_vary(nGenes = 10000, pi0 = 0.8, m = 200, mu, disp, fc,
  up = 0.5, replace = TRUE, fdr = 0.05, power = 0.8, maxN = 35,
  side = "two-sided", cex.title = 1.15, cex.legend = 1)

Arguments

nGenes

total number of genes, the default value is 10000.

pi0

proportion of non-differentially expressed genes, the default value is 0.8.

m

pseudo sample size for generated data.

mu

a vector (or scalar) of mean counts in control group from which to simulate.

disp

a vector (or scalar) of dispersion parameter from which to simulate.

fc

a vector (or scalar, or a function that takes an integer n and generates a vector of length n) of fold change for differentially expressed (DE) genes.

up

proportion of up-regulated genes among all DE genes, the default value is 0.5.

replace

sample with or without replacement from given parameters. See Details for more information.

fdr

the false discovery rate to be controlled.

power

the desired power to be achieved.

maxN

the maximum sample size used for power calculations.

side

options are "two-sided", "upper", or "lower".

cex.title

controls size of chart titles.

cex.legend

controls size of chart legend.

Details

If a vector is input for pi0, sample size calculations are performed for each proportion.

If the total number of genes is larger than length of mu or disp, replace always equals TRUE.

Value

ssize

sample sizes (for each treatment) at which desired power is first reached.

power

power calculations with corresponding sample sizes.

crit.vals

critical value calculations with corresponding sample sizes.

Author(s)

Ran Bi biran@iastate.edu, Peng Liu pliu@iastate.edu

References

Liu, P. and Hwang, J. T. G. (2007) Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 23(6): 739-746.

Orr, M. and Liu, P. (2009) Sample size estimation while controlling false discovery rate for microarray experiments using ssize.fdr package. The R Journal, 1, 1, May 2009, 47-53.

Law, C. W., Chen, Y., Shi, W., Smyth, G. K. (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29.

See Also

ssizeRNA_single

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
library(edgeR)
library(Biobase)
data(hammer.eset)
## load hammer dataset (Hammer, P. et al., 2010)

counts <- exprs(hammer.eset)[, phenoData(hammer.eset)$Time == "2 weeks"]
counts <- counts[rowSums(counts) > 0,]
trt <- hammer.eset$protocol[which(hammer.eset$Time == "2 weeks")] 

mu <- apply(counts[, trt == "control"], 1, mean)  
## average read count in control group for each gene

d <- DGEList(counts)
d <- calcNormFactors(d)
d <- estimateCommonDisp(d)
d <- estimateTagwiseDisp(d)
disp <- d$tagwise.dispersion      
## dispersion for each gene

## fixed fold change
fc <- 2
size <- ssizeRNA_vary(mu = mu, disp = disp, fc = fc, 
                      m = 30, maxN = 15, replace = FALSE)
size$ssize         ## first sample size to reach desired power
size$power         ## calculated power for each sample size
size$crit.vals     ## calculated critical value for each sample size


## varying fold change
## fc1 <- function(x){exp(rnorm(x, log(2), 0.5*log(2)))}
## size1 <- ssizeRNA_vary(pi0 = 0.8, mu = mu, disp = disp, fc = fc1, 
##                        m = 30, maxN = 20, replace = FALSE)