dsFDR: Estimating the False Discovery Rate

Description Usage Arguments Value References Examples

View source: R/dsFDR.R

Description

This function uses the permutation plug-in method to estimate the FDR. It requires scaled-test statistics so that permutation are comparable from one test to another.

Usage

1
dsFDR(gene_scores_perm, gene_scores_obs, use_median = TRUE, doPlot = FALSE)

Arguments

gene_scores_perm

a numeric matrix of size G x n_perm containing the permuted gene-wise scores for G genes with n_perm permutations.

gene_scores_obs

a vector of length n containing the observed gene-wise scores.

use_median

a logical flag indicating whether the median should be used to estimate the true proportion of null features. If not, we use a range of quantiles of the permuted gene-wise scores and the true proportion of null features is extrapolated from the limit of a smoothed estimate using the natural cubic spline. Default is TRUE. See Storey et al. for details

doPlot

a logical flag indicating whether the plot of the natural cubic spline fit should be drawn. Default is FALSE. Ignored if use_median is TRUE.

Value

A vector of estimating discrete false discovery rates

References

J. Li and R. Tibshirani (2013). Finding consistent patterns: A nonparametric approach for identifying differential expression in RNA-seq data, Statistical Methods in Medical Research, 22(5): 519-536

Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100(16), 9440-9445.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
## Not run: 
#rm(list=ls())
G <- 1000
nperm <- 500
G1 <- 0.3*G
G0 <- G-G1
gene_scores_perm <- matrix(rchisq(G*nperm, df=1), ncol=nperm, nrow=G)
gene_scores_obs <- c(rchisq(G1, df=1), rchisq(G0, df=1))

qvals <- dsFDR(gene_scores_perm, gene_scores_obs, use_median = FALSE, doPlot = TRUE)
summary(qvals)

qvals <- qvals[!is.na(qvals)]
eFDR_5pct <- sum(qvals[-(1:G1)]<0.05)/sum(qvals < 0.05)
eTDR_5pct <- sum(qvals[1:G1]<0.05)/sum(qvals < 0.05)
cat("FDR:", eFDR_5pct, " TDR:", eTDR_5pct, "\n")
plot(y = sapply(seq(0, 1, by=0.001), function(x){sum(qvals[-(1:G1)] < x)/sum(qvals < x)}),
    x = seq(0, 1, by=0.001),
    type = "l", xlab = "Nominal FDR level", ylab = "Empirical FDR", col = "red", lwd = 2,
    ylim = c(0,1))
abline(a = 0, b = 1, lty = 2)

res <- list()
G <- 1000
nperm <- 1000
G1 <- 0.3*G
G0 <- G-G1
for(i in 1:100){
cat(i, "/100\n", sep="")
gene_scores_obs <- c(rchisq(G1, df=1), rchisq(G0, df=1))
gene_scores_perm <- matrix(rchisq(G*nperm, df=1), ncol=nperm, nrow=G)
qvals <- dsFDR(gene_scores_perm, gene_scores_obs, use_median = TRUE, doPlot = FALSE)
res[[i]] <- sapply(seq(0, 1, by=0.01), function(x){sum(qvals < x)})
}

## End(Not run)

tcgsaseq documentation built on Sept. 13, 2020, 5:13 p.m.