dsFDR: Estimating the False Discovery Rate
In tcgsaseq: Time-Course Gene Set Analysis for RNA-Seq Data

Description Usage Arguments Value References Examples

This function uses the permutation plug-in method to estimate the FDR. It requires scaled-test statistics so that permutation are comparable from one test to another.

1	dsFDR(gene_scores_perm, gene_scores_obs, use_median = TRUE, doPlot = FALSE)

`gene_scores_perm`	a numeric matrix of size `G x n_perm` containing the permuted gene-wise scores for `G` genes with `n_perm` permutations.
`gene_scores_obs`	a vector of length `n` containing the observed gene-wise scores.
`use_median`	a logical flag indicating whether the median should be used to estimate the true proportion of null features. If not, we use a range of quantiles of the permuted gene-wise scores and the true proportion of null features is extrapolated from the limit of a smoothed estimate using the natural cubic spline. Default is `TRUE`. See Storey et al. for details
`doPlot`	a logical flag indicating whether the plot of the natural cubic spline fit should be drawn. Default is `FALSE`. Ignored if `use_median` is `TRUE`.

A vector of estimating discrete false discovery rates

J. Li and R. Tibshirani (2013). Finding consistent patterns: A nonparametric approach for identifying differential expression in RNA-seq data, Statistical Methods in Medical Research, 22(5): 519-536

Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100(16), 9440-9445.

## Not run: 
#rm(list=ls())
G <- 1000
nperm <- 500
G1 <- 0.3*G
G0 <- G-G1
gene_scores_perm <- matrix(rchisq(G*nperm, df=1), ncol=nperm, nrow=G)
gene_scores_obs <- c(rchisq(G1, df=1), rchisq(G0, df=1))

qvals <- dsFDR(gene_scores_perm, gene_scores_obs, use_median = FALSE, doPlot = TRUE)
summary(qvals)

qvals <- qvals[!is.na(qvals)]
eFDR_5pct <- sum(qvals[-(1:G1)]<0.05)/sum(qvals < 0.05)
eTDR_5pct <- sum(qvals[1:G1]<0.05)/sum(qvals < 0.05)
cat("FDR:", eFDR_5pct, " TDR:", eTDR_5pct, "\n")
plot(y = sapply(seq(0, 1, by=0.001), function(x){sum(qvals[-(1:G1)] < x)/sum(qvals < x)}),
    x = seq(0, 1, by=0.001),
    type = "l", xlab = "Nominal FDR level", ylab = "Empirical FDR", col = "red", lwd = 2,
    ylim = c(0,1))
abline(a = 0, b = 1, lty = 2)

res <- list()
G <- 1000
nperm <- 1000
G1 <- 0.3*G
G0 <- G-G1
for(i in 1:100){
cat(i, "/100\n", sep="")
gene_scores_obs <- c(rchisq(G1, df=1), rchisq(G0, df=1))
gene_scores_perm <- matrix(rchisq(G*nperm, df=1), ncol=nperm, nrow=G)
qvals <- dsFDR(gene_scores_perm, gene_scores_obs, use_median = TRUE, doPlot = FALSE)
res[[i]] <- sapply(seq(0, 1, by=0.01), function(x){sum(qvals < x)})
}

## End(Not run)