fdrTbl: FDR Estimate and Confidence Interval Sequence Table

fdrTbl R Documentation

Description

Computes FDR estimates and confidence intervals for a sequence of potential significance thresholds.

Usage

fdrTbl(
  obs.vec,
  perm.list = NULL,
  pname,
  ntests,
  lowerbound,
  upperbound,
  incr = 0.1,
  cl = 0.95,
  c1 = NA,
  correct = "none",
  meff = TRUE,
  seff = TRUE,
  mymat,
  nperms = 5
)

Arguments

obs.vec

observed vector of p-values.

perm.list

list of dataframes, each of which includes a column of permutation p-values (or statistics). The length of perm.list equals the number of permutations.

pname

name of column in each list component dataframe that includes p-values (or statistics).

ntests

total number of observed tests, which is usually the same as the length of obs.vec and the number of rows in each perm.list dataframe. However, this may not be the case if results were filtered by a p-value threshold or statistic threshold. If filtering was conducted then lowerbound must be greater (more extreme) than the filtering criterion.

lowerbound

lower end of the range of -log10(p-value) thresholds over which FDR is computed.

upperbound

upper end of the range of -log10(p-value) thresholds over which FDR is computed.

incr

value by which to increment the sequence from lowerbound to upperbound on a -log10(p-value) scale. Default is 0.1.

cl

confidence level (default is .95).

c1

overdispersion parameter to account for dependencies among tests. If all tests are known to be independent, then this parameter should be set to 1.

correct

either "none" or "BH". If "BH", confidence intervals are corrected for multiplicity using a modification of the Benjamini and Yekutieli (2005) approach for selecting and correcting intervals. Default is "none".

meff

(For parametric estimation, if perm.list = NULL.) Logical. To be passed into fdr_od. TRUE implies the calculation of the effective number of tests based on the JM estimator (Default is TRUE)

seff

(For parametric estimation, if perm.list = NULL.) Logical. To be passed into fdr_od. TRUE implies the calculation of the effective number of rejected hypotheses based on the JM estimator (Default is TRUE)

mymat

(For parametric estimation, if perm.list = NULL.) Matrix. To be passed into fdr_od. Design matrix used to calculate the p-values provided in obs.vec.

nperms

(For parametric estimation, if perm.list = NULL.) Integer. To be passed into fdr_od. Number of permutations needed to estimate the effective number of (rejected) tests. (Must be non-zero, default is 5)

Details

fdrTbl calls fdr_od for a series of discovery thresholds. Output from fdrTbl() can be used for FDRplot() input.
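
The threshold sequence implied by lowerbound, upperbound, and incr can be sketched in base R as follows. This is an illustration of the argument descriptions only, not the package's internal code; the variable names neglog10_thresholds, p_thresholds, and filter_p are hypothetical, and the 0.05 pre-filter is an assumed scenario.

```r
# Hypothetical inputs mirroring the fdrTbl() arguments of the same names
lowerbound <- 1
upperbound <- 2
incr <- 0.1

# Thresholds on the -log10(p-value) scale
neglog10_thresholds <- seq(lowerbound, upperbound, by = incr)

# The same thresholds expressed as p-value cutoffs
p_thresholds <- 10^(-neglog10_thresholds)

# Assumed scenario: results were pre-filtered to keep only p < 0.05.
# lowerbound must then exceed -log10(0.05), roughly 1.3, so a
# lowerbound of 1 would be too permissive here.
filter_p <- 0.05
lowerbound_ok <- lowerbound > -log10(filter_p)

data.frame(neglog10 = neglog10_thresholds, p = p_thresholds)
```

With these values the sequence has 11 thresholds, from p = 0.1 down to p = 0.01, and lowerbound_ok is FALSE, which signals that lowerbound would need to be raised if such a filter had been applied.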

If correct = "BH", then confidence intervals will be corrected for multiplicity across the thresholds specified by lowerbound, upperbound, and incr. A threshold is selected if its FDR is determined to be significantly different from 1. First, a Z-score test is conducted using the Millstein & Volfson standard error estimate. Then BH FDR is computed according to the Benjamini and Yekutieli (2005) approach. CIs for selected thresholds are adjusted to account for multiple CI estimation. For thresholds that are not selected, NA values are returned.
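
The select-then-adjust idea described above can be sketched with base R. The code below follows the generic Benjamini and Yekutieli (2005) procedure (select R of m parameters with BH at level q, then build intervals at level 1 - R*q/m); fdrTbl uses a modification of that approach, so this is not the package's implementation. All numbers are invented toy values, and working on the log(FDR) scale with a two-sided test is an assumption.

```r
# Toy inputs, invented for illustration only
log_fdr <- c(-0.05, -0.40, -1.20, -2.00)  # hypothetical log(FDR) estimates, one per threshold
se      <- c(0.30, 0.25, 0.35, 0.50)      # hypothetical standard errors of log(FDR)
q       <- 0.05                           # 1 - cl

# Z-test of H0: FDR = 1, i.e. log(FDR) = 0 (two-sided; an assumption)
z <- log_fdr / se
p <- 2 * pnorm(-abs(z))

# BH step: select thresholds whose BH-adjusted p-value is at most q
selected <- p.adjust(p, method = "BH") <= q
R <- sum(selected)
m <- length(p)

# BY (2005) adjusted confidence level for the selected intervals
adj_level <- 1 - R * q / m
zcrit <- qnorm(1 - (1 - adj_level) / 2)

# Adjusted bounds for selected thresholds, NA for the rest
ll <- ifelse(selected, exp(log_fdr - zcrit * se), NA)
ul <- ifelse(selected, exp(log_fdr + zcrit * se), NA)
```

In this toy case two of the four thresholds are selected, so the selected intervals are built at the 0.975 level rather than 0.95, which makes them wider.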

Value

A dataframe is returned where rows correspond to p-value thresholds in the sequence from lowerbound to upperbound and columns are:

If permutation: c("threshold","fdr","ll","ul","pi0","odp","S","Sp")

threshold

p-value threshold chosen to define positive tests

fdr

estimated FDR at the chosen p-value threshold

ll

estimated lower confidence bound for the FDR estimate (95% at the default cl = 0.95)

ul

estimated upper confidence bound for the FDR estimate (95% at the default cl = 0.95)

pi0

estimated proportion of true null hypotheses

odp

estimated over-dispersion parameter

S

observed number of positive tests

Sp

total number of positive tests summed across all permuted result sets
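
To show how the permutation columns may relate to one another, here is a minimal sketch of a plug-in estimate built from S and Sp. The formula is an assumption based on the estimator described in Millstein & Volfson (2013); this page does not state it, so check the package source before relying on it. The counts are invented toy values, and null_per_set is a hypothetical helper name.

```r
# Toy counts, invented for illustration only
ntests <- 100  # total number of tests
nperm  <- 10   # number of permuted result sets
S      <- 12   # observed positives at the threshold
Sp     <- 40   # positives summed over all permuted sets

# Average number of positives per permuted set (expected under the null)
null_per_set <- Sp / nperm

# Assumed plug-in estimate of the proportion of true nulls
pi0 <- (ntests - S) / (ntests - null_per_set)

# Assumed plug-in FDR estimate at this threshold
fdr <- (null_per_set / S) * pi0
fdr
```

With these toy counts, an average of 4 positives per permuted set against 12 observed positives gives an estimate of roughly 0.31 after scaling by pi0.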

If parametric: c("threshold","fdr","ll","ul","M","M.eff","S","S.eff")

threshold

p-value threshold chosen to define positive tests

fdr

estimated FDR at the chosen p-value threshold

ll

estimated lower confidence bound for the FDR estimate (95% at the default cl = 0.95)

ul

estimated upper confidence bound for the FDR estimate (95% at the default cl = 0.95)

M

total number of tests

M.eff

effective number of tests via the JM estimator

S

observed number of positive tests

S.eff

effective number of positive tests via the JM estimator

Author(s)

Joshua Millstein, Eric S. Kawaguchi

References

Millstein J, Volfson D. 2013. Computationally efficient permutation-based confidence interval estimation for tail-area FDR. Frontiers in Genetics | Statistical Genetics and Methodology 4(179):1-11.

Benjamini, Yoav, and Daniel Yekutieli. "False discovery rate adjusted multiple confidence intervals for selected parameters." Journal of the American Statistical Association 100.469 (2005): 71-81.

Examples



## Simulate 100 tests (columns), each with 100 observations (rows)
n.row = 100
n.col = 100
X = as.data.frame(matrix(rnorm(n.row*n.col), nrow=n.row, ncol=n.col))
e = as.data.frame(matrix(rnorm(n.row*n.col), nrow=n.row, ncol=n.col))
Y = .1*X + e
nperm = 10

## Pearson correlation test of X[,i] against Y[,i] for each column i
myanalysis = function(X, Y){
  ntests = ncol(X)
  rslts = as.data.frame(matrix(NA, nrow=ntests, ncol=2))
  names(rslts) = c("ID", "pvalue")
  rslts[,"ID"] = 1:ntests
  for(i in 1:ntests){
    fit = cor.test(X[,i], Y[,i],
      alternative="two.sided", method="pearson")
    rslts[i,"pvalue"] = fit$p.value
  }
  return(rslts)
} # End myanalysis

## Generate observed results
obs = myanalysis(X, Y)

## Generate permuted results by shuffling the rows of X
perml = vector('list', nperm)
for(perm in 1:nperm){
  X1 = X[order(runif(n.row)),]  # row permutation, so index by n.row
  perml[[perm]] = myanalysis(X1, Y)
}

## FDR results table (permutation), thresholds from -log10(p) = 1 to 2
fdrTbl(obs$pvalue, perml, "pvalue", n.col, 1, 2)
fdrTbl(obs$pvalue, perml, "pvalue", n.col, 1, 2, correct="BH")

## FDR results table (parametric)
fdrTbl(obs$pvalue, NULL, "pvalue", n.col, 1, 2, meff = TRUE, seff = TRUE, mymat = X, nperms = 5)


USCbiostats/fdrci documentation built on Oct. 22, 2022, 11:44 p.m.