MV_q: MV q-values and confidence intervals

View source: R/MV_q.R

MV_qR Documentation

MV q-values and confidence intervals

Description

q-values with confidence intervals are generated, based in the Millstein and Volfson (MV) estimators.

Usage

MV_q(obsp, permp, pnm, ntests, cl = 0.95, c1 = NA)

Arguments

obsp

observed vector of p-values.

permp

list of dataframes that include a column of permutation p-values (or statistics) in each. The length of the list permp = number of permutations.

pnm

name of column in each list component dataframe that includes p-values (or statistics).

ntests

total number of observed tests, which is usually the same as the length of obsp and the number of rows in each permp dataframe. However, this may not be the case if results were filtered by a p-value threshold or statistic threshold. If filtering was conducted then thres must be smaller (more extreme) than the filtering criterion.

cl

confidence level (default is .95).

c1

overdispersion parameter. If this parameter is not specified (default initial value is NA), then the parameter is estimated from the data. If all tests are known to be independent, then this parameter should be set to 1.

Details

Millstein and Volfson (2013) FDR is based on the idea that FDR is estimated at a level specified by the investigator. Storey and Tibshirani (2003) developed the q-value concept, where FDR is estimated at each observed p-value. However, Millstein and Volfson argued that in order to be informative, uncertainty in the estimate should be quantified, thus the development of confidence intervals for FDR. The MV FDR estimator is less conservative than the BH estimator.

Value

A dataframe which includes:

q

q-value corresponding to the respective p-value

q.ll

q-value lower limit

q.ul

q-value upper limit

Author(s)

Joshua Millstein

References

Millstein J, Volfson D. 2013. Computationally efficient permutation-based confidence interval estimation for tail-area FDR. Frontiers in Genetics | Statistical Genetics and Methodology 4(179):1-11.

Storey, John D., and Robert Tibshirani. "Statistical significance for genomewide studies." Proceedings of the National Academy of Sciences 100.16 (2003): 9440-9445.

Examples

ss=100
nvar=100
X = as.data.frame(matrix(rnorm(ss*nvar),nrow=ss,ncol=nvar))
e = as.data.frame(matrix(rnorm(ss*nvar),nrow=ss,ncol=nvar))
Y = .1*X + e
nperm = 10

myanalysis = function(X,Y){
	ntests = ncol(X)
	rslts = as.data.frame(matrix(NA,nrow=ntests,ncol=2))
	names(rslts) = c("ID","pvalue")
	rslts[,"ID"] = 1:ntests
	for(i in 1:ntests){
		fit = cor.test(X[,i],Y[,i],na.action="na.exclude",
			alternative="two.sided",method="pearson")
		rslts[i,"pvalue"] = fit$p.value
	}
	return(rslts)
} # End myanalysis

# Generate observed results
obs = myanalysis(X,Y)

# Generate permuted results
perml = vector('list',nperm)
for(p_ in 1:nperm){
	X1 = X[order(runif(nvar)),]
	perml[[p_]] = myanalysis(X1,Y)
}

q.values.MV = MV_q(obs$pvalue,perml,"pvalue",nvar)


USCbiostats/fdrci documentation built on Oct. 22, 2022, 11:44 p.m.