P-value combination using the inverse normal method

Description

Combines one-sided p-values using the inverse normal method.

Usage

invnorm(indpval, nrep, BHth = 0.05)

Arguments

indpval

List of vectors of one-sided p-values to be combined, one vector per study.

nrep

Vector giving the number of replicates used in each study to compute the one-sided p-values in indpval.

BHth

Benjamini-Hochberg threshold. By default, the false discovery rate is controlled at 5%.

Details

For each gene g, let

N_g = ∑_{s=1}^S ω_s Φ^{-1}(1-p_{gs}),

where p_{gs} corresponds to the raw p-value obtained for gene g in a differential analysis for study s (assumed to be uniformly distributed under the null hypothesis), Φ the cumulative distribution function of the standard normal distribution, and ω_s a set of weights. We define the weights ω_s as in Marot and Mayer (2009):

ω_s = √( ∑_c R_{cs} / ∑_ℓ ∑_c R_{cℓ} ),

where ∑_c R_{cs} is the total number of biological replicates in study s. This allows studies with large numbers of biological replicates to be attributed a larger weight than smaller studies.
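As a small illustration of these weights, the following base R sketch computes them for three studies. The replicate counts are made up for the example, and the code is not taken from the package itself.

```r
# Hypothetical total numbers of biological replicates in three studies
nrep <- c(4, 8, 12)

# Weight of each study: square root of its share of all replicates
weights <- sqrt(nrep / sum(nrep))

# Larger studies receive larger weights
weights

# The squared weights sum to one, so a weighted sum of independent
# standard normal variables keeps unit variance
sum(weights^2)
```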

Under the null hypothesis, the test statistic N_g follows a N(0,1) distribution, since the squared weights ω_s^2 sum to one across studies and the studies are assumed to be independent. A one-sided test on the right-hand tail of the distribution may then be performed, and classical procedures for the correction of multiple testing, such as that of Benjamini and Hochberg (1995), may subsequently be applied to the obtained p-values to control the false discovery rate at a desired level α.
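The procedure described above can be sketched in base R as follows. This is an illustration of the method under stated assumptions, not the source code of invnorm: the objects pvals, nrep, stat, rawp, adjp and de_idx are names chosen for this example, and the p-values are simulated under the null hypothesis.

```r
# Toy input (made up for this sketch): one vector of one-sided
# p-values per study, simulated as uniform under the null hypothesis
set.seed(1)
pvals <- list(study1 = runif(1000), study2 = runif(1000))
nrep <- c(8, 8)

# Study weights, as defined in the Details section
weights <- sqrt(nrep / sum(nrep))

# Per-study z-scores Phi^{-1}(1 - p), one column per study;
# lower.tail = FALSE avoids computing 1 - p explicitly
z <- sapply(pvals, function(p) qnorm(p, lower.tail = FALSE))

# N_g: weighted sum of z-scores across studies for each gene
stat <- as.vector(z %*% weights)

# Right-tail p-values, then Benjamini-Hochberg adjustment
rawp <- pnorm(stat, lower.tail = FALSE)
adjp <- p.adjust(rawp, method = "BH")

# Indices of genes declared differentially expressed at a 5% FDR
de_idx <- which(adjp <= 0.05)
```

Using qnorm(p, lower.tail = FALSE) rather than qnorm(1 - p) is a numerical choice made in this sketch: the two are mathematically equivalent, but the former keeps precision for very small p-values.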

Value

DEindices

Indices of differentially expressed genes at the chosen Benjamini-Hochberg threshold.

TestStatistic

Vector with test statistics for differential expression in the meta-analysis.

rawpval

Vector with raw p-values for differential expression in the meta-analysis.

adjpval

Vector with adjusted p-values for differential expression in the meta-analysis.

Note

This function resembles the function directpvalcombi in the metaMA R package; there is, however, one important difference in the calculation of p-values. In particular, for microarray data, it is typically advised to separate under- and over-expressed genes prior to the meta-analysis. In the case of RNA-seq data, differential analyses from individual studies typically make use of negative binomial models and exact tests, which lead to one-sided, rather than two-sided, p-values. As such, we suggest performing a meta-analysis over the full set of genes, followed by an a posteriori check for genes with conflicting results (over- vs. under-expression) among studies, and filtering of those genes if necessary.
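One possible way to carry out such an a posteriori check is sketched below in base R. It assumes that per-study log2 fold changes are available; the matrix lfc, the index vector de_idx and all values in them are hypothetical and only serve the illustration.

```r
# Hypothetical log2 fold changes: rows are genes, columns are studies
lfc <- cbind(study1 = c(1.2, -0.8, 2.0, -1.5),
             study2 = c(0.9,  0.7, 1.1, -2.2))

# Hypothetical indices of genes flagged by the meta-analysis
de_idx <- c(1, 2, 4)

# A gene is in conflict when its fold-change signs differ among studies
conflict <- apply(sign(lfc), 1, function(s) length(unique(s)) > 1)

# Keep only the flagged genes whose direction agrees in all studies
de_kept <- de_idx[!conflict[de_idx]]
de_kept
```

Here the second gene is over-expressed in one study and under-expressed in the other, so it is dropped from the list of flagged genes.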

References

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSS B (57): 289-300.

Hedges, L. and Olkin, I. (1985). Statistical Methods for Meta-Analysis. Academic Press.

Marot, G. and Mayer, C.-D. (2009). Sequential analysis for microarray data based on sensitivity and meta-analysis. SAGMB 8(1): 1-33.

Rau, A., Marot, G. and Jaffrezic, F. (2014). Differential meta-analysis of RNA-seq data. BMC Bioinformatics 15:91.

See Also

metaRNASeq

Examples

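library(metaRNASeq)
data(rawpval)
## 8 replicates simulated in each study
invnormcomb <- invnorm(rawpval, nrep = c(8, 8), BHth = 0.05)
## Indicator of differential expression: 1 if adjusted p-value <= 0.05, else 0
DE <- ifelse(invnormcomb$adjpval <= 0.05, 1, 0)
## Histogram of the combined raw p-values
hist(invnormcomb$rawpval, nclass = 100)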

## A more detailed example is given in the vignette of the package:
## vignette("metaRNASeq")