The Pweight R package contains open source implementations of p-value weighting methods for multiple testing, including Spjotvoll, exponential and Bayes weights.

In this vignette, we will walk through an example data analysis with p-value weights. We will also conduct an example of iGWAS on simulated data.

P-value weighting methods

For each p-value weighting method, we assume we observe data $T_i \sim \mathcal{N}(\mu_i, 1)$ and test each null hypothesis $H_i: \mu_i \ge 0$ against $\mu_i <0$. The p-value for testing $H_i$ is $P_i = \Phi(T_i)$, where $\Phi$ is the normal cumulative distribution function. For a weight vector $w \in [0,\infty)^{J}$ and significance level $q \in [0,1]$, the weighted Bonferroni procedure rejects $H_{i}$ if $P_i \le q w_i$. Usual Bonferroni corresponds to $w_i=1$.

Each p-value weighting method assumes some additional independent information about $\mu_i$, and returns a weight vector $w$. These can then be used for weighted Bonferroni or other multiple testing procedures.

Bayes weights

Bayes p-value weights are the flagship method of the Pweight package. They assume that the prior information takes the form of normal distributions.

Bayes p-value weights can be computed using: bayes_weights(eta, sigma, q). The inputs specify the prior distribution of the means $\mu_i$ of the test statistics as:

$$\mu_i \sim \mathcal{N}(\eta_i,\sigma_i^2), \mbox{ } i \in {1,\ldots, J}$$

where:

The outputs are a list consisting of:

Example

We will present an example of p-value weighting with simulated data. We generate two sets of test statistics, the prior and the current data. A small fraction of the prior data holds some information about the current data. However, most prior data is noise. In our experience working with GWAS data, this is a reasonable model for association studies done on two independent samples and two distinct traits (such as cardiovascular disease and aging).

Generating the Data

We do this by drawing from a mixture distribution. We generate a large number $J$ of tests. For each test $i$ we flip a coin $X_i$: If $X_i=1$, then the prior is meaningful, else it is noise. If the prior is meaningful, we draw a random negative $\mu_i$ and both test statistics $(T^{1}_i,T^{2}_i)$ are Gaussians centered at $\mu_i$. Else we draw two independent normal test statistics $(T^{1}_i,T^{2}_i)$. This ensures that this small fraction of the data is correlated. The code is:

set.seed(2)
J <- 500
mu <- - 2*abs(rnorm(J))
frac_sig <- 0.1
X <- rbinom(J, 1, frac_sig)
t1 <- rnorm(J, X*mu,1)
t2 <- rnorm(J, X*mu,1)

The data that this generates shows the desired patter, as can be seen on a scatterplot. Most pairs have no correlation, but there is a small fraction that does.

scatter.smooth(t1,t2)

The p-values for the one-sided tests $\mu_i = 0$ vs $\mu_i <0$ utilizing only the current data are $P_i = \Phi(T^{2}_i)$.

P_current <- pnorm(t2)

Data Analysis: Bonferroni

We want to test the hypotheses $\mu_i = 0$ against $\mu_i <0$ for each $i$ utilizing the current data $T^{2}_i$. The simplest way is by Bonferroni-corrected multiple testing. We choose an uncorrected significance level $\alpha=0.05$ and call the bonferroni(...) function on the p-values.

alpha <- 0.05
P_adjusted <- p.adjust(P_current,"bonferroni")
ind <- (P_adjusted<alpha)
cat(c("Number of significant tests using Bonferroni: ", sum(ind) ))

Bonferroni leads to 4 significant test statistics.

Data Analysis: P-value Weighting

Alternatively, one can do p-value weighting. For this, we use $T^1$ as prior information. We set the prior standard errors $\sigma=1$ in this example. More detailed discussion on the choice of $\sigma$ can be found in [@dobriban2015optimal]. As explained earlier, we set $q = 0.05/J$. Then we compute the weights. Finally, we run weighted Bonferroni on the weighted p-values $P'_i = P_i/w_i$. This is the code that accomplishes it:

q <- alpha/J
sigma <- rep(1,J)
source("../R/bayes_weights.R")
res <- bayes_weights(t1,sigma,q)
w <- res$w
P_weighted <- P_current/w
P_w_adjusted <- p.adjust(P_weighted,"bonferroni")
ind_w <- (P_w_adjusted<alpha)
cat(c("Number of significant tests using Weighting: ", sum(ind_w) ))

Hence, in this example weighting increases the number of significant hits from 4 to 5.

Post-Analysis

One can get some insight into the procedure by examining which hypotheses were declared significant by the two methods. The following code reveals which test were significant:

which(ind==1)
which(ind_w==1)

In this particular case weighting leads to a strict increase in power, selecting an addional hypothesis. Taking 266 as an example we see that its P-value and z-score in the current data equals

x = 266
P_current[x]
t2[x]

This is not enough for it to be significant since the threshold is $0.05/J=0.0001$. However, it gets assigned a large weight, because its prior effect is large. So it's selected after weighting.

w[x]
t1[x]

Another insight can be gained from plotting the weights as a function of the prior mean. We see that the weights are non-monotonic as a function of the prior mean. Indeed they place a large mass on the middle means between (-6 and -2).

plot(t1,w)

iGWAS

Now we perform an informed Genome-Wide Association Study (or iGWAS) on this data set, using the provided \code{iGWAS} function. For this we need to provide P-values and sample sizes for both the current and the prior studies.

The p-values are obtained from \code{t1} just like for \code{t1} previously.

P_prior <- pnorm(t1)

This simulation did not use any specific sample sizes, so we are simply going to assume that the sample sizes are as follows:

N_current <- 1
N_prior <- 1

Now we can run iGWAS. This method computes the p-value weights based on the prior p-values, and uses them in multiple testing (p-value weighting) for the current p-values. The p-value weighting method (e.g. Unweighted, Bayes) and the multiple testing adjustment (e.g. Bonferroni, Benjamini-Hochberg) can be specified independently. However, we will use the basic settings, which correspond to Bayes weights with Bonferroni testing.

First we perform unweighted testing.

source("../R/iGWAS.R")
res_unw <- iGWAS(P_current, N_current, P_prior, N_prior, weighting_method="unweighted", p_adjust_method = "bonferroni")

As expected, we get the same results as before. Now, we perform weighted testing. To indicate that the prior p-values are one-sided, we include the 'sides=1' argument.

res_w <- iGWAS(P_current, N_current, P_prior, N_prior, sides=1,  p_adjust_method = "bonferroni")

Again, p-value weighting leads to improved power, and a larger number of significant hits.

References



dobriban/pweight documentation built on May 15, 2019, 9:42 a.m.