Pweight Basics

The Pweight R package contains open source implementations of p-value weighting methods for multiple testing, including Spjotvoll, exponential and Bayes weights.

In this vignette, we will walk through an example data analysis with p-value weights. We will also conduct an example of iGWAS on simulated data.

P-value weighting methods

For each p-value weighting method, we assume we observe data $T_i \sim \mathcal{N}(\mu_i, 1)$ and test each null hypothesis $H_i: \mu_i \ge 0$ against $\mu_i <0$. The p-value for testing $H_i$ is $P_i = \Phi(T_i)$, where $\Phi$ is the normal cumulative distribution function. For a weight vector $w \in [0,\infty)^{J}$ and significance level $q \in [0,1]$, the weighted Bonferroni procedure rejects $H_{i}$ if $P_i \le q w_i$. Usual Bonferroni corresponds to $w_i=1$.

Each p-value weighting method assumes some additional independent information about $\mu_i$, and returns a weight vector $w$. These can then be used for weighted Bonferroni or other multiple testing procedures.
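As a concrete illustration, the weighted Bonferroni rule can be sketched in a few lines of base R. The weights below are arbitrary placeholders chosen for the example, not the output of any pweight method:

```r
# Weighted Bonferroni: reject H_i when P_i <= q * w_i.
# Toy test statistics and illustrative weights (not from pweight).
T_stat <- c(-4.1, -1.0, 0.3, -2.5)
P <- pnorm(T_stat)       # one-sided p-values, small when T_i is very negative
w <- c(2, 0.5, 0.5, 1)   # weights; mean(w) = 1 preserves the Bonferroni budget
q <- 0.05 / length(P)    # per-test level alpha / J
reject <- P <= q * w
which(reject)            # tests 1 and 4 are rejected
```

Note that test 4 ($T_4 = -2.5$, $P_4 \approx 0.0062$) would not be rejected at the unweighted threshold $q = 0.0125$ scaled by $w_4 = 0.5$ if it had received the down-weighted level, but is rejected at its weight of 1; up-weighting test 1 costs nothing here because its p-value is far below the threshold anyway.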

Bayes weights

Bayes p-value weights are the flagship method of the Pweight package. They assume that the prior information takes the form of normal distributions.

Bayes p-value weights can be computed using: bayes_weights(eta, sigma, q). The inputs specify the prior distribution of the means $\mu_i$ of the test statistics as:

$$\mu_i \sim \mathcal{N}(\eta_i,\sigma_i^2), \quad i \in \{1,\ldots, J\}$$

where:

- eta is the vector of prior means $\eta_i$,
- sigma is the vector of prior standard deviations $\sigma_i$,
- q is the per-test significance level (e.g. $\alpha/J$ for a family-wise level $\alpha$).

The output is a list whose component w contains the computed weight vector $w$; this is the only component used in this vignette.
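A minimal call might look as follows. The prior values here are toy numbers, and the sketch assumes the pweight package is installed and loaded:

```r
library(pweight)          # assumes the package is installed

eta <- c(-3, -1, -0.5)    # toy prior means (negative, matching the one-sided setup)
sigma <- rep(1, 3)        # prior standard deviations
q <- 0.05 / 3             # per-test significance level alpha / J
res <- bayes_weights(eta, sigma, q)
res$w                     # the weight vector used in weighted Bonferroni
```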

Example

We will present an example of p-value weighting with simulated data. We generate two sets of test statistics, the prior and the current data. A small fraction of the prior data holds some information about the current data; however, most of the prior data is noise. In our experience working with GWAS data, this is a reasonable model for association studies done on two independent samples and two distinct traits (such as cardiovascular disease and aging).

Generating the Data

We do this by drawing from a mixture distribution. We generate a large number $J$ of tests. For each test $i$ we flip a coin $X_i$: if $X_i=1$, the prior is meaningful; otherwise it is noise. If the prior is meaningful, we draw a random negative mean $\mu_i$, and both test statistics $(T^{1}_i,T^{2}_i)$ are Gaussians centered at $\mu_i$. Otherwise, we draw two independent standard normal test statistics $(T^{1}_i,T^{2}_i)$. This ensures that the two statistics are correlated for this small fraction of the tests. The code is:

set.seed(2)
J <- 500                     # number of tests
mu <- -2 * abs(rnorm(J))     # negative effect sizes
frac_sig <- 0.1              # fraction of tests with a meaningful prior
X <- rbinom(J, 1, frac_sig)  # indicator that the prior is meaningful
t1 <- rnorm(J, X * mu, 1)    # prior test statistics
t2 <- rnorm(J, X * mu, 1)    # current test statistics

The data that this generates shows the desired pattern, as can be seen in a scatterplot. Most pairs show no correlation, but a small fraction does.

scatter.smooth(t1,t2)

The p-values for the one-sided tests $\mu_i = 0$ vs $\mu_i <0$ utilizing only the current data are $P_i = \Phi(T^{2}_i)$.

P_current <- pnorm(t2)

Data Analysis: Bonferroni

We want to test the hypotheses $\mu_i = 0$ against $\mu_i <0$ for each $i$ utilizing the current data $T^{2}_i$. The simplest approach is Bonferroni-corrected multiple testing. We choose an uncorrected significance level $\alpha=0.05$ and apply R's p.adjust function with method "bonferroni" to the p-values.

alpha <- 0.05
P_adjusted <- p.adjust(P_current,"bonferroni")
ind <- (P_adjusted<alpha)
cat(c("Number of significant tests using Bonferroni: ", sum(ind) ))

Bonferroni leads to 4 significant test statistics.

Data Analysis: P-value Weighting

Alternatively, one can do p-value weighting. For this, we use $T^1$ as prior information. We set the prior standard errors $\sigma_i = 1$ in this example; a more detailed discussion of the choice of $\sigma$ can be found in [@dobriban2015optimal]. Following the weighted Bonferroni procedure described earlier, we set $q = \alpha/J = 0.05/J$. Then we compute the weights. Finally, we run Bonferroni on the weighted p-values $P'_i = P_i/w_i$. The following code accomplishes this:

q <- alpha/J
sigma <- rep(1,J)
library(pweight)
res <- bayes_weights(t1,sigma,q)
w <- res$w
P_weighted <- P_current/w
P_w_adjusted <- p.adjust(P_weighted,"bonferroni")
ind_w <- (P_w_adjusted<alpha)
cat(c("Number of significant tests using Weighting: ", sum(ind_w) ))

Hence, in this example weighting increases the number of significant hits from 4 to 5.

Post-Analysis

One can get some insight into the procedure by examining which hypotheses were declared significant by the two methods. The following code reveals which tests were significant:

which(ind==1)
which(ind_w==1)

In this particular case weighting leads to a strict increase in power, selecting an additional hypothesis. Taking hypothesis 266 as an example, we see that its p-value and z-score in the current data equal

x = 266
P_current[x]
t2[x]

This is not enough for it to be significant, since the unweighted threshold is $0.05/J=0.0001$. However, it is assigned a large weight because its prior effect is large, so it is selected after weighting.

w[x]
t1[x]
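The arithmetic behind this selection can be checked with hypothetical round numbers (illustrative values, not the exact quantities printed above):

```r
# A p-value just above the unweighted threshold passes once it is
# divided by a weight larger than one (hypothetical values).
q <- 0.05 / 500  # per-test threshold, 0.0001
p <- 3e-4        # hypothetical p-value: not significant unweighted
w_x <- 5         # hypothetical large weight from a strong prior effect
p > q            # TRUE: fails the unweighted Bonferroni test
p / w_x <= q     # TRUE: passes after weighting (3e-4 / 5 = 6e-5)
```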

Another insight can be gained from plotting the weights as a function of the prior mean. We see that the weights are non-monotonic as a function of the prior mean. Indeed, they place most of their mass on intermediate prior means, roughly between $-6$ and $-2$.

plot(t1, w, xlab = "prior mean", ylab = "weight")

iGWAS

Now we perform an informed Genome-Wide Association Study (or iGWAS) on this data set, using the provided iGWAS function. For this we need to provide p-values and sample sizes for both the current and the prior studies.

The prior p-values are obtained from t1, just as the current p-values were obtained from t2 previously.

P_prior <- pnorm(t1)

This simulation did not use any specific sample sizes, so we are simply going to assume that the sample sizes are as follows:

N_current <- 1
N_prior <- 1

Now we can run iGWAS. This method computes the p-value weights based on the prior p-values, and uses them in multiple testing (p-value weighting) for the current p-values. The p-value weighting method (e.g. Unweighted, Bayes) and the multiple testing adjustment (e.g. Bonferroni, Benjamini-Hochberg) can be specified independently. However, we will use the basic settings, which correspond to Bayes weights with Bonferroni testing.

First we perform unweighted testing.

library(pweight)
res_unw <- iGWAS(P_current, N_current, P_prior, N_prior, weighting_method="unweighted")

As expected, we get the same results as before. Now, we perform weighted testing. To indicate that the prior p-values are one-sided, we include the 'sides=1' argument.

res_w <- iGWAS(P_current, N_current, P_prior, N_prior, sides=1)

Again, p-value weighting leads to improved power, and a larger number of significant hits.

References

Dobriban, E., Fortney, K., Kim, S. K., and Owen, A. B. (2015). Optimal multiple testing under a Gaussian prior on the effect sizes. Biometrika, 102(4), 753-766.



pweight documentation built on May 30, 2017, 2:54 a.m.