We developed informed GWAS (iGWAS) to apply p-value weighting to pairs of genome-wide association studies. For two related phenotypes, iGWAS uses prior knowledge from one GWAS to weight the P values from another. This strategy may lead to the discovery of new loci. The Pweight package function iGWAS performs this analysis. iGWAS can use any weighting scheme provided in Pweight, including Bayes, Spjotvoll, and exponential weights.

In this vignette, we will demonstrate how to run a simple iGWAS analysis on real data, using two independent, publicly available GWAS meta-analyses of coronary artery disease available from the CARDIoGRAMplusC4D Consortium (http://www.cardiogramplusc4d.org/downloads/).

These two studies were originally described in:

Schunkert H, Konig IR, Kathiresan S, Reilly MP, Assimes TL, Holm H et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet. 2011 43: 333-338 ["CARDIoGRAM_GWAS_RESULTS.txt"]

Coronary Artery Disease (C4D) Genetics Consortium (Writing Committee: Peden JF, Hopewell JC, Saleheen D, Chambers JC, Hager J, Soranzo N, Collins R, Danesh J, Elliott P, Farrall M, Stirrups K, Zhang W, Hamsten A, Parish S, Lathrop M, Watkins H (Chair), Clarke R, Deloukas P, Kooner J). A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nat Genet. 2011 43: 339-344 ["C4D_CAD_DISCOVERY_METAANALYSIS_UPDATE.TXT"]

As input, the function \code{iGWAS} requires the p-values and sample sizes of the two GWAS studies.

First, we load in the example data which consists of p-values for all 514,178 SNPs tested in both the C4D and CARDIoGRAM association analyses, as well as the number of cases and controls in each study. The input P value vectors should be of the same size, and in the same order.

The example data file is ~5MB in size and can be downloaded directly from github into R:

#uncomment on Windows #setInternet2(use = TRUE) download.file("http://github.com/dobriban/pweight/raw/master/vignettes/iGWAS.cad.example.RData", destfile = "exampledata", method = "libcurl") load("exampledata")

We calculate the effective sample size of each study as:

c4d_ss <- 4/(1/c4d_ncases + 1/c4d_ncontrols) cardiogram_ss <- 4/(1/cardiogram_ncases + 1/cardiogram_ncontrols)

We are now ready to run iGWAS.

The simplest way to run iGWAS is using default settings. The default setting uses Bayes weights. Each of the weighting functions, and the iGWAS routine, make assumptions about the input data; these are explored more fully in the "Pweight Basics" vignette.

As an example, we use the larger CARDIoGRAM study (# cases = 22233, # controls = 64762) to weight the C4D data (# cases = 15420, # controls = 15062):

source("../R/iGWAS.R") source("../R/bayes_weights.R") res_default <- iGWAS(P_current=cad$P_c4d, N_current=c4d_ss, P_prior=cad$P_cardiogram, N_prior=cardiogram_ss)

40 SNPs are genome-wide significant (P < 5e-8) after weighting. We can check whether weighting increases the number of genome-wide hits, versus using the original data with no prior knowledge:

cat(c("Number of genome-wide significant SNPs (P < 5e-8) with no weighting: ", sum(cad$P_c4d < 5e-8) ))

In this example, p-value weighting increases the number of genome-wide significant SNPs from 29 to 40. Further analyses would be needed to determine whether the 11 additional SNPs are in linkage disequilibrium with the 29, or whether they represent new and independent loci.

We can inspect the correspondence between weights and p-values from the prior-knowledge CARDIoGRAM study:

w = res_default$w plot(-log10(cad$P_cardiogram), w, main="How Bayes weights depend on p-values", ylab=expression(italic(w)), xlab=expression(-log[10](italic(p)))) cat(c("Maximum weight reached at P-value of ", cad$P_cardiogram[which.max(w)] ))

Weight increases with decreasing P value until it reaches a maximum, then decreases. Lower weight is assigned to SNPs with P values below 1.12e-09, as these SNPs are expected to have a very low P values in the current C4D study, and not require weighting to attain significance.

For the common task of visualizing of significant P values, before and after weighting, there is a built-in option `figure="T"`

in the iGWAS function. This also requires us to pass in the GWAS data frame with the parameter `GWAS_data_frame=cad`

.

The iGWAS function calls the qqman package to generate the Manhattan plot (qqman must be installed). This package is described in: Turner, S.D. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. biorXiv DOI: 10.1101/005165 (2014).

Note: Drawing the manhattan plot may take a few minutes, depending on the number of SNPs being plotted.

res_default <- iGWAS(P_current=cad$P_c4d, N_current=c4d_ss, P_prior=cad$P_cardiogram, N_prior=cardiogram_ss,figure="T", GWAS_data_frame=cad)

Here, unweighted P values are shown in black and weighted P values in green. The red line corresponds to the threshold for genome-wide significance.

From this plot, it is visually apparent that weighting has boosted new loci to genome-wide significance, e.g. on chromosome 9.

The dispersion factor phi affects the degree of regularization for Bayes weights. Setting phi = 0 corresponds to using Spjotvoll weights. Here, we test how different values of phi affect the number of significant loci discovered.

source("../R/spjotvoll_weights.R") phi=seq(0,1,by=0.1) numSNP <- sapply(phi,function(x){sum(iGWAS(P_current=cad$P_c4d, N_current=c4d_ss, P_prior=cad$P_cardiogram, N_prior=cardiogram_ss, phi=x)$sig_ind)}) plot(phi, numSNP, type="o", main="Choice of phi impacts the number of significant SNPs", ylab="# SNPs", xlab=expression(phi)) cat(c("Choice of phi that maximizes the number of genome-wide significant SNPs:", phi[which.max(numSNP)] ))

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.