knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

To use the IEU GWAS database for MR analysis, see the TwoSampleMR R package.

Here we'll demonstrate how to achieve the same data extractions using the GWAS VCF files.

We'll use the example of LDL cholesterol ieu-a-300 and coronary heart disease ieu-a-7.

Load libraries:

suppressPackageStartupMessages(suppressWarnings({
    library(TwoSampleMR)
    library(gwasglue)
    library(gwasvcf)
    library(ieugwasr)
    library(dplyr)
}))

Using TwoSampleMR

This is a simple procedure for MR using the TwoSampleMR package:

# Extract the instruments for LDL
expd1 <- TwoSampleMR::extract_instruments("ieu-a-300")

# Extract those SNP effects for CHD
outd1 <- TwoSampleMR::extract_outcome_data(expd1$SNP, "ieu-a-7", proxies=FALSE)

# Harmonise the exposure and outcome data
dat1 <- TwoSampleMR::harmonise_data(expd1, outd1)

# Perform MR
TwoSampleMR::mr(dat1)

Note that this extraction process can be simplified with:

dat1 <- make_dat("ieu-a-300", "ieu-a-7")

Using GWAS VCF files

Let's do the same with the vcf files (and the indexes). Download from here:

wget https://gwas.mrcieu.ac.uk/files/ieu-a-300/ieu-a-300.vcf.gz
wget https://gwas.mrcieu.ac.uk/files/ieu-a-300/ieu-a-300.vcf.gz.tbi
wget https://gwas.mrcieu.ac.uk/files/ieu-a-7/ieu-a-7.vcf.gz
wget https://gwas.mrcieu.ac.uk/files/ieu-a-7/ieu-a-7.vcf.gz.tbi

First get the tophits for LDL cholesterol

gwasvcf::set_bcftools()
expd2 <- gwasvcf::query_gwas("ieu-a-300.vcf.gz", chrompos=paste0(expd1$chr.exposure, ":", expd1$pos.exposure))

Convert to TwoSampleMR format:

expd2 <- gwasglue::gwasvcf_to_TwoSampleMR(expd2, type="exposure")

Extract those SNPs from the outcome vcf file and convert to TwoSampleMR format

outd2 <- gwasvcf::query_gwas("ieu-a-7.vcf.gz", chrompos = paste0(expd1$chr.exposure, ":", expd1$pos.exposure))
outd2 <- gwasglue::gwasvcf_to_TwoSampleMR(outd2, "outcome")

Proceed with harmonising and performing MR

dat2 <- TwoSampleMR::harmonise_data(expd2, outd2)
TwoSampleMR::mr(dat2)

Other options

Clumping vcf files

If we want to extract top hits based on a threshold and clump locally. First download the LD reference dataset:

wget http://fileserve.mrcieu.ac.uk/ld/data_maf0.01_rs_ref.tgz
tar xzvf data_maf0.01_rs_ref.tgz

Now extract the top hits based on a p-value threshold

gwasvcf::set_bcftools()
expd3 <- gwasvcf::query_gwas("ieu-a-300.vcf.gz", pval=5e-8)
expd3

Convert to TwoSampleMR format:

expd3 <- gwasglue::gwasvcf_to_TwoSampleMR(expd3, type="exposure")

Get a list of SNPs to retain after clumping and subset the data

retain_snps <- expd3 %>% dplyr::select(rsid=SNP, pval=pval.exposure) %>%
    ieugwasr::ld_clump(., plink_bin=genetics.binaRies::get_plink_binary(), bfile="data_maf0.01_rs_ref") %>%
    {.$rsid}
expd3 <- subset(expd3, SNP %in% retain_snps)

Extracting outcome data with LD proxies

This only works if you extract on rsids at the moment, and is quite slow. But here it is:

gwasvcf::set_plink()
outd2 <- gwasvcf::query_gwas("ieu-a-7.vcf.gz", rsid = expd3$SNP, proxies="yes", bfile="data_maf0.01_rs_ref")
outd2 <- gwasglue::gwasvcf_to_TwoSampleMR(outd2, "outcome")

Further MR methods

A number of other MR methods are available. The current framework for using them is to:

  1. Convert your data to TwoSampleMR format
  2. Use the TwoSampleMR package to convert to other formats

Examples of formats that you can convert to within TwoSampleMR:

Bluecrystal4 users

All data in OpenGWAS is stored on bc4 in the form of GWAS VCF files. You can create harmonised datasets easily on bc4 with these files.

Determine the locations of the GWAS VCF files and a number of other reference datasets and binaries:

set_bcftools()
set_plink()
set_bc4_files()

Now simply run:

dat <- make_TwoSampleMR_dat("ieu-a-300", "ieu-a-7")

This can be run in parallel for large combinations of exposures and outcomes, e.g.:

dat <- make_TwoSampleMR_dat(
    id1=c("ieu-a-300", "ieu-a-302", "ieu-a-299"), 
    id2=c("ieu-a-7", "ieu-a-2"),
    nthreads=6
)

This will lookup all instruments in the exposures (id1) in both exposures and outcomes, and harmonise all exposure-outcome pairs, parallelised across 6 threads.

Note: please make sure to run these analyses in batch mode by submitting to the slurm scheduler, not interactively on the head nodes!



MRCIEU/gwasglue documentation built on Feb. 15, 2022, 3:25 p.m.