knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
To use the IEU GWAS database for MR analysis, see the TwoSampleMR R package.
Here we'll demonstrate how to achieve the same data extractions using the GWAS VCF files.
We'll use the example of LDL cholesterol ieu-a-300 and coronary heart disease ieu-a-7.
Load libraries:
suppressPackageStartupMessages(suppressWarnings({ library(TwoSampleMR) library(gwasglue) library(gwasvcf) library(ieugwasr) library(dplyr) }))
This is a simple procedure for MR using the TwoSampleMR package:
# Extract the instruments for LDL expd1 <- TwoSampleMR::extract_instruments("ieu-a-300") # Extract those SNP effects for CHD outd1 <- TwoSampleMR::extract_outcome_data(expd1$SNP, "ieu-a-7", proxies=FALSE) # Harmonise the exposure and outcome data dat1 <- TwoSampleMR::harmonise_data(expd1, outd1) # Perform MR TwoSampleMR::mr(dat1)
Note that this extraction process can be simplified with:
dat1 <- make_dat("ieu-a-300", "ieu-a-7")
Let's do the same with the vcf files (and the indexes). Download from here:
wget https://gwas.mrcieu.ac.uk/files/ieu-a-300/ieu-a-300.vcf.gz wget https://gwas.mrcieu.ac.uk/files/ieu-a-300/ieu-a-300.vcf.gz.tbi wget https://gwas.mrcieu.ac.uk/files/ieu-a-7/ieu-a-7.vcf.gz wget https://gwas.mrcieu.ac.uk/files/ieu-a-7/ieu-a-7.vcf.gz.tbi
First get the tophits for LDL cholesterol
gwasvcf::set_bcftools() expd2 <- gwasvcf::query_gwas("ieu-a-300.vcf.gz", chrompos=paste0(expd1$chr.exposure, ":", expd1$pos.exposure))
Convert to TwoSampleMR format:
expd2 <- gwasglue::gwasvcf_to_TwoSampleMR(expd2, type="exposure")
Extract those SNPs from the outcome vcf file and convert to TwoSampleMR format
outd2 <- gwasvcf::query_gwas("ieu-a-7.vcf.gz", chrompos = paste0(expd1$chr.exposure, ":", expd1$pos.exposure)) outd2 <- gwasglue::gwasvcf_to_TwoSampleMR(outd2, "outcome")
Proceed with harmonising and performing MR
dat2 <- TwoSampleMR::harmonise_data(expd2, outd2) TwoSampleMR::mr(dat2)
If we want to extract top hits based on a threshold and clump locally. First download the LD reference dataset:
wget http://fileserve.mrcieu.ac.uk/ld/data_maf0.01_rs_ref.tgz tar xzvf data_maf0.01_rs_ref.tgz
Now extract the top hits based on a p-value threshold
gwasvcf::set_bcftools() expd3 <- gwasvcf::query_gwas("ieu-a-300.vcf.gz", pval=5e-8) expd3
Convert to TwoSampleMR format:
expd3 <- gwasglue::gwasvcf_to_TwoSampleMR(expd3, type="exposure")
Get a list of SNPs to retain after clumping and subset the data
retain_snps <- expd3 %>% dplyr::select(rsid=SNP, pval=pval.exposure) %>% ieugwasr::ld_clump(., plink_bin=genetics.binaRies::get_plink_binary(), bfile="data_maf0.01_rs_ref") %>% {.$rsid} expd3 <- subset(expd3, SNP %in% retain_snps)
This only works if you extract on rsids at the moment, and is quite slow. But here it is:
gwasvcf::set_plink() outd2 <- gwasvcf::query_gwas("ieu-a-7.vcf.gz", rsid = expd3$SNP, proxies="yes", bfile="data_maf0.01_rs_ref") outd2 <- gwasglue::gwasvcf_to_TwoSampleMR(outd2, "outcome")
A number of other MR methods are available. The current framework for using them is to:
Examples of formats that you can convert to within TwoSampleMR:
All data in OpenGWAS is stored on bc4 in the form of GWAS VCF files. You can create harmonised datasets easily on bc4 with these files.
Determine the locations of the GWAS VCF files and a number of other reference datasets and binaries:
set_bcftools() set_plink() set_bc4_files()
Now simply run:
dat <- make_TwoSampleMR_dat("ieu-a-300", "ieu-a-7")
This can be run in parallel for large combinations of exposures and outcomes, e.g.:
dat <- make_TwoSampleMR_dat( id1=c("ieu-a-300", "ieu-a-302", "ieu-a-299"), id2=c("ieu-a-7", "ieu-a-2"), nthreads=6 )
This will lookup all instruments in the exposures (id1) in both exposures and outcomes, and harmonise all exposure-outcome pairs, parallelised across 6 threads.
Note: please make sure to run these analyses in batch mode by submitting to the slurm scheduler, not interactively on the head nodes!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.