In MRCIEU/TwoSampleMR: Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database

Exposure data

You can read in exposure data from a text file, or from the GWAS catalog.

Text file

The text file must have the following columns (in any order) with these exact column headers

SNP (rs IDs, required)
beta (numeric effect size, required for MR)
se (numeric standard error, required for MR)
effect_allele (the allele that has the beta effect, required for MR)
other_allele (the other allele, recommended for MR)
eaf (effect allele frequency, recommended for MR)
P_value (p-value of effect, not required for MR but used for other analyses. If beta and se are provided then this will be calculated for you.)

We provide three example files in the package, one for telomere length, one for bladder cancer, and one for coronary heart disease. After installing the package you can find their locations on your computer like this:

library(TwoSampleMR)
tl_file <- system.file("data/telomere_length.txt", package="TwoSampleMR")
bc_file <- system.file("data/bladdercancer.txt", package="TwoSampleMR")
chd_file <- system.file("data/cardiogram.txt", package="TwoSampleMR")

To read in the data:

library(TwoSampleMR)
library(pander)
tl <- read_exposure_data(tl_file, "Telomere length")

The function looks up the rs IDs in biomart to validate them and retrieve chromosome and position information.

GWAS catalog

The GWAS catalog (https://www.ebi.ac.uk/gwas/) is a collection of r data(gwas_catalog); nrow(gwas_catalog) reported associations against r length(unique(gwas_catalog$Phenotype)) traits. These have the potential to be used as instruments in 2SMR. We have downloaded the GWAS catalog and made important formatting changes in order to simplify usage of these data for MR. Please note that the reformatting of this data may be unreliable.

To use the GWAS catalog:

data(gwas_catalog)

and to extract, e.g. the instruments for BMI using the [@Speliotes2010] SNPs:

bmi <- subset(gwas_catalog, Phenotype=="Body mass index" & Year==2010 & grepl("kg", Units))
bmi <- format_gwas_catalog(bmi)

Data within R

You can create the expoure data object using data already in R also. e.g.

manual_exposure <- data.frame(
    SNP = c("rs10150332", "rs10767664"),
    beta = c(0.13, 0.19),
    se = c(0.03061224, 0.03061224),
    effect_allele = c("C", "A"),
    other_allele = c("T", "T"),
    eaf = c(0.21, 0.78),
    P_value = c(3e-11, 5e-26)
)
manual_exposure <- format_exposure_dat(manual_exposure, "BMI")

LD pruning

For most 2SMR methods it is required that the instruments are in linkage equilibrium. To prune the SNPs we recommend using the clumping method which, when having to choose which of two SNPs to eliminate if they are in LD, will keep the one with the lowest p-value. There are two methods to do this, using either your own reference data and local computer, or using the MR-Base server remotely.

Using MR-Base

This is currently limited to using European samples from 1000 genomes data only. You must provide a data frame that has a P-value column, SNP name, SNP position and SNP chromosome. The output from read_exposure_data has all of these.

tl <- clump_data(tl)
bmi <- clump_data(bmi)

This returns the same data frame with any SNPs in LD removed.

Local LD pruning

You need to obtain the executable for plink2, and binary plink files for your reference data, e.g. 1000 genomes data.

bmi <- clump_data(bmi, where="local", refdat="/path/to/reference_data", plink_bin="/path/to/plink2")

Outcome data

Ideally for every SNP in your exposure data you will have the corresponding summary statistics from a GWAS on the outcome trait. There are two ways in which this data can be entered into R: by performing a lookup in the MR-Base database, or manually using your own summary statistics.

MR-Base lookup

To see a list of all the available GWAS studies in the database:

ao <- available_outcomes()
nrow(ao)
pander(head(ao))

and to extract the BMI SNPs from the Celiac disease and T2D GWASs provide the relevant exposure data and outcome IDs:

outcome_dat <- extract_outcome_data(bmi, c(13, 6))

The resulting data frame has the following columns:

names(outcome_dat)

Manual

Alternatively, create a data frame with the following columns:

beta.outcome (Numeric effect sizes, required)
se.outcome (Numeric standard errors, required)
eaf.outcome (Numeric frequency of the effect allele, recommended)
effect_allele.outcome (Effect allele, required)
other_allele.outcome (Other allele, recommended)
outcome (Name of the outcome, required)

Harmonising exposure and outcome data

In order to ensure that the effect sizes in the exposure data correspond to the same allele effects in the outcome data it is necessary to check and harmonise these data. We provide a function that ensures all corresponding exposure and outcome alleles are on the same strand (where possible), and returns a harmonised dataset.

bmi_outcome <- harmonise_exposure_outcome(bmi, outcome_dat, action = 2)

The action argument specifies how strict to be when checking for the correct strand:

1 means assume that all alleles are on the positive strand already (not recommended!)
2 means flip alleles where possible, and use allele frequencies to infer the strand of palindromic SNPs.
3 means flip alleles where possible but remove any palindromic SNPs

Performing 2SMR

The data is now ready to perform 2-sample MR. To do this:

mr_results <- mr(bmi_outcome)

This performs MR using the following different available methods:

pander(mr_method_list())

Sensitivity analysis

To see how different methods compare:

mr_scatter_plot(mr_results, bmi_outcome)[[2]]

To compare the effects of each of the SNPs we can generate a forest plot, e.g. for BMI on T2D:

s <- mr_singlesnp(bmi_outcome)
mr_forest_plot(s)[[2]]

And to perform a leave-one-out sensitivity analysis:

l <- mr_leaveoneout(bmi_outcome)
mr_leaveoneout_plot(l)[[2]]

Enrichment

You can also perform a Fisher's combined test to see if there is an enrichment of p-values in the outcome data (e.g. are the SNPs that influence the exposure also likely to influence the outcome, this is not an MR analysis).

MRCIEU/TwoSampleMR documentation built on June 9, 2025, 6:38 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

MRCIEU/TwoSampleMR
Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database

In MRCIEU/TwoSampleMR: Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database

Exposure data

Text file

GWAS catalog

Data within R

LD pruning

Using MR-Base

Local LD pruning

Outcome data

MR-Base lookup

Manual

Harmonising exposure and outcome data

Performing 2SMR

Sensitivity analysis

Enrichment

R Package Documentation

Browse R Packages

We want your feedback!

MRCIEU/TwoSampleMR Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database

In MRCIEU/TwoSampleMR: Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database

Exposure data

Text file

GWAS catalog

Data within R

LD pruning

Using MR-Base

Local LD pruning

Outcome data

MR-Base lookup

Manual

Harmonising exposure and outcome data

Performing 2SMR

Sensitivity analysis

Enrichment

R Package Documentation

Browse R Packages

We want your feedback!

MRCIEU/TwoSampleMR
Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database