Exposure data

You can read in exposure data from a text file, or from the GWAS catalog.

Text file

The text file must have the following columns (in any order) with these exact column headers

We provide three example files in the package, one for telomere length, one for bladder cancer, and one for coronary heart disease. After installing the package you can find their locations on your computer like this:

library(TwoSampleMR)
tl_file <- system.file("data/telomere_length.txt", package="TwoSampleMR")
bc_file <- system.file("data/bladdercancer.txt", package="TwoSampleMR")
chd_file <- system.file("data/cardiogram.txt", package="TwoSampleMR")

To read in the data:

library(TwoSampleMR)
library(pander)
tl <- read_exposure_data(tl_file, "Telomere length")

The function looks up the rs IDs in biomart to validate them and retrieve chromosome and position information.

GWAS catalog

The GWAS catalog (https://www.ebi.ac.uk/gwas/) is a collection of r data(gwas_catalog); nrow(gwas_catalog) reported associations against r length(unique(gwas_catalog$Phenotype)) traits. These have the potential to be used as instruments in 2SMR. We have downloaded the GWAS catalog and made important formatting changes in order to simplify usage of these data for MR. Please note that the reformatting of this data may be unreliable.

To use the GWAS catalog:

data(gwas_catalog)

and to extract, e.g. the instruments for BMI using the [@Speliotes2010] SNPs:

bmi <- subset(gwas_catalog, Phenotype=="Body mass index" & Year==2010 & grepl("kg", Units))
bmi <- format_gwas_catalog(bmi)

Data within R

You can create the expoure data object using data already in R also. e.g.

manual_exposure <- data.frame(
    SNP = c("rs10150332", "rs10767664"),
    beta = c(0.13, 0.19),
    se = c(0.03061224, 0.03061224),
    effect_allele = c("C", "A"),
    other_allele = c("T", "T"),
    eaf = c(0.21, 0.78),
    P_value = c(3e-11, 5e-26)
)
manual_exposure <- format_exposure_dat(manual_exposure, "BMI")

LD pruning

For most 2SMR methods it is required that the instruments are in linkage equilibrium. To prune the SNPs we recommend using the clumping method which, when having to choose which of two SNPs to eliminate if they are in LD, will keep the one with the lowest p-value. There are two methods to do this, using either your own reference data and local computer, or using the MR-Base server remotely.

Using MR-Base

This is currently limited to using European samples from 1000 genomes data only. You must provide a data frame that has a P-value column, SNP name, SNP position and SNP chromosome. The output from read_exposure_data has all of these.

tl <- clump_data(tl)
bmi <- clump_data(bmi)

This returns the same data frame with any SNPs in LD removed.

Local LD pruning

You need to obtain the executable for plink2, and binary plink files for your reference data, e.g. 1000 genomes data.

bmi <- clump_data(bmi, where="local", refdat="/path/to/reference_data", plink_bin="/path/to/plink2")

Outcome data

Ideally for every SNP in your exposure data you will have the corresponding summary statistics from a GWAS on the outcome trait. There are two ways in which this data can be entered into R: by performing a lookup in the MR-Base database, or manually using your own summary statistics.

MR-Base lookup

To see a list of all the available GWAS studies in the database:

ao <- available_outcomes()
nrow(ao)
pander(head(ao))

and to extract the BMI SNPs from the Celiac disease and T2D GWASs provide the relevant exposure data and outcome IDs:

outcome_dat <- extract_outcome_data(bmi, c(13, 6))

The resulting data frame has the following columns:

names(outcome_dat)

Manual

Alternatively, create a data frame with the following columns:

Harmonising exposure and outcome data

In order to ensure that the effect sizes in the exposure data correspond to the same allele effects in the outcome data it is necessary to check and harmonise these data. We provide a function that ensures all corresponding exposure and outcome alleles are on the same strand (where possible), and returns a harmonised dataset.

bmi_outcome <- harmonise_exposure_outcome(bmi, outcome_dat, action = 2)

The action argument specifies how strict to be when checking for the correct strand:

Performing 2SMR

The data is now ready to perform 2-sample MR. To do this:

mr_results <- mr(bmi_outcome)

This performs MR using the following different available methods:

pander(mr_method_list())

Sensitivity analysis

To see how different methods compare:

mr_scatter_plot(mr_results, bmi_outcome)[[2]]

To compare the effects of each of the SNPs we can generate a forest plot, e.g. for BMI on T2D:

s <- mr_singlesnp(bmi_outcome)
mr_forest_plot(s)[[2]]

And to perform a leave-one-out sensitivity analysis:

l <- mr_leaveoneout(bmi_outcome)
mr_leaveoneout_plot(l)[[2]]

Enrichment

You can also perform a Fisher's combined test to see if there is an enrichment of p-values in the outcome data (e.g. are the SNPs that influence the exposure also likely to influence the outcome, this is not an MR analysis).



MRCIEU/TwoSampleMR documentation built on May 2, 2024, 10:22 p.m.