knitr::opts_chunk$set(echo = TRUE, eval=TRUE)

In this tutorial, we prepare input data for the analysis.

Load the package.

library(mapgen)

GWAS summary statistics

We need a data frame of GWAS summary statistics with the following columns:

Let's load a small example GWAS dataset (test.sumstats.txt.gz), and process the data (see below).

We can use the vroom package to read the GWAS summary statistics.

library(vroom)
gwas.file <- system.file('extdata', 'test.sumstats.txt.gz', package='mapgen')
sumstats <- vroom(gwas.file, col_names = TRUE, show_col_types = FALSE)
head(as.data.frame(sumstats),3)

LD blocks

For the LD blocks, we need a data frame with four columns: chromosome, start and end positions, and the indices of the LD blocks.

We provided the LD blocks from LDetect for the 1000 Genomes (1KG) European population (in hg19).

LD_Blocks <- readRDS(system.file('extdata', 'LD.blocks.EUR.hg19.rds', package='mapgen'))
head(LD_Blocks, 3)

* You can skip this if you only need to run enrichment analysis.

Reference panel

Fine-mapping requires linkage-disequilibrium (LD) information.

You could either provide LD matrices or use a reference genotype panel, which will compute LD matrices internally.

To use the reference genotype panel, we utilize the R package bigsnpr to read in PLINK files (bed/bim/fam) into R and match alleles between GWAS summary statistics and reference genotype panel.

If you have reference genotypes in PLINK format, you can use bigsnpr::snp_readBed() to read bed/bim/fam files into a bigSNP object and save as a '.rds' file. This '.rds' file could be used for downstream analyses when attached with \code{bigsnpr::snp_attach()}.

We provided a bigSNP object of the reference genotype panel from the 1000 Genomes (1KG) European population. If you are in the He lab at UChicago, you can load the bigSNP object from RCC as below.

library(bigsnpr)
bigSNP <- snp_attach(rdsfile = '/project2/xinhe/1kg/bigsnpr/EUR_variable_1kg.rds')

We also have pre-computed LD matrices of European samples from UK Biobank. They can be downloaded [here][UKBB_LD]. If you are at UChicago, you can directly access those LD matrices from RCC at /project2/mstephens/wcrouse/UKB_LDR_0.1_b37/.

We create a data frame region_info, with filenames of UKBB reference LD matrices and SNP info adding to the LD_Blocks.

region_info <- get_UKBB_region_info(LD_Blocks,
                                    LDREF.dir = "/project2/mstephens/wcrouse/UKB_LDR_0.1_b37", 
                                    prefix = "ukb_b37_0.1")

* You can skip this if you only need to run enrichment analysis, or if you only need to run downstream analysis (e.g. gene mapping) with precomputed fine-mapping result.

Process GWAS summary statistics

We run process_gwas_sumstats() to process the summary statistics. This checks and cleans GWAS summary statistics, add locus ID from the LD_Blocks if available, match alleles in GWAS SNPs with the reference panel from the bigSNP object if bigSNP is specified, or harmonize GWAS SNPs with the reference LD information from the precopmuted LD matrices if region_info is specified.

gwas.sumstats <- process_gwas_sumstats(sumstats, 
                                       chr='chr', 
                                       pos='position_b37', 
                                       beta='bcac_onco2_beta', 
                                       se='bcac_onco2_se',
                                       a0='a0', 
                                       a1='a1', 
                                       snp='phase3_1kg_id', 
                                       pval='bcac_onco2_P1df_Wald',
                                       LD_Blocks=LD_Blocks,
                                       bigSNP=bigSNP)

* You do not need to specify bigSNP, region_info, or LD_Blocks if you only need to run enrichment analysis.

Check that the summary statistics is processed and has appropriate columns:

gwas.sumstats <- readRDS(system.file('extdata', 'test.processed.sumstats.rds', package='mapgen'))
head(as.data.frame(gwas.sumstats),3)


kevinlkx/mapgen documentation built on March 31, 2024, 11:03 p.m.