knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Package Info

The moloc package is an extension of coloc [see @Giambartolomei2014] that can be used to perform genetic co-localization analysis of multiple phenotypes, to understand whether they share common genetic causal variant(s) in a given region.

These examples will guide through how to run moloc using three traits on:

Usage

moloc on single locus

The main function is called moloc_test and outputs posterior probabilities for each combination of datasets used in the input list. The default is to average the Bayes Factors across three prior variances, 0.01, 0.1, 0.5, as suggested in Wakefield [see @Wakefield2009].

Input

* The summary statistics for each dataset must be in a list (e.g. list(gwas, eqtl, mqtl)). 
    Must have columns `SNP` or `CHR` and `POS`; `BETA`, `SE`; `N` and `MAF` (to estimate sdY); 
    if a case control: `Ncases`.
    If eqtl/mqtl: `ProbeID`.
    If the regions are defined based on ProbeID, these must match 
    the ProbeID in the file.
    Optionally, "A1", "A2" if want to match alleles
library(knitr)
options(scipen = 1, digits = 2)
## load single locus data (in a list) and bed file
data_single=get(load(file="../data/data_single.rda"))
t1 = knitr::kable(head(data_single[[1]], 5),  format='html', table.attr='cellpadding="1"', output = FALSE)

t2 = knitr::kable(head(data_single[[2]], 5),  format='html', table.attr='cellpadding="1"', output = FALSE)

t3 = knitr::kable(head(data_single[[3]], 5),  format='html', table.attr='cellpadding="1"', output = FALSE)
cat(c('<table><tr valign="top"><td>', t1, '</td><td>', t2, '</td><td>', t3, '</td><tr></table>'),
    sep = '')
## 

Output

The output is a list with three elements:

  1. First element is a data frame containing the priors, likelihoods and Posteriors for each locus and each combination. We usually care about the last columns, the posterior probability of a shared variant for each model;

  2. Second element is the number of SNPs analyzed, in common across all datasets;

  3. Third element of the output is a data frame with the posterior probability of the most likely SNP.

options(scipen = 1, digits = 2)
library(moloc)
moloc <- moloc_test(data_single, prior_var=c(0.01, 0.1, 0.5), priors=c(1e-04, 1e-06, 1e-07))
# Posteriors
print(moloc[[1]])
# Number of SNPs analyzed
print(moloc[[2]])
# Posterior of the most likely SNP co-localizing with another trait
print(moloc[[3]])

Usage

moloc Genome-wide/multiple loci analysis

The main function for genome-wide analysis is called coloc.genome and runs moloc with all the combinations from your datasets, for example all the SNPs from a GWAS, genes, and methylation IDs. It outputs posterior probabilities for each region defined by the bed file. This function requires two packages: foreach and doParallel, and requires two files described below.

Input

* The summary statistics for each dataset in a list (e.g. list(gwas, eqtl, mqtl)), same as for the single locus, but now have multiple ProbeIDs;
    Must have columns `SNP` or `CHR` and `POS`; `BETA`, `SE`; `N` and `MAF` (to estimate sdY); 
    if a case control: `Ncases`.
    If eqtl/mqtl: `ProbeID`.
    If the regions are defined based on ProbeID, these must match 
    the ProbeID in the file.
    Optionally, "A1", "A2" if want to match alleles

* The bed file: specifies the region to use.
  Must contain the columns `ProbeID`, `CHR`, `START`, `STOP`  (START and STOP of region). 
  (other columns are ignored)
library(knitr)
options(scipen = 1, digits = 2)
## load single locus data (in a list) and bed file
library(moloc)
data_genome=get(load(file="../data/data_genome.rda"))
t1 = knitr::kable(head(data_genome[[1]], 5),  format='html', table.attr='cellpadding="1"', output = FALSE)
t2 = knitr::kable(head(data_genome[[2]], 5),  format='html', table.attr='cellpadding="1"', output = FALSE)
t3 = knitr::kable(head(data_genome[[3]], 5),  format='html', table.attr='cellpadding="1"', output = FALSE)
cat(c('<table><tr valign="top"><td>', t1, '</td><td>', t2, '</td><td>', t3, '</td><tr></table>'),
    sep = '')
## 
knitr::kable(bed[1:2,])
## 
    # To create the bed file with combination of ProbeIDs you can use:
    library(GenomicRanges)
    DT <- as.data.table(data_genome[[3]])
    methyl_table <- DT[, list(CHR= unique(CHR), START = min(POS), STOP = max(POS)), by = ProbeID]
    bed1.gr <- GRanges(seqnames = bed$CHR,IRanges(start = bed$START, end= bed$STOP))
    bed2.gr <- GRanges(seqnames = methyl_table$CHR,IRanges(start =methyl_table$START, end= methyl_table$STOP))
    my.overlap <- findOverlaps(query = bed2.gr, subject = bed1.gr)
    bed = cbind(bed[my.overlap@subjectHits,], methyl_table[my.overlap@queryHits,])
    bed = bed[,c(1,6,7,8,9,10)]
    # Setting the prior_var to "default" uses coloc defaults

Output

The output is a list with two elements:

  1. First element is a data frame containing the priors, likelihoods and Posteriors for each locus and each combination.

  2. Second element is the loci analyzed.

options(scipen = 1, digits = 2)
library(moloc)
library(foreach)
library(doParallel)
res <- coloc.genome(data_genome, bed, prefix = "pref", save.SNP.info=FALSE, cores=20, have_alleles=TRUE, bychrpos=TRUE, prior_var="default", priors=c(1e-04, 1e-06, 1e-07), min_nsnps = 50, takelog = FALSE, write=TRUE, outfolder = "test", forcePVAL = FALSE)
# Posteriors
print(res[[1]])
# Bed file of loci analyzed
print(res[[2]])

The columns are:

...

...

...

...

References

[1] Giambartolomei et al., 2014, PLoS Genet 10(5): e1004383. https://doi.org/10.1371/journal.pgen.1004383

[2] Wakefield J, 2009, Genetic Epidemiology 33: 79–86.



clagiamba/moloc documentation built on Jan. 25, 2021, 2:43 a.m.