In clagiamba/moloc: Multiple Traits Co-localization

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Package Info

The moloc package is an extension of coloc [see @Giambartolomei2014] that can be used to perform genetic co-localization analysis of multiple phenotypes, to understand whether they share common genetic causal variant(s) in a given region.

These examples will guide through how to run moloc using three traits on:

A single locus
Genome-wide/multiple loci analysis

Usage

`moloc` on single locus

The main function is called moloc_test and outputs posterior probabilities for each combination of datasets used in the input list. The default is to average the Bayes Factors across three prior variances, 0.01, 0.1, 0.5, as suggested in Wakefield [see @Wakefield2009].

Input

* The summary statistics for each dataset must be in a list (e.g. list(gwas, eqtl, mqtl)). 
    Must have columns `SNP` or `CHR` and `POS`; `BETA`, `SE`; `N` and `MAF` (to estimate sdY); 
    if a case control: `Ncases`.
    If eqtl/mqtl: `ProbeID`.
    If the regions are defined based on ProbeID, these must match 
    the ProbeID in the file.
    Optionally, "A1", "A2" if want to match alleles

library(knitr)
options(scipen = 1, digits = 2)
## load single locus data (in a list) and bed file
data_single=get(load(file="../data/data_single.rda"))
t1 = knitr::kable(head(data_single[[1]], 5),  format='html', table.attr='cellpadding="1"', output = FALSE)

t2 = knitr::kable(head(data_single[[2]], 5),  format='html', table.attr='cellpadding="1"', output = FALSE)

t3 = knitr::kable(head(data_single[[3]], 5),  format='html', table.attr='cellpadding="1"', output = FALSE)
cat(c('<table><tr valign="top"><td>', t1, '</td><td>', t2, '</td><td>', t3, '</td><tr></table>'),
    sep = '')
##

Output

The output is a list with three elements:

First element is a data frame containing the priors, likelihoods and Posteriors for each locus and each combination. We usually care about the last columns, the posterior probability of a shared variant for each model;
Second element is the number of SNPs analyzed, in common across all datasets;
Third element of the output is a data frame with the posterior probability of the most likely SNP.

options(scipen = 1, digits = 2)
library(moloc)
moloc <- moloc_test(data_single, prior_var=c(0.01, 0.1, 0.5), priors=c(1e-04, 1e-06, 1e-07))
# Posteriors
print(moloc[[1]])
# Number of SNPs analyzed
print(moloc[[2]])
# Posterior of the most likely SNP co-localizing with another trait
print(moloc[[3]])

Usage

`moloc` Genome-wide/multiple loci analysis

The main function for genome-wide analysis is called coloc.genome and runs moloc with all the combinations from your datasets, for example all the SNPs from a GWAS, genes, and methylation IDs. It outputs posterior probabilities for each region defined by the bed file. This function requires two packages: foreach and doParallel, and requires two files described below.

Input

* The summary statistics for each dataset in a list (e.g. list(gwas, eqtl, mqtl)), same as for the single locus, but now have multiple ProbeIDs;
    Must have columns `SNP` or `CHR` and `POS`; `BETA`, `SE`; `N` and `MAF` (to estimate sdY); 
    if a case control: `Ncases`.
    If eqtl/mqtl: `ProbeID`.
    If the regions are defined based on ProbeID, these must match 
    the ProbeID in the file.
    Optionally, "A1", "A2" if want to match alleles

* The bed file: specifies the region to use.
  Must contain the columns `ProbeID`, `CHR`, `START`, `STOP`  (START and STOP of region). 
  (other columns are ignored)

library(knitr)
options(scipen = 1, digits = 2)
## load single locus data (in a list) and bed file
library(moloc)
data_genome=get(load(file="../data/data_genome.rda"))
t1 = knitr::kable(head(data_genome[[1]], 5),  format='html', table.attr='cellpadding="1"', output = FALSE)
t2 = knitr::kable(head(data_genome[[2]], 5),  format='html', table.attr='cellpadding="1"', output = FALSE)
t3 = knitr::kable(head(data_genome[[3]], 5),  format='html', table.attr='cellpadding="1"', output = FALSE)
cat(c('<table><tr valign="top"><td>', t1, '</td><td>', t2, '</td><td>', t3, '</td><tr></table>'),
    sep = '')
##

knitr::kable(bed[1:2,])
##

    # To create the bed file with combination of ProbeIDs you can use:
    library(GenomicRanges)
    DT <- as.data.table(data_genome[[3]])
    methyl_table <- DT[, list(CHR= unique(CHR), START = min(POS), STOP = max(POS)), by = ProbeID]
    bed1.gr <- GRanges(seqnames = bed$CHR,IRanges(start = bed$START, end= bed$STOP))
    bed2.gr <- GRanges(seqnames = methyl_table$CHR,IRanges(start =methyl_table$START, end= methyl_table$STOP))
    my.overlap <- findOverlaps(query = bed2.gr, subject = bed1.gr)
    bed = cbind(bed[my.overlap@subjectHits,], methyl_table[my.overlap@queryHits,])
    bed = bed[,c(1,6,7,8,9,10)]
    # Setting the prior_var to "default" uses coloc defaults

Output

The output is a list with two elements:

First element is a data frame containing the priors, likelihoods and Posteriors for each locus and each combination.
Second element is the loci analyzed.

options(scipen = 1, digits = 2)
library(moloc)
library(foreach)
library(doParallel)
res <- coloc.genome(data_genome, bed, prefix = "pref", save.SNP.info=FALSE, cores=20, have_alleles=TRUE, bychrpos=TRUE, prior_var="default", priors=c(1e-04, 1e-06, 1e-07), min_nsnps = 50, takelog = FALSE, write=TRUE, outfolder = "test", forcePVAL = FALSE)
# Posteriors
print(res[[1]])
# Bed file of loci analyzed
print(res[[2]])

The columns are:

ProbeID: refers to the expression data;
ProbeIDmet: refers to the methylation data;
CHR, START, STOP refer to the region defined by the bed file;
nsnps is the number of SNPs analyzed per region;
minpi are the minimum p-values for each analyzed dataset in the region;
minpi.snp are the snps with the minimum p-values;
bf.a are the bayes factors supporting association with GWAS;
bf.a,b are the bayes factors supporting association of one variant with GWAS and a different variant for eQTL;
bf.ac,b are the bayes factors supporting association of the same variant with GWAS and mQTL, and a different variant for eQTL;

...

PPA.a are the corresponding posterior probabilities;

...

best.snp.a is the SNP with the highest probability for GWAS;

...

best.snp.PPA.a is the SNP posterior probability;

...

References

[1] Giambartolomei et al., 2014, PLoS Genet 10(5): e1004383. https://doi.org/10.1371/journal.pgen.1004383

[2] Wakefield J, 2009, Genetic Epidemiology 33: 79–86.

clagiamba/moloc documentation built on Jan. 25, 2021, 2:43 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

clagiamba/moloc
Multiple Traits Co-localization

In clagiamba/moloc: Multiple Traits Co-localization

Package Info

Usage

`moloc` on single locus

Input

Output

Usage

`moloc` Genome-wide/multiple loci analysis

Input

Output

References

R Package Documentation

Browse R Packages

We want your feedback!

clagiamba/moloc Multiple Traits Co-localization

In clagiamba/moloc: Multiple Traits Co-localization

Package Info

Usage

moloc on single locus

Input

Output

Usage

moloc Genome-wide/multiple loci analysis

Input

Output

References

R Package Documentation

Browse R Packages

We want your feedback!

clagiamba/moloc
Multiple Traits Co-localization

`moloc` on single locus

`moloc` Genome-wide/multiple loci analysis