Helper/wrapper functions for executing coloc & associated analyses.
If you use any of the code from this repository, please cite the original coloc R package and our doi.
The code in this repository is released under an MIT license. This repository is distributed in the hope that it will be useful to the wider community, but without any warranty of any kind. Please see the LICENSE file for more details.
There is no plan to ever submit this code to CRAN
or Bioconductor
. This code was developed for use by the Ryten Lab and collaborators thereof. While most of the code has been separately tested by David and Regina, it is highly likely bugs exist.
If you would like to install the development version from GitHub you can use the following command:
if(!require("remotes"))
install.packages("remotes") # if necessary
library(remotes)
install_github("RHReynolds/colochelpR")
The functions herein are simply wrapper functions for executing coloc. Please refer to the coloc vignette for more background on use of coloc. Please remember to cite coloc, if you use this helper package.
For an example of how the wrapper functions in this package have been used in analyses, please refer to the following repository.
For these helper functions to work, it is important that GWASs/QTL datasets have been "tidied" and the necessary columns are present. So far, we have only used coloc for GWAS and eQTL datasets, the necessary columns for which are presented in tables below.
Column name | Description
--------------- | --------------------------------------------------------------------
GWAS | GWAS name.
SNP | SNP identifier. The format of these will need to match the second GWAS/dataset you intend to run in coloc i.e. both location-based (CHR:BP) or RS-ID-based. If location-based, ensure that you are mapping to the same genome build in both datasets.
beta | Regression coefficient.
se | Standard error of regression coefficient.
varbeta | This is generated through use of the function colochelpR::get_varbeta()
, provided either se
is available or beta
and t.stat
.
p.value | P-value of association.
Al1 | Effect allele.
Al2 | Alternate allele.
maf | Minor allele frequency.
p.value
and maf
, but estimated single SNP regression co-efficients (beta
) and their variances (varbeta
) or standard errors (se
) are preferred.beta
and se
, but you do have the odds ratio (often OR) and the Z-score, you can generate beta
and se
via the following formulas:GWAS %>%
dplyr::mutate(beta = log(OR, base = exp(1)),
se = log(OR, base = exp(1))/Z_SCORE)
Column name | Description
--------------- | --------------------------------------------------------------------
eQTL_dataset | Name of eQTL dataset.
gene | Ensembl gene ID for gene regulated by SNP. Often the original eQTL dataset will provide HGNC symbols, or if you are using a microarray-based dataset, it might provide probe IDs. For HGNC symbols, you can either use biomaRt
, or if you will be converting many gene ids, you can always use a .GTF
for GRCh37/38 (depending on what your eQTL dataset was mapped to). For probe IDs, these will need to be mapped back to genes. This is may be provided by the eQTL generator, although for common microarray plates (e.g. Affymetrix arrays) (i) biomaRt
can be used and (ii) Bioconductor offers offers a range of annotation packages that can be used to convert probe IDs.
SNP | SNP identifier. The format of these will need to match the first GWAS/dataset you intend to run in coloc i.e. both location-based (CHR:BP) or RS-ID-based. If location-based, ensure that you are mapping to the same genome build in both datasets.
beta | Regression coefficient.
se | Standard error of regression coefficient.
varbeta | This is generated through use of the function colochelpR::get_varbeta()
, provided either se
is available or beta
and t.stat
.
p.value | P-value of association.
Al1 | Effect allele.
Al2 | Alternate allele.
maf | Minor allele frequency.
N | Number of individuals.
beta
and se
are not available, p.values
and maf
can be used.MafDb.1Kgenomes.phase3.hs37d5
contains MAFs for a number of populations (remember to select the population that matches your eQTL dataset). See example code below:mafdb <- MafDb.1Kgenomes.phase3.hs37d5
example_df <- data.frame(rs_id = c("rs12921634", "rs1476958", "rs56189750"))
mafs <- GenomicScores::gscores(x = mafdb, ranges = example_df$rs_id %>% as.character(), pop = "EUR_AF")
mafs <- mafs %>%
as.data.frame() %>%
tibble::rownames_to_column(var = "SNP") %>%
dplyr::rename(maf = EUR_AF) %>%
dplyr::select(SNP, maf)
example_df <- example_df %>%
inner_join(mafs, by = c("rs_id" = "SNP"))
colochelpR::convert_rs_to_loc()
). BSgenome::snpsById()
function, and therefore requires a SNPlocs object, which is a container for storing known SNP locations. BSgenome::available.SNPs()
to return a vector of the available SNPlocs packages. We have also written a wrapper function for conversion of chromosome locations to RS ids (colochelpR::convert_loc_to_rs()
).
BSgenome::snpsByOverlaps()
function, and therefore requires a SNPlocs object, which is a container for storing known SNP locations (as described above).SNP
) after the conversion.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.