2. A quick application of coloclization analysis

knitr::opts_chunk$set(
  echo=TRUE,
  progress =FALSE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)

xQTLbiolinks has provided a pipeline for disease target gene discovery through colocalization analysis with xQTLs to detect susceptible genes and causal variants.

# knitr::include_graphics("images/quick_start/colocalization_framework.png")

Before we start

To perform colocalization analysis using xQTLbiolinks, users need to provide their own GWAS summary statistics data, specify the tissue of interest and import following packages:

library(data.table)
library(xQTLbiolinks)
library(stringr)
library(coloc)

Step 1. data pre-processing

temp1 <- tempfile(fileext=".zip")
download.file("http://bioinfo.szbl.ac.cn/xQTL_biolinks/xqtl_data/env_new.zip", temp1)
load(unz(temp1,"env_new.RData"))
close(file(temp1))
rm(temp1)

Download and load an example file of summary statistics dataset (GRCh38). We perform colocalization analysis in Brain - Cerebellum.

gwasDF <- fread("http://bioinfo.szbl.ac.cn/xQTL_biolinks/xqtl_data/gwasDFsub.txt")
tissueSiteDetail="Brain - Cerebellum"

In this example, a data.table object of 16538 (rows) x 5 (cols) is loaded. Five columns are required (arbitrary column names is supported, but columns must be as the following order):

Col 1. "variants" (character), , using an rsID (e.g. "rs11966562");

Col 2. "chromosome" (character), one of the chromosome from chr1-chr22;

Col 3. "position" (integer), genome position of snp;

Col 4. "P-value" (numeric);

Col 5. "MAF" (numeric). Allel frequency;

Col 6. "beta" (numeric). effect size.

Col 7. "se" (numeric). standard error.

head(gwasDF)

Step 2. Identify sentinel snps.

Sentinel SNPs can be detected using xQTLanalyze_getSentinelSnp with the arguments p-value < 5e-8 and SNP-to-SNP distance > 10e6 bp. We recommend converting the genome version of the GWAS file to GRCh38 if that is GRCh37 (run with parameter: genomeVersion="grch37"; grch37To38=TRUE, and package rtracklayeris required).

sentinelSnpDF <- xQTLanalyze_getSentinelSnp(gwasDF, pValueThreshold = 5e-08)

After filtering, a sentinel SNP with P-value\<5e-8 is detected in this example:

sentinelSnpDF

Step 3. Identify trait genes for each sentinel SNPs.

Trait genes are genes that located in the range of 1Mb (default, can be changed with parameter detectRange) of sentinel SNPs. Every gene within 1Mb of sentinel SNPs is searched by fuction xQTLanalyze_getTraits. Besides, In order to reduce the number of trait genes and thus reduce the running time, we take the overlap of eGenes and trait genes as the final output of the function xQTLanalyze_getTraits.

traitsAll <- xQTLanalyze_getTraits(sentinelSnpDF, detectRange=1e6, tissueSiteDetail=tissueSiteDetail)

Totally, 3 associations between 3 traits genes and 1 SNPs are detected

traitsAll

Step 4. Conduct colocalization analysis.

For a single trait gene, like ENSG00000109684 in above table, colocalization analysis (using coloc method) can be performed with:

output <- xQTLanalyze_coloc(gwasDF, "ENSG00000109684", tissueSiteDetail=tissueSiteDetail) # using gene symbol

output is a list, including two parts: coloc_Out_summary and gwasEqtlInfo.

output$coloc_Out_summary

For multiple trait genes, a for loop or lapply function can be used to get all genes' outputs (using both coloc and hyprcoloc methods).

outputs <- rbindlist(lapply( unique(traitsAll$gencodeId), function(x){ # using gencode ID.
           xQTLanalyze_coloc(gwasDF, x, tissueSiteDetail=tissueSiteDetail, method = "Both")$colocOut }))

outputs is a data.table that combined all results of coloc_Out_summary of all genes.

outputs

Step 5. Visualization of the results.

For the potential casual gene ENSG00000109684 (PP4=0.9937 & hypr_posterior=0.9793), we can get its significant associations across tissues:

xQTLvisual_eqtl("ENSG00000109684")
# knitr::include_graphics("images/quick_start/eQTL_multiple_tissue.png")

For the purpose of visualization of p-value distribution and comparison of the signals of GWAS and eQTL: We merge the variants of GWAS and eQTL by rsid first.

eqtlAsso <- xQTLdownload_eqtlAllAsso(gene="ENSG00000109684", 
                                     tissueLabel = tissueSiteDetail)
gwasEqtldata <- merge(gwasDF, eqtlAsso[,.(rsid=snpId, position=pos, maf, pValue)],
                      by=c("rsid", "position"), suffixes = c(".gwas",".eqtl"))

Function xQTLvisual_locusCompare displays the candidate SNP rs13120565 (SNP.PP.H4=0.5328849 & hypr_posterior_explainedBySnp=0.4747) in the top right corner:

xQTLvisual_locusCompare(gwasEqtldata[,.(rsid, pValue.eqtl)], 
                        gwasEqtldata[,.(rsid, pValue.gwas)], legend_position = "bottomright")

Note: The information on SNP linkage disequilibrium is automatically downloaded online. Due to network issues, the download may fail at times. If this happens, please try running it again.

# knitr::include_graphics("images/quick_start/compare.png")

Locuszoom plot of GWAS signals:

xQTLvisual_locusZoom(gwasEqtldata[,.(rsid, chrom, position, pValue.gwas)], legend=FALSE)
# knitr::include_graphics("images/quick_start/gwas_legendF.png")

LocusZoom plot of eQTL signals:

xQTLvisual_locusZoom(gwasEqtldata[,.(rsid, chrom, position, pValue.eqtl)], 
                     highlightSnp = "rs13120565", legend=FALSE)
# knitr::include_graphics("images/quick_start/eqtl_legendF.png")

Violin plot of normalized expression of eQTL (rs13120565-ENSG00000187323.11):

xQTLvisual_eqtlExp("rs13120565", "ENSG00000109684", tissueSiteDetail = tissueSiteDetail)
# knitr::include_graphics("images/quick_start/exp.png")

We can also combine locuscompare and locuszoom plot using xQTLvisual_locusCombine:

xQTLvisual_locusCombine(gwasEqtldata[,c("rsid","chrom", "position", "pValue.gwas", "pValue.eqtl")], 
                        highlightSnp="rs13120565")
# knitr::include_graphics("images/quick_start/xQTLvisual_locusCombine.png")

Colocalized loci should show a general pattern where SNPs in high LD will show strong associations with expression levels of the colocalized gene, and the eQTL associations will weaken for SNPs in lower LD. This pattern of the eQTL varies among different tissues / cell-types, the strength of which indicates the power of the regulatory effect of the variant. We can visualize the correlation between p-values of eQTLs and LD across numerous tissues/cell types to easily distinguish this patten using xQTLvisual_coloc:

multi_tissue_coloc <- xQTLvisual_coloc(gene="ENSG00000109684", variantName="rs13120565", 
                                      tissueLabels = c("Brain - Cerebellar Hemisphere", 
                                                        "Brain - Cerebellum", "Thyroid", "Lung",
                                                        "Cells - EBV-transformed lymphocytes"))
# knitr::include_graphics("images/quick_start/propensity.png")

For applying xQTLbiolinks to a whole case study, please find this Document.



Try the xQTLbiolinks package in your browser

Any scripts or data that you put into this service are public.

xQTLbiolinks documentation built on Sept. 15, 2023, 1:06 a.m.