Introduction

We are indebted to the MRC Integrative Epidemiology Unit for producing the infrastructure leading to the following schematic:

mrc eco

Caveat about software component maintenance

The packages identified in the ecosystem are mostly accessed through github repositories. The interdependencies and idiosyncrasies of using Remotes in DESCRIPTION made it difficult to produce a container image, so I have forked several of the packages identified in the schematic, to tweak the DESCRIPTIONs. I have filed some pull requests but have not heard back. I will try to keep the DESCRIPTION for this workshop package current, minimizing use of forks, and hope individual package maintainers will get in touch.

Basic goals

The API viewed through ieugwasr

We'll start by listing the functions available in the ieugwasr package.

suppressPackageStartupMessages({
library(gwaslake)
library(plotly)
library(ieugwasr)
library(dplyr)
library(tibble)
library(DT)
library(ontoProc)
})
ls("package:ieugwasr")

The pkgdown site for ieugwasr includes a reference listing, visible here.

The gwasinfo command makes an API call to the MRC ecosystem.

gwi = gwasinfo()

However, we don't want to do this too often, so we've made a snapshot and keep it in the package.

gwi = gwidf_2021_01_30
as_tibble(gwi)

That's a huge table. We can get an overview by tabulating the 'batches' into which studies have been organized.

gb = batches()
datatable(gb)

Diseases available

A comprehensive view of phenotypes is challenging to assemble. The category field has a relevant value, but we'll want to explore further.

 table(gwi$category)
distab = gwi %>% filter(category=="Disease")
distab %>% 
   group_by(trait) %>% summarise(n=n()) %>% arrange(desc(n))

A searchable table will be useful.

datatable(as.data.frame(distab))

A visual survey displaying study sizes for selected outcomes

The survey_gwas function plots number of cases versus number of controls for all studies in which the phrases argument matches the trait field, using grep. With plotly, we get an interactive graphic with additional data displayed on hover.

ggplotly(survey_gwas(phrases=c("asthma", "asthmatic")))

The survey_app function returns a display like the above, and an interactive table of 'top hits'.

top hits

GWAS hits and their annotation

The pkgdown document on ieugwasr is highly informative. We work through some examples here to verify functionality of our container.

First, by using the searchable table above, we can obtain the study id for "Asthma": ieu-a-44. We can then obtain the top "hits" for this trait (a p-value threshold can be specified; default is 5e-8).

asthtop = tophits("ieu-a-44")
asthtop

Basic annotation on these hits can be obtained using either positions or rsids.

asthv = variants_rsid(asthtop$rsid)
datatable(asthv)

PheWAS/eQTL lookup

Let's follow up our hit in GSDMB with a 'PheWAS'. The p-value threshold is indicated in documentation to be 0.00001.

gdp = phewas("rs2290400")
datatable(gdp)

A number of the traits are genes, and the findings come from eQTL studies. Let's see where these genes are located.

eqtab = gdp[grep("^eqtl", gdp$id),]
eqtab = bind_sym(eqtab)
table(eqtab$gchrom)

So there are 'trans' hits for our GSDMB-resident asthma SNP.

Extending the available facilities

Ontological tagging of trait terms

Introduction to Disease Ontology

An ontology is a structured vocabulary, and relationships among terms specified in a directed acyclic graph. Disease Ontology(DO) is an example related to systematic naming of human diseases.

Here is small illustration of the idea.

library(ontoProc)
DO = getDiseaseOnto() # snapshot
dem = c("DOID:4", "DOID:7", "DOID:74")
kids = lapply(dem, function(x) DO$children[x][[1]])
onto_plot2(DO, tail(unlist(kids),12))

The formalization of relationships among diseases, as encoded in DO, may be useful in investigations of etiology. The use of formal terms and fixed-length tags will be useful for combining information from independently conducted investigations.

Using Disease Ontology to tag MRC GWAS trait terms

"Traits" studied in the MRC GWAS ecosystem are only infrequently named using formal ontological terms. Simple code to match trait terms with DO terms is

traits = gwidf_2021_01_30$trait
indo = intersect(tolower(traits), tolower(DO$name))
inds = match(indo, tolower(DO$name))
mapped_traits = DO$name[inds]
indo[1:4] # in MRC GWAS
mapped_traits[1:4] # in DO
length(mapped_traits)

It will be possible to map even more traits once more careful textual analysis is performed on the traits vector. For example, one study uses Diagnoses - main ICD10: K50 Crohn's disease [regional enteritis] to name its trait. The code given above will not find this.

Manhattan plots

The associations function of ieugwasr produces a filtered set of hits for a given region in a given study. The manhattanPlot function of gwaslake simplifies this to some extent, and provides a visualization. One can specify a gene symbol and a flanking radius in base pairs, but one still needs to know the ieugwasr id of the study of interest.

manhattanPlot(studyID="ubm-a-524", symbol = "BCAN", radius=500000)

As the tagging of phenotypes improves, this facility will evolve to permit more flexible selection of studies and loci to retrieve and visualize.



vjcitn/gwaslake documentation built on Aug. 9, 2022, 5:45 p.m.