We are indebted to the MRC Integrative Epidemiology Unit for producing the infrastructure leading to the following schematic:
The packages identified in the ecosystem are mostly accessed through github repositories. The interdependencies and idiosyncrasies of using Remotes in DESCRIPTION made it difficult to produce a container image, so I have forked several of the packages identified in the schematic, to tweak the DESCRIPTIONs. I have filed some pull requests but have not heard back. I will try to keep the DESCRIPTION for this workshop package current, minimizing use of forks, and hope individual package maintainers will get in touch.
We'll start by listing the functions available in the ieugwasr
package.
suppressPackageStartupMessages({ library(gwaslake) library(plotly) library(ieugwasr) library(dplyr) library(tibble) library(DT) library(ontoProc) }) ls("package:ieugwasr")
The pkgdown site for ieugwasr
includes a reference listing, visible here.
The gwasinfo command makes an API call to the MRC ecosystem.
gwi = gwasinfo()
However, we don't want to do this too often, so we've made a snapshot and keep it in the package.
gwi = gwidf_2021_01_30 as_tibble(gwi)
That's a huge table. We can get an overview by tabulating the 'batches' into which studies have been organized.
gb = batches() datatable(gb)
A comprehensive view of phenotypes is challenging to assemble.
The category
field has a relevant value, but we'll want to explore
further.
table(gwi$category) distab = gwi %>% filter(category=="Disease") distab %>% group_by(trait) %>% summarise(n=n()) %>% arrange(desc(n))
A searchable table will be useful.
datatable(as.data.frame(distab))
The survey_gwas
function plots number of cases versus number
of controls for all studies in which the phrases
argument
matches the trait
field, using grep
.
With plotly,
we get an interactive graphic with additional data displayed on hover.
ggplotly(survey_gwas(phrases=c("asthma", "asthmatic")))
The survey_app
function returns a display like the above, and
an interactive table of 'top hits'.
The pkgdown document
on ieugwasr
is highly informative. We work through some examples here
to verify functionality of our container.
First, by using the searchable table above, we can obtain the study
id for "Asthma": ieu-a-44
. We can then obtain the top "hits" for
this trait (a p-value threshold can be specified; default is 5e-8).
asthtop = tophits("ieu-a-44") asthtop
Basic annotation on these hits can be obtained using either positions or rsids.
asthv = variants_rsid(asthtop$rsid) datatable(asthv)
Let's follow up our hit in GSDMB with a 'PheWAS'. The p-value threshold is indicated in documentation to be 0.00001.
gdp = phewas("rs2290400") datatable(gdp)
A number of the traits are genes, and the findings come from eQTL studies. Let's see where these genes are located.
eqtab = gdp[grep("^eqtl", gdp$id),] eqtab = bind_sym(eqtab) table(eqtab$gchrom)
So there are 'trans' hits for our GSDMB-resident asthma SNP.
An ontology is a structured vocabulary, and relationships among terms specified in a directed acyclic graph. Disease Ontology(DO) is an example related to systematic naming of human diseases.
Here is small illustration of the idea.
library(ontoProc) DO = getDiseaseOnto() # snapshot dem = c("DOID:4", "DOID:7", "DOID:74") kids = lapply(dem, function(x) DO$children[x][[1]]) onto_plot2(DO, tail(unlist(kids),12))
The formalization of relationships among diseases, as encoded in DO, may be useful in investigations of etiology. The use of formal terms and fixed-length tags will be useful for combining information from independently conducted investigations.
"Traits" studied in the MRC GWAS ecosystem are only infrequently named using formal ontological terms. Simple code to match trait terms with DO terms is
traits = gwidf_2021_01_30$trait indo = intersect(tolower(traits), tolower(DO$name)) inds = match(indo, tolower(DO$name)) mapped_traits = DO$name[inds] indo[1:4] # in MRC GWAS mapped_traits[1:4] # in DO length(mapped_traits)
It will be possible to map even more traits once more careful textual
analysis is performed on the traits
vector. For example,
one study uses Diagnoses - main ICD10: K50 Crohn's disease [regional enteritis]
to name its trait. The code given above will not find this.
The associations
function of ieugwasr
produces a filtered set
of hits for a given region in a given study. The manhattanPlot
function
of gwaslake simplifies this to some extent, and provides a visualization.
One can specify a gene symbol and a flanking radius in base pairs,
but one still needs to know the ieugwasr id
of the study of interest.
manhattanPlot(studyID="ubm-a-524", symbol = "BCAN", radius=500000)
As the tagging of phenotypes improves, this facility will evolve to permit more flexible selection of studies and loci to retrieve and visualize.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.