knitr::opts_chunk$set(warning = FALSE, collapse = TRUE, comment = "#>", fig.width = 12, fig.height = 9, message = FALSE, tidy = TRUE, dpi = 75)
The iPheGWAS (intelligent-PheGWAS) module brings intelligence into PheGWAS by incorporating a new heuristic approach, developed by our team, that orders traits by their genetic correlation quickly and efficiently. The iPheGWAS module is integrated seamlessly into the PheGWAS module. We also improved the previous PheGWAS codebase for faster landscape visualization.
This package ships with the datasets used in PheGWAS and with some of the datasets from the iPheGWAS paper, which are used to demonstrate iPheGWAS. If you are looking for the entire data used for studies 1 and 2 in iPheGWAS, please download it from here. Check out the individual datasets here.
Here are the datasets from iPheGWAS study 2 that are available within the package.
Loading the iphegwas package:
library(iphegwas)
Your GWAS summary statistics file or data frame must have columns named as follows:
CHR BP rsid A1 A2 beta se P.value
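Before passing your own data, it can help to confirm that the expected columns are present. The following is a minimal sketch in base R; the helper missing_gwas_cols and the toy data frame are hypothetical illustrations and are not part of the iphegwas package.

```r
# Hypothetical helper (not part of iphegwas): report which of the expected
# summary-statistics columns are missing from a data frame.
required_cols <- c("CHR", "BP", "rsid", "A1", "A2", "beta", "se", "P.value")

missing_gwas_cols <- function(df, required = required_cols) {
  setdiff(required, names(df))
}

# Toy data frame with made-up values, for illustration only.
toy <- data.frame(
  CHR = c(1L, 1L), BP = c(1000L, 2000L), rsid = c("rs1", "rs2"),
  A1 = c("A", "C"), A2 = c("G", "T"),
  beta = c(0.10, -0.20), se = c(0.05, 0.04), P.value = c(0.04, 1e-6),
  stringsAsFactors = FALSE
)

missing_gwas_cols(toy)                     # nothing missing
missing_gwas_cols(toy[, c("CHR", "BP")])   # lists the six absent columns
```

An empty result means the data frame already follows the naming convention; otherwise the returned names are the columns to add or rename.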
You can run iPheGWAS in 2 modes:
- Mode 1: Pass the folder path to the summary statistics files.
- Mode 2: Pass the names of data frames available in your environment to the iphegwas module as a vector of data frame names.
## Giving absolute file path
## pathname <- "<path to your folder>"
pathname <- system.file("extdata", "samplesummary", package = "iphegwas")
iphegwas(pathname = pathname, dentogram = TRUE)
We can use the iphegwas module in the iPheGWAS package to determine the genetic similarity between the traits. If we pass the argument dentogram = TRUE, it returns a dendrogram that is ordered by genetic similarity and gives further insight into the traits' genetic architecture (for more details, check out the paper).
iphegwas(pathname = pathname,dentogram = TRUE)
Calling it without dentogram = TRUE returns the ordered traits. This is useful to pass to the PheGWAS module so that similar traits sit next to each other in the PheGWAS landscape.
iphegwas(pathname = pathname)
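To build intuition for what "ordered by genetic similarity" means, here is a rough sketch that derives a trait order from a toy similarity matrix with standard hierarchical clustering. This is not the iPheGWAS heuristic; the matrix values and trait names are made up, and hclust is swapped in purely for illustration.

```r
# Toy symmetric similarity matrix between four hypothetical traits
# (made-up numbers, not output from iphegwas or LDSC).
traits <- c("traitA", "traitB", "traitC", "traitD")
sim <- matrix(
  c(1.0, 0.1, 0.8, 0.2,
    0.1, 1.0, 0.2, 0.7,
    0.8, 0.2, 1.0, 0.1,
    0.2, 0.7, 0.1, 1.0),
  nrow = 4, byrow = TRUE, dimnames = list(traits, traits)
)

# Turn similarity into a distance and cluster hierarchically.
hc <- hclust(as.dist(1 - sim), method = "average")

# Leaf order of the dendrogram: similar traits end up next to each other.
ordered_traits <- hc$labels[hc$order]
ordered_traits

# plot(hc) would draw the corresponding dendrogram.
```

In this toy example traitA and traitC are the most similar pair, so they end up adjacent in the leaf order, as do traitB and traitD.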
head(ibd, 3)
head(bmi, 3)
head(Wasisthipratio, 3)
head(CrohnsDisease, 3)
head(UlcerativeColitis, 3)
## Bringing all package data to the environment
ibd <- ibd
bmi <- bmi
Wasisthipratio <- Wasisthipratio
CrohnsDisease <- CrohnsDisease
UlcerativeColitis <- UlcerativeColitis
Generating dendrograms
phenos <- c("ibd", "bmi", "CrohnsDisease", "UlcerativeColitis", "Wasisthipratio")
iphegwas(phenos, dentogram = TRUE)
iphegwas(phenos)
The ordering of the PheGWAS landscape follows the order in which the user passed the phenotypes, so by default the order carries no meaning. When dealing with many phenotypes, we often want to group the traits by genetic similarity to investigate their genetic architecture and the correlation drivers in the landscape.
yy <- fastprocessphegwas(phenos)
Once the processing is done, pass the data frame returned by fastprocessphegwas to landscapefast to see the landscape. Here, the traits in the landscape appear in the order in which they are listed in phenos.
print(phenos)
landscapefast(yy, sliceval = 7, phenos = phenos)
If you want to order the traits in the landscape by genetic similarity, pass the order returned by the iphegwas module.
landscapefast(yy,sliceval = 7,phenos = iphegwas(phenos))
Our users often use the iphegwas module to get quick insights into all the traits, so that they can pick the traits they are interested in and pass them to the LDSC module to obtain genetic correlation values. To develop this R module (ldscmod), we used the LDSC Python modules; thanks to the LDSC team.
You can use ldscmod to calculate the genetic correlation (rg) between traits. The result is stored as correlationmatrix in your global environment so that you can use it later. If correlationmatrix already exists in your global environment, ldscmod won't recalculate the rg, but you can still call ldscmod with dentogram = TRUE and plot = TRUE. You need to pass pathname and ldscpath. pathname is the absolute path to the folder containing the summary statistics for which you want to find the genetic correlation. This folder must contain only summary statistics, or else you will get an error. All the additional files that LDSC generates are also written to pathname, in case you later want to look at the additional metrics that LDSC provides.
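The reuse of an existing correlationmatrix can be pictured as a simple cache keyed on a variable name. The sketch below is an assumption about that behaviour and is not the actual ldscmod source; slow_rg, cached_value and demo_env are hypothetical names, and a separate environment stands in for the global environment.

```r
# Illustrative cache-by-name pattern; an assumption about the behaviour
# described above, not the actual ldscmod source.
compute_calls <- 0
slow_rg <- function() {
  # Hypothetical stand-in for an expensive genetic-correlation run.
  compute_calls <<- compute_calls + 1
  matrix(c(1, 0.5, 0.5, 1), nrow = 2)
}

cached_value <- function(name, compute, env) {
  # Compute and store only when no variable of that name exists yet.
  if (!exists(name, envir = env, inherits = FALSE)) {
    assign(name, compute(), envir = env)
  }
  get(name, envir = env, inherits = FALSE)
}

demo_env <- new.env()  # stands in for the global environment
first  <- cached_value("correlationmatrix", slow_rg, demo_env)
second <- cached_value("correlationmatrix", slow_rg, demo_env)  # reuses the stored value

# Removing the stored variable makes the next call compute again.
rm("correlationmatrix", envir = demo_env)
third <- cached_value("correlationmatrix", slow_rg, demo_env)
```

If ldscmod follows this pattern, running rm(correlationmatrix) in your session before calling it should trigger a fresh rg calculation, for example after changing the files in pathname. This is an inference from the description above, so verify it against your installed version.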
pathname <- system.file("extdata", "samplesummary", package = "iphegwas")
ldscpath <- system.file("extdata", "ldsc", package = "iphegwas")
Using ldscmod to get the genetic correlation plot (using LDSC).
ldscmod(pathname,ldscpath,plot = TRUE)
Using ldscmod to examine the dendrograms (using LDSC).
ldscmod(pathname,ldscpath,dentogram = TRUE)
If you want to order the traits in the landscape by the genetic correlation from LDSC, pass the order returned by the ldscmod module.
landscapefast(yy,sliceval = 7,phenos = ldscmod(pathname,ldscpath))
In addition to the heuristic approach that we developed, all the functionalities outlined in PheGWAS are also available in the iphegwas package. With performance in mind, the entire codebase has been rewritten, and you will notice that the iphegwas package is faster than the PheGWAS package. The code below is taken from the PheGWAS vignette.
The following processed summary data are from the lipid consortium:
head(hdl, 3)
head(ldl, 3)
head(trig, 3)
head(tchol, 3)
## I am changing the name of the dataframe to something meaningful, as the name of the
## dataframe will be used as phenotype names in the landscape. This also brings all
## package data to the environment.
HDL <- hdl
LDL <- ldl
TRIGS <- trig
TOTALCHOLESTROL <- tchol
Note: The gene column is optional. There is an option to map genes to rsid; if you want this, set genemap = TRUE (by default it is set to FALSE). If TRUE, it will take some time, as it uses the Gene BioMart module to map genes internally.
The data frames are passed to the fastprocessphegwas function as a character vector of data frame names.
phenos <- c("HDL", "LDL", "TRIGS", "TOTALCHOLESTROL")
y <- fastprocessphegwas(phenos)
3D landscape visualization of all the phenotypes across the base pair positions (above a -log10(p) threshold of 10, set by sliceval)
landscapefast(y, sliceval = 10, phenos = phenos)
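To make the threshold concrete, the sketch below filters a toy data frame on the -log10(p) scale. It assumes, as the captions suggest, that sliceval acts as a -log10(p) cutoff; the data and the cutoff value are made up, and this is not the internal landscapefast code.

```r
# Toy p-values (made up) to illustrate a cutoff on the -log10(p) scale.
toy <- data.frame(
  rsid = c("rs1", "rs2", "rs3"),
  P.value = c(1e-3, 1e-7, 1e-12),
  stringsAsFactors = FALSE
)

cutoff <- 6  # hypothetical threshold, playing the role of sliceval

# Convert p-values to -log10(p) and keep only the SNPs above the cutoff.
toy$logp <- -log10(toy$P.value)
above <- toy[toy$logp > cutoff, ]
above$rsid
```

A higher cutoff keeps fewer, more significant SNPs, which is why raising sliceval thins out the landscape.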
3D landscape visualization of chromosome 19 (above a -log10(p) threshold of 7.5, set by sliceval)
landscapefast(y, sliceval = 7.5, chromosome = 19, phenos = phenos)
3D landscape visualization of chromosome 19 with gene view active (above a -log10(p) threshold of 7.5, set by sliceval)
landscapefast(y, sliceval = 7.5, chromosome = 19, geneview = TRUE, phenos = phenos)
3D visualization of chromosome 19 with LD blocks (for the European population), using calculateLD to supply the LD and mutualLD to also calculate the mutual LD block
landscapefast(y, sliceval = 30, chromosome = 19, calculateLD = TRUE, mutualLD = TRUE, phenos = phenos)