Psychiatric disorders have a great impact on morbidity and mortality [@Murray2013][@Whiteford2013]. According to the World Health Organization (WHO), one of every four people will suffer mental or neurological disorders at some point in their lives [@WHO2013]. It has been suggested that most psychiatric disorders display a strong genetic component [@Issler2015][@Levinson2014][@Ripke2014]. During the last years there has been a growing research in psychiatric disorders' genetics [@Sullivan2012], and therefore the number of publications that focus on psychiatric disorders have increased steadily (Figure \@ref(fig:pubmed1)).
library(ggplot2) data.file <- system.file( paste0("extdata", .Platform$file.sep, "psychiatricDisordersPubmed.csv"), package="psygenet2r" ) pmid <- read.delim(data.file, header=TRUE, sep=",") pmid <- pmid[pmid$year<2017 & pmid$year>1950,] pmid$year <- factor(pmid$year) labels <- as.integer(seq(1950, 2016, by=5)) p <- ggplot(pmid, aes ( x = year, y = count ) ) + geom_bar ( stat = "identity", fill = "grey" ) + labs ( title = "Number of publications for psychiatric disorders in PubMed" , x = "year", y = "# of pmids") + theme_classic( ) + scale_x_discrete(breaks=labels, labels=as.character(labels))+ theme( plot.margin = grid::unit ( x = c ( 5, 15, 5, 15 ), units = "mm" ), axis.line = element_line ( size = 0.7, color = "black" ), text = element_text ( size = 14 ) , axis.text.x = element_text ( angle = 45, size = 11, hjust = 1 ) ) p
However, there is still limited understanding on the cellular and molecular mechanisms leading to psychiatric diseases, which has limited the application of this wealth of data in the clinical practice. This situation also applies to psychiatric comorbidities. Some of the factors that explain the current situation is the heterogeneity of the information about psychiatric disorders and its fragmentation into knowledge silos, and the lack of resources that collect these wealth of data, integrate them, and supply the information in an intuitive, open access manner to the community. PsyGeNET [@Gutierrez-Sacristan2015] has been developed to fill this gap and
psygenet2r has been developed to facilitate statistical analysis of PsyGeNET data, allowing its integration with other packages available in R to develop data analysis workflows.
psygenet2r package allows to retrieve the genes associated to psychiatric diseases, or explore the association between a disease of interest and PsyGeNET diseases based on shared genes. In addition,
psygenet2r allows the annotation of genes with psychiatric diseases based on expert-curated information. This functionality can be of interest to interpret the results of GWAS or Whole Exome Sequencing studies, in which a list of gene variants is obtained and there is a need to prioritize them based on their functional and clinical relevance. In this context, it would be of interest to know if there is information on their implication in psychiatric diseases. In this Case study we will describe how we can analyze the genes identified in a GWAS study in the context of psychiatric diseases using
psygenet2r. For this purpose, we will use as an example the data obtained from a GWAS study on bipolar disorder published by [@mccarthy2014whole]. In this study, the authors analyzed the brain expression of 58 genes, previously identified in a GWAS of bipolar disorder [@psychiatric2011large], and correlated this information with structural MRI studies to identify brain regions that are abnormal in bipolar disorder. We will use this list of 58 genes from the bipolar disorder study to show the functionality of
The goal of the study is to analyze a set of genes discovered by GWAS in the context of PsyGeNET. More specifically, we want to answer the following questions:
PsyGeNET, a knowledge resource for the exploratory analysis of psychiatric diseases and their genes, contains information on eight psychiatric disorders: depression, bipolar disorder, schizophrenia, alcohol, cocaine and cannabis use disorders, substance-induced depressive disorder and psychoses. PsyGeNET database has been developed by automatic extraction of information from the literature using the text mining tool BeFree [@bravo2015extraction] (http://ibi.imim.es/befree/), followed by curation by experts in the domain. The current version of PsyGeNET (version 2.0) contains 3,771 associations between 1,549 genes and 117 psychiatric disease concepts.
psygenet2r package contains functions to query and analyze PsyGeNET data, and to integrate with other information, as exemplified in this case study.
psygenet2r package is provided through Bioconductor [@Gentleman2004]. To install
psygenet2r the user must type in the two following commands in R session:
if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install( "psyGeNET2R" )
library( psygenet2r )
The first step that has to be done before doing any analysis is saving the genes in an R vector. For this case-study the 58 genes obtained from McCarthy et al. [@mccarthy2014whole] are saved into a vector called
Genes can be identified using the NCBI gene identifier or the Official Gene Symbol from HUGO.
genesOfInterest <- c("ADCY2", "AKAP13", "ANK3", "ANKS1A", "ATP6V1G3", "ATXN1", "C11orf80", "C15orf53", "CACNA1C", "CACNA1D", "CACNB3", "CROT", "DLG2", "DNAJB4", "DUSP22", "FAM155A", "FLJ16124", "FSTL5", "GATA5", "GNA14", "GPR81", "HHAT", "IFI44", "ITIH3", "KDM5B", "KIF1A", "LOC150197", "MAD1L1", "MAPK10", "MCM9", "MSI2", "NFIX", "NGF", "NPAS3", "ODZ4", "PAPOLG", "PAX1", "PBRM1", "PTPRE", "PTPRT", "RASIP1", "RIMBP2", "RXRG", "SGCG", "SH3PXD2A", "SIPA1L2", "SNX8", "SPERT", "STK39", "SYNE1", "THSD7A", "TNR", "TRANK1", "TRIM9", "UBE2E3", "UBR1", "ZMIZ1", "ZNF274")
In order to know how many of the genes of interest are present in PsyGeNET,
psygenetGeneList function is used. This function requires as input the genes' vector and the selected database. For this analysis
"ALL" database are selected.
m1 <- psygenetGene( gene = genesOfInterest, database = "ALL", verbose = FALSE, warnings = FALSE ) m1
The output is a
DataGeNET.Psy object. It contains all the information about the different diseases associated with the genes of interest retrieved from PsyGeNET. By looking at the
DataGeNET.Psy object, it can be observed that, according to PsyGeNET and by querying in
"ALL"" databases, 16 of the initial genes are found in PsyGeNET. These genes appear associated with 15 different disorders, involving a total of 48 gene-disease associations (GDAs).
In order to visualize the 48 GDAs between the 16 genes found in PsyGeNET and the 15 different disorders,
psygenet2r provides several options. One of them is the GDA network, which can be obtained by applying the
plot function to the
DataGeNET.Psy object (
m1), obtained from
psygenetGene function (section 4.1). In the GDA network, blue nodes represent diseases and yellow nodes represent genes.
plot( m1 )
As shown in Figure \@ref(fig:gene-disease), most of the genes (10) are associated to bipolar disorder (umls:C0005586), in agreement with McCarthy et al. [@mccarthy2014whole], but some of these genes are also associated to other disorders related to alcohol UD, depression and schizophrenia.
In PsyGeNET it is important to keep track of both “positive” and the “negative” findings, and let the user make their own judgements based on the available evidence. We can visualize the association type (Association, No Association,Both) using heat-maps.
In the example shown in the next figure (Figure \@ref(fig:gene-psy)), the gene-disease associations related to alcohol UD and depresison are all of type "Association". On the other hand, for bipolar disorder and schizophrenia there are some GDAs that have both types of associations. Finally, in the schizophrenia category it can be seen that there is one GDA with "no association" type.
The barplot can be obtained using the
geneAttrPlot function and by setting the
type argument to
geneAttrPlot( m1, type = "evidence index" )
psygenet2r package can be used to analyze gene attributes such as the panther class. The PANTHER Protein Class Ontology includes commonly used classes of protein functions. The Panther class can be analyzed using the function
pantherGraphic function requires as input the list of genes (
genesOfInterest vector) and the database (for instance
"ALL"). The output of
pantherGraphic function is a bar-plot with the different Panther classes in the Y-axis and the percentage of genes in the X-axis, grouped by PsyGeNET psychiatric disorders.
pantherGraphic( genesOfInterest, "ALL")
The bar-plot shown in Figure \@ref(fig:panther) is obtained from the gene-list of interest. All the genes in the list that are associated with the psychiatric disease class of alcohol use disorders are signaling molecules. On the other hand, it can be observed that those genes associated with bipolar disorder, depression and schizophrenia are in a variety of Panther classes.
In PsyGeNET, each GDA is ranked with the PsyGeNET evidence index, that reflects the association type for each GDA. We can use
psygenet2r package to visualize the level of evidence in a heatmap.
In order to obtain the heatmap, the
plot function can be applied to the
DataGeNET.Psy object (
m1). The argument type must to be set to
As a result a heat-map is obtained with genes in the X axis and disorders in the Y axis. Heatmap cells will be coloured in green, yellow or red according to the evidence index value. Green color represents those GDAs where all the evidences reviewed by the experts support the existence of an association between the gene and the disease (
EI = 1); it will be yellow when there is contradictory evidence for the GDA (some publications support the association while others publications do not support it,
1 > EI > 0), and it will be red when all the evidences reviewed by the experts report that there is no association between the gene and the disease (
EI = 0).
plot( m1, type="GDA heatmap")
Figure \@ref(fig:gene-disease-2) shows that from the total 48 GDAs, only 4 of them have an evidence index different from 1. For example, all the publications supporting C15orf53-schizophrenia association report that there is no association; for the other 3 GDAs (C15orf53-bipolar disorder, DLG2-schizophrenia and SYNE1-bipolar disorder) contradictory evidences have been found.
In addition to the PsyGeNET evidence index, we can also inspect the number of publications that support a GDA.
psygenet2r allows the visualization of this information in a bar-plot, using the
plot function with the
disorder argument to indicate the disease of interest, and the
type argument set to
"publications ". Figure \@ref(fig:pubmed2) shows an example, with the genes in the X-axis and the number of PMIDs in the y-axis.
plot( m1, name="bipolar disorder", type="publications")
The results show that the CACNAC1 gene is associated with bipolar disorder in more than 50 publications, followed by ANK3 gene with around 40 publications. The rest of genes have been associated with bipolar disorder in less than 10 publications.
psygenet2r package also allows the extraction of the sentences and the pmids for each one of the GDAs for a particular disease. Two functions are required,
extractSentences. So, first,
psygenetGeneSentences function is applied to
"ALL" databases. A
DataGeNET.Psy object is obtained.
m2 <- psygenetGeneSentences( geneList = genesOfInterest, database = "ALL" ) m2
extractSentences function is applied to the previous
DataGeNET.Psy object and
disorder argument is set to the disorder of interest, in this case,
"bipolar disorder". Notice that if the disorder name is used, it must be written as it appears in PsyGeNET, otherwise results will not be found. The result is a data frame that contains the gene symbol, gene identifier, disease code, disease name, original db, the pmid, the annotation type and the sentence. As an example the first pmids are shown.
sentences <- extractSentences( m2, disorder = "bipolar disorder" ) head(sentences$PUBMED_ID)
An interesting question is to know which diseases are similar to a target disease based on shared genes. Since PsyGeNET database contains information on genes associated to psychiatric diseases we can use it to estimate disease similarity. The Jaccard Index is a statistic used for comparing the similarity of two sets. In our case, these sets are the genes associated to each one of the target diseases. In the
psygenet2r package, the Jaccard Index is calculated by using the function
jacacardEstimation allows us to calculate the Jaccard Index using both PsyGeNET's data (like disease names or CUIs) and external information as vectors of genes. Moreover, this function calculates p-value for the index to assess its statistical significance.
The strategy to calculate the Jaccard Index and its p-value as follows:
nboot + 1).
Let's calculate the Jaccard Index for the genes of interest and bipolar disorder:
xx <- jaccardEstimation( genesOfInterest, "bipolar disorder", database = "ALL", nboot = 500 ) xx
The result shows the number of iterations used to calculate the p-value and the type of input, in this case a list of genes and a disease. The Jaccard Index and the p-value can be extracted using the function
extract( xx )
Now we have seen the Jaccard Index and its p-value, between our genes of interest and bipolar disorder (JI: 0.018; pval: 0). The function
jaccardEstimation also allows to calculate the Jaccard Index of our input, the list of genes, and all the diseases in PsyGeNET.
xx <- jaccardEstimation( genesOfInterest, database = "ALL", nboot = 500 )
The result from the Jaccard Index estimation can be visualized using the function
plot. The result is a bar-plot (Figure \@ref(fig:jacc)) where the p-value of each comparison between our genes and PsyGeNET's disease is shown. A
cutOff argument can be added in order to visualize only those diseases with an statistically significant p-value:
plot( xx )
The similarity between diseases according to shared genes can be visualized using a barplot.
psygenet2r package allows to visualize how many of our genes of interest are associated to each psychiatric disorder present in PsyGeNET and how many of them are exclusively associated to a particular psychiatric disorder.
In order see the similarity between our genes of interest and PsyGeNET's psychiatric disorder, the
type argument of
geneAttrPlot function is set to
disease category. This will allow to see how many of the 16 genes, that appear in PsyGeNET, are associated with each psychiatric disorder.
geneAttrPlot( m1, type = "disease category" )
Figure \@ref(fig:bpGenes) clearly shows that none of our genes of interest are associated to the three disorders. There are 2 exclusively associated to bipolar disorder, 2 in the case of depression and 3 in schizophrenia. The rest of the genes appears in more than one psychiatric disorder.
In order see which are the genes that are associated only to one psychiatric category, the
type argument of
geneAttrPlot function can be set to
gene. This will allow to see how many of the CUIs and categories are associated to each one of the 16 genes.
geneAttrPlot( m1, type = "gene" )
Figure \@ref(fig:bpDis) clearly shows that these genes are: ADCY2, ATXN1, CACNA1C, HHAT, MAD1L1, NFIX and SH3PXD2A. Yellow bar for these genes show that they are only associated with one psychiatric disorder.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.