knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette provides an introduction to genetic analysis using the 'palmer' package. The R package 'palmer' implements PALMER (constrained biclustering algorithm to improve Pathway Annotation based on the biomedical Literature Mining), a constrained biclustering approach that identifies indirect relationships among genes based on text mining of the biomedical literature. It allows researchers to use prior biological knowledge to guide the identification of gene-gene associations.
The package can be loaded with the command:
library("palmer")
The most up-to-date version of the 'palmer' package is available on our GitHub page: https://dongjunchung.github.io/palmer/
In this vignette, we use an example data set composed of a binary matrix to be clustered and two gene sets (pathways) used as constraints. The genes come from the mTOR signaling pathway (52 genes) and the NOTCH signaling pathway (47 genes). The binary matrix has 99 genes $\times$ 47 GO terms; a value of 1 indicates that a gene is associated with a GO term, and 0 otherwise. The first pathway constraint is a set of 26 genes and the second is a set of 23 genes, randomly selected from the respective gene lists. The working assumption is therefore that these 26 and 23 genes are known pathway members, and that the remaining 50 genes are to be clustered into one of the two pathways.
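The way the constraints are described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the gene identifiers, the object names (`mtor_genes`, `notch_genes`, `known_mtor`, `known_notch`) and the seed are made up and are not part of the package. The actual format of `kegg_path` may differ and is best checked with `str(kegg_path)`.

```r
# Hypothetical gene identifiers standing in for the two KEGG gene lists
mtor_genes  <- paste0("mTOR_gene_", seq_len(52))
notch_genes <- paste0("NOTCH_gene_", seq_len(47))

set.seed(1)  # arbitrary seed, only to make this sketch reproducible

# Randomly treat a subset of each list as "known" pathway members
known_mtor  <- sample(mtor_genes, 26)
known_notch <- sample(notch_genes, 23)

# Genes whose pathway membership is left for the clustering to recover
unknown_genes <- setdiff(c(mtor_genes, notch_genes),
                         c(known_mtor, known_notch))

length(unknown_genes)  # 99 - 26 - 23 = 50
```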
The data sets can be loaded with the command:
data(kegg_data)
data(kegg_path)
dim(kegg_data)
kegg_data[1:6,1:6]
kegg_path
library(pheatmap)
pheatmap(kegg_data,color=c("white","black"))
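Before fitting, it can help to check that the input matrix is strictly binary and to look at how dense it is. The sketch below uses a simulated stand-in (`toy`, an assumption made only so the example runs without the package) with the same dimensions as `kegg_data`; the simulated values carry no biological meaning.

```r
set.seed(42)  # arbitrary seed for this sketch

# Toy stand-in with the same shape as kegg_data (99 genes x 47 GO terms)
toy <- matrix(rbinom(99 * 47, size = 1, prob = 0.2),
              nrow = 99, ncol = 47,
              dimnames = list(paste0("gene", 1:99), paste0("GO", 1:47)))

# Confirm that every entry is 0 or 1
all(toy %in% c(0, 1))

# Number of GO terms per gene, and number of genes per GO term
go_per_gene <- rowSums(toy)
gene_per_go <- colSums(toy)
summary(go_per_gene)
summary(gene_per_go)

# Overall proportion of gene-GO term associations
mean(toy)
```

To run the same checks on the example data, replace `toy` with `kegg_data` after calling `data(kegg_data)`.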
We are now ready to fit PALMER using the binary matrix and pathway information described above (kegg_data and kegg_path). PALMER is essentially a biclustering algorithm in which conditional mixture models for genes and GO terms are fitted iteratively, and the confidence of the resulting assignments is estimated using a block-specific bootstrap procedure. The package provides the convenience function 'palmer' for fitting the model. Besides the data set and the pathways, it needs only three arguments: the number of gene clusters (K), the number of GO term clusters (L), and the number of bootstrap samples (B) used to estimate the uncertainty of the gene clustering. In this vignette, we set K=2, matching the two-pathway assumption of the data, and L=3. For the bootstrap, we set B=100 (the default is 1000).
We fit PALMER with the command:
fit.palmer <- palmer(X=kegg_data,path=kegg_path,K=2,L=3,B=100)
The following command prints a summary of the PALMER fit, including the data summary and model settings.
fit.palmer
Gene cluster and GO term cluster assignments, together with the probability corresponding to each cluster, can be extracted using the method 'predict'.
predict(fit.palmer)
The method 'plot' displays a heatmap annotated with the cluster information.
plot(fit.palmer)
To make it easier for users to access and obtain the literature mining data needed for a PALMER analysis, we developed the web interface LitSelect. LitSelect uses the graph database of GAIL (http://chunglab.io/GAIL) to access low-level literature mining data (p-values) and then performs candidate gene and GO term selection. LitSelect is accessible at http://chunglab.io/LitSelect. The interface contains a form with the following input fields: 1) two gene lists, 2) a ratio of candidate genes to select, 3) a ratio of candidate GO terms to select, and 4) a binarization cutoff. To ensure unique mapping of gene IDs, the interface requires HGNC IDs. It also links to an ID mapper we previously developed, in case the user has a different identifier for a gene (e.g., a gene symbol). LitSelect then selects candidate genes and GO terms based on the user-provided parameters and pathway information, after which the literature mining data for PALMER can be downloaded.
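The role of the binarization cutoff can be illustrated with a small sketch. It assumes that a gene-GO term pair is coded as 1 when its p-value is at or below the cutoff; the text above does not spell out the exact rule LitSelect applies, so this is an assumption, and `pvals` and `cutoff` are hypothetical names and values.

```r
set.seed(7)  # arbitrary seed for this sketch

# Hypothetical gene-by-GO-term matrix of literature mining p-values
pvals <- matrix(runif(5 * 4), nrow = 5,
                dimnames = list(paste0("gene", 1:5), paste0("GO", 1:4)))

cutoff <- 0.05  # hypothetical binarization cutoff

# Assumed rule: 1 if the p-value is at or below the cutoff, 0 otherwise
binary <- (pvals <= cutoff) * 1L

binary
```

A stricter (smaller) cutoff yields a sparser binary matrix, so the choice of cutoff directly affects how many gene-GO term associations are passed on to the clustering.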