sigora-package: Signature Overrepresentation Analysis


Description

This package implements the pathway analysis method SIGORA. For an in-depth description of the method, please see our manuscript in PeerJ. In short: a GPS (gene pair signature) is a (weighted) pair of genes that, as a combination, occurs in only one pathway within a pathway repository. A query list is a vector containing a gene list of interest (e.g. genes that are differentially expressed in a particular condition). A present GPS is a GPS for which both components are in the query list. SIGORA identifies relevant pathways based on the over-representation analysis of their (present) GPS.
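The definitions above can be illustrated with a toy repository in base R. This is a minimal sketch, not the package's implementation: the pathway and gene names are hypothetical, the weights are ignored, and the helper names pairs_in and is_present are made up for illustration.

```r
# Toy pathway repository (hypothetical gene and pathway names).
pathways <- list(
  P1 = c("A", "B", "C"),
  P2 = c("B", "C", "D"),
  P3 = c("A", "D", "E")
)

# Enumerate every unordered gene pair within one pathway.
pairs_in <- function(genes) {
  m <- combn(sort(genes), 2)
  paste(m[1, ], m[2, ], sep = "|")
}
all_pairs <- lapply(pathways, pairs_in)

# A pair is a signature (GPS) if it occurs in exactly one pathway.
pair_counts <- table(unlist(all_pairs))
gps <- lapply(all_pairs, function(p) p[pair_counts[p] == 1])

# A GPS is "present" when both of its genes are in the query list.
query <- c("A", "B", "E")
is_present <- function(p) {
  all(strsplit(p, "|", fixed = TRUE)[[1]] %in% query)
}
present_gps <- lapply(gps, function(p) p[vapply(p, is_present, logical(1))])
present_gps
```

In this toy case the pair B|C is shared by P1 and P2, so it is not a signature of either pathway, and only P1 and P3 end up with a present GPS for the query.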

Details

Getting started

To install from CRAN:
install.packages('sigora')
As an alternative, you can download the tarball and install from the local file:
install.packages("sigora_3.0.tar.gz",type="source",repos = NULL)
To load the library:
library("sigora")

Motivation: a thought experiment

Imagine you randomly selected 3 KEGG pathways, and then randomly selected a total of 50 genes from all genes that are associated with any of these pathways. Using traditional methods (a hypergeometric test on individual genes), how many pathways would you expect to show up as statistically overrepresented in this "query list" of 50 genes? Let us do this experiment! Everything related to human KEGG pathways can be found in kegH. The function genesFromRandomPathways randomly selects n genes from m randomly selected pathways. Traditional Overrepresentation Analysis (which is the basis for many popular tools) is available through ora. Putting these together:

data(kegH)
a1<-genesFromRandomPathways(seed=12345,kegH,3,50)
## originally selected pathways:
a1[["selectedPathways"]]
## what are the genes?
a1[["genes"]]
## Traditional ora identifies dozens of statistically significant pathways!
ora(a1[["genes"]],kegH)
## Now let us try sigora with the same input:
sigoraRes <- sigora(GPSrepo =kegH, queryList = a1[["genes"]],level = 4)
## Again, the three originally selected pathways were:
a1[["selectedPathways"]]

You might want to rerun the above few lines of code with different values for seed and convince yourself that there indeed is a need for a new way of pathway analysis.
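For reference, the gene-level hypergeometric test that the thought experiment relies on can be sketched in base R as follows. The counts are hypothetical and chosen only for illustration, and this is not necessarily how ora computes its p-values internally.

```r
# Hypothetical counts for one pathway (illustrative numbers only).
universe_size <- 6000   # genes annotated to at least one pathway
pathway_size  <- 100    # genes in the pathway being tested
query_size    <- 50     # genes in the query list
overlap       <- 5      # query genes that fall in the pathway

# P(X >= overlap) when drawing query_size genes without replacement.
# phyper(q, ..., lower.tail = FALSE) gives P(X > q), hence overlap - 1.
p_value <- phyper(overlap - 1, pathway_size,
                  universe_size - pathway_size,
                  query_size, lower.tail = FALSE)
p_value
```

Because this test counts each gene separately, a gene that belongs to many pathways adds to the overlap of every one of them, which is how pathways that were never selected can still appear significant.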

Available Pathway-GPS repositories in SIGORA

The current version of the package comes with precomputed GPS-repositories for KEGG human and mouse (kegH and kegM respectively), as well as for Reactome human and mouse (reaH and reaM respectively). The package also provides a function for creating GPS-repositories from a gene-function repository of your choice (for example, Gene Ontology Biological Processes). The following section describes how to create your own GPS-repositories, using the PCI-NCI pathways from the National Cancer Institute as an example.

Creating a GPS repository

You can create your own GPS repositories using the makeGPS() function. There are no particular requirements on the format of your source repository, except that it should be provided as either a tab-delimited file or a data frame with three columns in the following order:
PathwayID, PathwayName, Gene.

data(nciTable)
## what does the input look like?
head(nciTable)
## create a SigObject. use the saveFile parameter for future reuse.
nciH<-makeGPS(pathwayTable=nciTable, saveFile='nciH.rda')
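## query list: all gene symbols in idmap that start with "IL"
ils<-grep("^IL",idmap[,"Symbol"],value=TRUE)
## signature overrepresentation analysis against the new repository
ilnci<-sigora(queryList=ils,GPSrepo=nciH,level=3)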
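If your source repository is not yet in the expected layout, a table with the three required columns can be assembled in base R along the following lines. This is a sketch under assumptions: the pathway IDs, names and gene assignments are hypothetical, and whether makeGPS expects a header row in a tab-delimited file is not stated here, so check ?makeGPS before passing a file.

```r
# Hypothetical three-column pathway table: PathwayID, PathwayName, Gene.
toyTable <- data.frame(
  PathwayID   = c("pw1", "pw1", "pw2", "pw2"),
  PathwayName = c("Toy pathway one", "Toy pathway one",
                  "Toy pathway two", "Toy pathway two"),
  Gene        = c("IL6", "STAT3", "TNF", "NFKB1"),
  stringsAsFactors = FALSE
)

# Write it as a tab-delimited file, the other input form mentioned above.
toyFile <- tempfile(fileext = ".txt")
write.table(toyTable, toyFile, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the column order survives the round trip.
roundTrip <- read.delim(toyFile, stringsAsFactors = FALSE)
names(roundTrip)
```

A table this small only demonstrates the layout, with one row per pathway-gene association. A real repository needs many more pathways and genes before signatures become informative.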

Analysing your data

To perform Signature Overrepresentation Analysis, use the function sigora. For traditional Overrepresentation Analysis, use the function ora.

Exporting the results

Simply provide a file name to the saveFile parameter of sigora, i.e. (for the above experiment):
sigRes<- sigora(kegH,queryList= a1$genes,level= 2, saveFile="myResultsKEGG.csv")
You will notice that the file also contains, for each pathway, the list of the relevant genes from the query list. The genes are listed as human-readable gene symbols and sorted by their contribution to the statistical significance of the pathway.

Gene identifier mapping

Mappings between ENSEMBL-IDs, ENTREZ-IDs and gene symbols are performed automatically. You can, for instance, create a GPS-repository using ENSEMBL-IDs and perform Signature Overrepresentation Analysis using this repository on a list of ENTREZ-IDs.
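Conceptually, this kind of mapping is a table lookup. The base-R sketch below shows the idea on a toy table. It is an illustration only: the identifiers are placeholders rather than real accessions, the column name Entrez.ID and the helper translate are hypothetical, and the package's own mapping (which uses the idmap table) may work differently.

```r
# Toy lookup table; the IDs are placeholders, not real accessions.
toyMap <- data.frame(
  Ensembl.Gene.ID = c("ENSG_A", "ENSG_B", "ENSG_C"),
  Entrez.ID       = c("101", "102", "103"),   # hypothetical column name
  Symbol          = c("GENEA", "GENEB", "GENEC"),
  stringsAsFactors = FALSE
)

# Translate identifiers from one column to another, dropping unmapped ones.
translate <- function(ids, from, to, map = toyMap) {
  hits <- match(ids, map[[from]])
  map[[to]][hits[!is.na(hits)]]
}

# "999" has no entry in the toy table and is silently dropped.
mapped <- translate(c("102", "999", "101"),
                    from = "Entrez.ID", to = "Ensembl.Gene.ID")
mapped
```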

Author(s)

Amir B.K. Foroushani, Fiona S.L. Brinkman, David J. Lynn

Maintainer: Amir Foroushani <sigora.dev@gmail.com>

References

Foroushani AB, Brinkman FS and Lynn DJ (2013). “Pathway-GPS and SIGORA: identifying relevant pathways based on the over-representation of their gene-pair signatures.” PeerJ, 1.

See Also

sigora, makeGPS, ora

Examples

barplot(table(kegH$L1$degs),col="red",
main="distribution of number of functions per gene in KEGG human pathways.",
ylab="frequency",xlab="number of functions per gene")
## creating your own GPS repository
nciH<-makeGPS(pathwayTable=nciTable)
ils<-grep("^IL",idmap[,"Symbol"],value=TRUE)
## signature overrepresentation analysis:
sigRes.ilnci<-sigora(queryList=ils,GPSrepo=nciH,level=3)

Example output

Garbage collection 15 = 3+3+9 (level 2) ... 
15.0 Mbytes of cons cells used (47%)
23.2 Mbytes of vectors used (20%)
Time difference of 1.30783 secs
[1] "Mapped identifiers from Symbol  to  Ensembl.Gene.ID ..."
      pathwy.id                    description   pvalues Bonferroni successes
1   il23pathway IL23-mediated signaling events 5.494e-64  1.049e-61     36.27
2   il27pathway IL27-mediated signaling events 3.164e-34  6.043e-32     18.14
3 il12_2pathway IL12-mediated signaling events 3.188e-12  6.089e-10     13.20
4    il1pathway  IL1-mediated signaling events 1.115e-09  2.130e-07      8.42
5  il4_2pathway  IL4-mediated signaling events 1.070e-05  2.044e-03      9.03
  PathwaySize        N sample.size
1      172.95 46257.95       93.08
2       65.51 46257.95       93.08
3      420.16 46257.95       93.08
4      156.05 46257.95       93.08
5      687.89 46257.95       93.08

sigora documentation built on Aug. 24, 2019, 1:04 a.m.