Gene pair enrichment analysis (GPEA)

Share:

Description

When a network G contains n interactions, of which k interactions are among genes from the given gene set S, then a p-value for the enrichment of gene pairs of this gene set S can be calculated based on a e.g., one-sided Fisher's exact test. For p genes there is a total of N=p(p-1)/2 different gene pairs (clique graph) with the assumption that all genes within a gene set are associated to each other. If there are pS genes for a particular gene set (S) then the total number of gene pairs for this gene set is mS=pS(pS-1)/2.

Usage

1
2
3
4
5
 


gpea(gnet, genesets, verbose = TRUE, cmax = 1000, cmin = 3,
                 adj = "bonferroni")

Arguments

gnet

igraph object (e.g., inferred from bc3net) of a given network where the gene identifiers [V(net)$names] correspond to the provided gene identifiers in the reference gene sets.

genesets

A named list object of a collection of gene sets. The identifiers used for the candidate and reference genes need to match the identifier types used for the gene sets. An example list of gene sets is given in data(exgensets) showing an example list object of gene sets from pathways with gene symbols.

$'Reactome:REACT_115566:Cell Cycle' [1] "APITD1" "TAOK1" "CDC23" [4] "RB1" "PRKCA" "HIST1H4J" [7] "MCM10" "PPP1CC" "NUP153" ...

$'Reactome:REACT_152:Cell Cycle, Mitotic' [1] "APITD1" "TAOK1" "CDKN2C" [4] "RB1" "PRKCA" "MCM10" [7] "HIST1H2BH" "NUP153" "TUBGCP3" [10] "APEX1" "RPA2" "PRKACA" ...

verbose

The default value is <FALSE>. If this option is set <TRUE> the number and name of the gene sets during their processing is reported.

cmax

All provided genesets with more than cmax genes will be excluded from the analysis (default cmax=1000).

cmin

All provided genesets with less than cmin genes will be excluded from the analysis (default cmin>=3).

adj

The default value is <fdr> (False discovery rate using the Benjamini-Hochberg approach). Multiple testing correction based on the function stats::p.adjust() with available options for "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" and "none"

Details

The enrichment analysis is based on a one-sided Fisher's exact test.

Value

The function returns a data.frame object with the columns

"TermID" the given name of a gene set i from the named gene set collection list object S. "edges" the number of connected gene pairs present a given geneset "genes" the number of candidate genes present in the gene set i "all" the number of all genes present in the gene set i "pval" the nominal p-value from a one-sided fisher's exact test "padj" the adjusted p-value to consider for multiple testing

Author(s)

Ricardo de Matos Simoes

References

Inference and Analysis of Gene Regulatory Networks in R: Applications in Biology, Medicine, and Chemistry, DOI: 10.1002/9783527694365.ch10 In book: Computational Network Analysis with R, 2016, pp.289-306

Urothelial cancer gene regulatory networks inferred from large-scale RNAseq, Bead and Oligo gene expression data, BMC Syst Biol. 2015; 9: 21.

See Also

See Also as enrichment

Examples

1
2
3
4
data(exanet)
data(exgensets) ## example gene sets from the CPDB database (http://www.consensuspathdb.org)

res = gpea(exanet, exgensets, cmax=1000, cmin=2)