gsEasy

suppressPackageStartupMessages(library(gsEasy))
set.seed(1)

Calculate p-values for enrichment of set

gsEasy has a function gset for calculating p-values of enrichment for sets (of genes) in ranked/scored lists (of genes) by permutation (see 'Gene Set Enrichment Analysis' described by Subramanian et al, 2005). The arguments of gset are named as in the paper:

Say we had a set of 5 genes which appeared at the top five ranks out of 1000 (i.e. highly enriched at the high ranks!). We could then calculate an enrichment p-value using the command:

gset(S=1:5, N=1000)

So the p-value is close to zero. However for random sets, the p-values are distributed uniformly:

replicate(n=10, expr=gset(S=sample.int(n=1000, size=5), N=1000))

Alternatively, you can pass the names of genes as S with a sorted list of gene names as r (in which case the scores default to the ranks in the list), or a numeric vector of scores named by genes as r.

gset(S=c("gene 1", "gene 5", "gene 40"), r=paste("gene", 1:100))

Multiple gene sets can thus be tested for enrichment with a single call to a high level function such as sapply (or, if you have many sets to test and multiple cores available, mclapply), for instance:

gene_sets <- c(list(1:5), replicate(n=10, simplify=FALSE, expr=sample.int(n=1000, size=5)))
names(gene_sets) <- c("enriched set", paste("unenriched set", 1:10))
gene_sets
sapply(gene_sets, function(set) gset(S=set, N=1000))

Ontological annotations

gsEasy has a function get_ontological_gene_sets for creating lists of gene sets corresponding to annotation with ontological terms such that ontological is-a relations are propagated. get_ontological_gene_sets accepts an ontological_index (see the R package ontologyIndex for more details) argument and two character vectors, corresponding to genes and terms respectively, whereby the n-th element in each vector corresponds to one annotation pair. The result, a list of character vectors of gene names, can then be used as an argument of gset.

library(ontologyIndex)
data(hpo)
df <- data.frame(
    gene=c("gene 1", "gene 2"), 
    term=c("HP:0000598", "HP:0000118"), 
    name=hpo$name[c("HP:0000598", "HP:0000118")], 
    stringsAsFactors=FALSE,
    row.names=NULL)
df
get_ontological_gene_sets(hpo, gene=df$gene, term=df$term)

Gene Ontology (GO) annotations

gsEasy comes with a list of GO annotations, GO_gene_sets [based on annotations downloaded from geneontology.org on 07/08/2016], which can be loaded with data. This comprises a list of all gene sets (i.e. character vectors of gene names) associated with each GO term, for GO terms being annotated with at most 500 genes.

data(GO_gene_sets)
GO_gene_sets[1:6]

It also has a function get_GO_gene_sets which is a specialisation of get_ontological_gene_sets for the Gene Ontology (GO) which can be called passing just a file path to the annotation file (official up-to-date version available at https://geneontology.org/).



Try the gsEasy package in your browser

Any scripts or data that you put into this service are public.

gsEasy documentation built on May 29, 2024, 5 a.m.