GOtest: Gene Ontology Functional Enrichment Test

GOtest R Documentation

Gene Ontology Functional Enrichment Test

Description

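This function tests query gene sets for functional enrichment against Gene Ontology (GO) or other annotated gene sets, using a hypergeometric test, GSEA, or logistic regression.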

Usage

GOtest(x, go, query.population=NULL, background='common', 
	name.x='Input', name.go='Category',
	method=c('hypergeometric','GSEA','logitreg'), adj="BH",
	ReportOverlap=TRUE, ncores=1, gsea.alpha=1, 
	permutations=ifelse(method=='GSEA',1000,0), iseed=12345)

Arguments

x

a data.frame of query gene sets, with gene symbols in the first column and set names in the second column. A third column is required for the GSEA and logitreg methods (see Details).

go

a data.frame of GO gene sets, with gene symbols in the first column and GO categories in the second column. A GO category can be written in the format System:Term, which will be decoded in the output.
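
As an illustration of the System:Term format, the sketch below splits such labels with base R. The gene symbols and category labels are made up, and the page does not show how GOtest performs the decoding internally, so this only shows one plausible reading of the format.

```r
# Hypothetical GO table in the two-column layout described above
go <- data.frame(
  Gene = c("TP53", "BRCA1", "EGFR"),
  Category = c("BP:DNA repair", "BP:DNA repair", "MF:kinase activity"),
  stringsAsFactors = FALSE
)

# Split each label at the first colon:
# the prefix is taken as the system, the remainder as the term
go$System <- sub(":.*$", "", go$Category)
go$Term <- sub("^[^:]*:", "", go$Category)
print(go)
```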

query.population

a population of the genes from which the query gene sets are collected.

background

an integer or a keyword specifying the background for the hypergeometric test. See Details.

name.x

a character string specifying the name of the query.

name.go

a character string specifying the name of the functional gene sets.

method

algorithm used to perform the enrichment test. See Details.

adj

approach used to correct for multiple testing with the hypergeometric and logitreg methods. See function p.adjust.

ReportOverlap

whether to output overlapping elements.

ncores

number of CPU cores to be used for parallel computing. Using multiple cores might not lead to a significant speed gain for hypergeometric or logitreg.

gsea.alpha

power to scale the weights in GSEA: 0 (unweighted = Kolmogorov-Smirnov), 1 (weighted), and 2 or larger (over-weighted)

permutations

number of permutations for computing significance and controlling FDR for the GSEA and logitreg methods.

iseed

seed for random number generation in permutations.

Details

If method is "hypergeometric", x must have at least two columns: the first column gives the gene IDs and the second column gives the group name(s) of each gene. Multiple testing is controlled by the Benjamini-Hochberg FDR by default (see argument adj).
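
The calculation behind a single hypergeometric overlap test can be sketched in base R as follows. The counts are hypothetical, and the fold enrichment formula (observed overlap divided by the overlap expected by chance) is an assumption about how the FE column is defined; the page does not show GOtest's internal code.

```r
# Hypothetical counts, chosen only for illustration
N <- 20000  # background size
K <- 200    # genes in the annotated (GO) set
n <- 100    # genes in the query set
k <- 5      # overlap between the query set and the GO set

# Upper-tail probability P(overlap >= k) under the hypergeometric null
p <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Assumed definition of fold enrichment: observed over expected overlap
fe <- k / (n * K / N)

# Benjamini-Hochberg adjustment across several tests
# (the other two P values are made up)
pvals <- c(p, 0.04, 0.5)
p.adj <- p.adjust(pvals, method = "BH")
print(data.frame(Pvalue = pvals, P.adj = p.adj))
```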

If method is "GSEA" (see reference 1), x must have three columns: the first column contains gene IDs, the second column specifies the phenotype or group name, and the third column specifies gene-phenotype correlation strengths (e.g., log fold change of differential expression, t-test statistics, or minus log P values), whose signs and absolute values are used to rank the genes within a phenotype/group.
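
To make the role of the third column and of gsea.alpha concrete, here is a minimal sketch of a running-sum enrichment score in the style of reference 1. The helper name gsea_es and the simulated data are hypothetical, and this is not GOtest's implementation.

```r
# gsea_es is a hypothetical helper, not a GOtest function
gsea_es <- function(stat, in.set, alpha = 1) {
  ord <- order(stat, decreasing = TRUE)  # rank genes from most positive to most negative
  stat <- stat[ord]
  in.set <- in.set[ord]
  w <- abs(stat)^alpha                   # alpha = 0 gives equal weights (Kolmogorov-Smirnov)
  hit <- cumsum(ifelse(in.set, w, 0)) / sum(w[in.set])  # weighted fraction of set genes seen so far
  miss <- cumsum(!in.set) / sum(!in.set)                # fraction of non-set genes seen so far
  running <- hit - miss
  running[which.max(abs(running))]       # signed maximum deviation from zero
}

set.seed(1)
stat <- rnorm(1000)
# Toy gene set made of the 20 top-ranked genes
in.set <- seq_along(stat) %in% order(stat, decreasing = TRUE)[1:20]
es.weighted <- gsea_es(stat, in.set, alpha = 1)
es.ks <- gsea_es(stat, in.set, alpha = 0)
```

Because the toy set consists of the 20 top-ranked genes, the running sum peaks right after the last set member and both scores come out at 1 (up to floating-point error). Sets scattered through the ranking give scores closer to 0.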

If method is "logitreg" (logistic regression; see reference 2), x should have at least three columns. The first three columns are the same as those for "GSEA". Any additional columns may contain covariates.
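
The idea behind the logistic regression approach of reference 2 can be sketched with glm: membership in a gene set is regressed on the gene-level statistic, optionally with covariates. The simulated data, the column names and the model formula below are assumptions for illustration only and are not taken from GOtest's source.

```r
set.seed(42)
n <- 2000
# Hypothetical gene-level data: association strength plus one covariate
dat <- data.frame(
  Z = rnorm(n),
  GeneLength = rnorm(n)  # stand-in covariate, for illustration only
)
# Simulate membership in a gene set whose odds increase with Z
dat$InSet <- rbinom(n, 1, plogis(-2 + 0.8 * dat$Z))

# Logistic regression of set membership on the statistic, adjusting for the covariate
fit <- glm(InSet ~ Z + GeneLength, data = dat, family = binomial())

# Row for Z: log odds, standard error, Z-statistic and P value
est <- summary(fit)$coefficients["Z", ]
print(est)
```

The four numbers in est correspond in spirit to the log.odds, SE, Z and Pvalue columns listed under Value.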

For method "GSEA", P values and multiple-testing correction are based on permutation analyses. Permutations can also be used with the "logitreg" method, but this is not recommended because logistic regression is very slow.
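
A permutation P value can be sketched generically as below. The set-level statistic (mean of the gene statistics in the set) and the label-shuffling scheme are simplifications chosen for illustration; the permutation scheme actually used by GOtest is not described on this page.

```r
set.seed(2024)
stat <- rnorm(500)                         # hypothetical gene-level statistics
in.set <- rep(c(TRUE, FALSE), c(25, 475))  # hypothetical membership of 25 genes

# Simple set-level statistic used only for illustration
set_score <- function(s, m) mean(s[m])
obs <- set_score(stat, in.set)

# Null distribution: shuffle the statistics across genes and recompute the score
B <- 1000
null <- replicate(B, set_score(sample(stat), in.set))

# Two-sided empirical P value; the +1 terms keep it from being exactly zero
p.perm <- (1 + sum(abs(null) >= abs(obs))) / (B + 1)
p.perm
```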

The argument background specifies the size of the gene set background for the hypergeometric test. If background is an integer, it must be larger than the size of the union of the query gene set and the annotated gene set. Alternatively, the size of the gene set background is determined by one of the following keywords:

  • query: the genes supplied by query.population;

  • annotation: the genes present in database go;

  • intersection: the intersection between query.population and go;

  • common: the same as intersection;

  • union: the union of query.population and go;

Note that query.population should not be NULL if background is query, intersection, common or union.
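
The keywords above translate into simple set operations. The sketch below computes the corresponding background sizes for a made-up query population and annotation table; it follows the keyword descriptions literally and is not taken from GOtest's source.

```r
# Hypothetical inputs
query.population <- c("A", "B", "C", "D", "E")
go <- data.frame(Gene = c("C", "D", "E", "F", "G"),
                 Category = c("s1", "s1", "s2", "s2", "s2"),
                 stringsAsFactors = FALSE)
go.genes <- unique(go$Gene)

# Background size implied by each keyword
background.size <- c(
  query = length(unique(query.population)),
  annotation = length(go.genes),
  intersection = length(intersect(query.population, go.genes)),
  union = length(union(query.population, go.genes))
)
background.size["common"] <- background.size["intersection"]  # documented as identical
print(background.size)
```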

Value

A data.frame with components:

System

GO gene set system classifications

Category

GO gene set names

Input

Query gene set names

Overlap.Size

Size of the overlap between GO gene set and query gene set

Input.Size

Size of the query gene set

Category.Size

GO gene set size

Background.Size

Size of the background population for hypergeometric test

FE

Fold enrichment of the overlap for hypergeometric test

ES

GSEA enrichment score

log.odds

logitreg log odds

SE

logitreg standard error of log odds

Z

logitreg Z-statistic

Pvalue

P value significance of the overlap

P.adj

Adjusted P value

Overlap.Items

Overlapping items

The column names starting with Input and Category are set by the arguments name.x and name.go, respectively.

References

1. Subramanian et al (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102(43): 15545-15550.

2. Sartor et al (2009). LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data. Bioinformatics 25(2): 211-217.

See Also

msigdb.gsea, plotOverlap, p.adjust, curated.genesets, GSEA

Examples

###Example usage 1: the hypergeometric test with MSigDB GO/pathway annotations

#In this example, we will load the 23 pre-installed functional gene sets 
#curated by the MacArthur Lab (https://github.com/macarthur-lab/gene_lists) 
#and then apply the hypergeometric test to evaluate their overlap with MSigDB GO/pathways.

library(GOtest)
MAGenes=curated.genesets(c('MacArthur'))
head(MAGenes)

#Some enrichment analysis methods, e.g., the hypergeometric test, 
#require a gene universe, i.e., a background population of genes. 
#Here we will use a pre-installed set of HGNC-approved symbols for protein-coding genes.

universe=curated.genesets(c('HGNC_universe'))$Gene
str(universe)

#For details about function curated.genesets, check help ?curated.genesets.

#Now let us run enrichment of MacArthur gene sets against the MSigDB canonical pathways.

result=msigdb.gsea(x=MAGenes, query.population=universe, genesets=c('c2.cp'),
	background='query', name.x='MacArthur')
head(result)

###Example usage 2: weighted enrichment tests

#We will again make use of the MacArthur gene sets and the gene universe
#of HGNC approved symbols; they are reloaded here as in Example 1.

library(GOtest)
MAGenes=curated.genesets(c('MacArthur'))
universe=curated.genesets(c('HGNC_universe'))$Gene
(n=length(universe))

#In this example, we will try both the hypergeometric test and weighted enrichment tests, 
#including GSEA and logistic regression, by generating a toy dataset through simulation 
#of random gene-phenotype associations.

set.seed(123)
toy=data.frame(Gene=universe, Phenotype='Simulated', Z=rnorm(n,0,1), stringsAsFactors=FALSE)

#Select genes with absolute Z value larger than 3 
#and separate them into up and down groups based on the sign of the Z value, 
#then run the hypergeometric test on both groups against the MacArthur gene sets:

toy.3=toy[abs(toy$Z)>3,]
toy.3$Direction=ifelse(toy.3$Z > 0, 'Up','Down')
fit1=GOtest(x=toy.3[,c('Gene','Direction')], go=MAGenes, query.population=universe,
	background='query', name.x='Toy', name.go='MacArthur', method='hypergeometric')

#As expected, no significant enrichment identified:

head(fit1)

## Not run: 
#Next, we are going to run weighted enrichment tests on the full toy dataset 
#by using GSEA or logistic regression. First, run GSEA:

fit2=GOtest(x=toy, go=MAGenes, name.x='Toy', name.go='MacArthur', method='GSEA')
head(fit2)
#Again there is no significant enrichment. Let us check the GSEA running 
#enrichment score plot for the top 10 MacArthur terms:
plotGseaEnrTable(GseaTable=fit2[1:10,], x=toy, go=MAGenes)

#Run logistic regression:

fit3=GOtest(x=toy, go=MAGenes, name.x='Toy', name.go='MacArthur', method='logitreg')
head(fit3)

## End(Not run)

mw201608/GOtest documentation built on May 3, 2023, 11:49 a.m.