GOtest | R Documentation |
This function performs functional enrichment analyses.
GOtest(x, go, query.population=NULL, background='common',
name.x='Input', name.go='Category',
method=c('hypergeometric','GSEA','logitreg'), adj="BH",
ReportOverlap=TRUE, ncores=1, gsea.alpha=1,
permutations=ifelse(method=='GSEA',1000,0), iseed=12345)
x |
a data.frame of query gene sets, with the first column the gene symbols and the second column the set names. A third column is needed for the |
go |
a data.frame of GO gene sets, with the first column the gene symbols and the second column being GO categories. The GO category can be written in a format of System:Term which will be decoded in the output. |
query.population |
a population of the genes from which the query gene sets are collected. |
background |
an integer or a keyword specifying the background for |
name.x |
a character string specifying the name of the query. |
name.go |
a character string specifying the name of the functional gene sets. |
method |
algorithm for performing the enrichment test. See |
adj |
approach to correct for multiple tests for methods |
ReportOverlap |
whether to output overlapping elements. |
ncores |
number of CPU cores to be used for parallel computing. Using multiple cores might not lead to a significant speed gain for |
gsea.alpha |
power to scale the weights in |
permutations |
number of permutations for computing significance and controlling FDR for |
iseed |
seed for random number generation in permutations. |
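As a rough illustration of the input layout described for x and go, the sketch below builds two toy data frames in base R. The gene symbols, set names and System:Term labels are made up for illustration, and the splitting step is only an assumption about how such labels could be decoded, not the package's own code.

```r
# Toy query gene sets: first column gene symbols, second column set names.
x <- data.frame(Gene = c("TP53", "BRCA1", "EGFR", "MYC"),
                Set  = c("SetA", "SetA", "SetB", "SetB"),
                stringsAsFactors = FALSE)

# Toy annotation: the second column uses the System:Term format
# (hypothetical labels, not taken from any real database).
go <- data.frame(Gene     = c("TP53", "BRCA1", "EGFR"),
                 Category = c("BP:DNA repair", "BP:DNA repair", "MF:kinase activity"),
                 stringsAsFactors = FALSE)

# Assumed decoding: split each label at the first colon.
go$System <- sub(":.*$", "", go$Category)    # text before the first colon
go$Term   <- sub("^[^:]*:", "", go$Category) # text after the first colon
go
```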
If method is "hypergeometric", x must have at least two columns, with the first column containing the gene IDs and the second column the group name(s) for each gene in the row. Multiple testing will be controlled by the Benjamini-Hochberg FDR.
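To make the hypergeometric overlap test concrete, here is a minimal base R sketch for a single query set and a single category. All counts are hypothetical, and the calculation uses the standard phyper and p.adjust functions rather than anything taken from the GOtest source, so treat it as an approximation of the idea, not the package's implementation.

```r
# Hypothetical counts (assumptions for illustration only).
N <- 20000  # background size
K <- 150    # genes annotated to the category
n <- 300    # genes in the query set
k <- 12     # genes in the overlap

# Upper-tail probability of seeing k or more overlapping genes by chance.
p <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Fold enrichment: observed overlap divided by the expected overlap.
fe <- k / (n * K / N)

# Benjamini-Hochberg adjustment across several tests
# (the two extra P values are made up to form a small family of tests).
p.adj <- p.adjust(c(p, 0.04, 0.5), method = "BH")
```

The expected overlap here is n * K / N = 2.25 genes, so an observed overlap of 12 corresponds to a fold enrichment a little above 5.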
If method is "GSEA" (see reference 1), x must have three columns: the first column contains gene IDs, the second column specifies the phenotype or group name, and the third column specifies gene-phenotype correlation strengths (e.g., log fold change of differential expression, t-test statistics, or minus log P values), whose sign and absolute values can be used to rank the genes within a phenotype/group.
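The way the third column ranks genes can be illustrated with the weighted running enrichment score described in reference 1. The sketch below is written in base R on simulated data; the gene names, set membership and the variable alpha (standing in for the weight exponent that gsea.alpha appears to control) are assumptions, and this is not the package's own code.

```r
set.seed(1)
genes <- paste0("g", 1:100)                    # hypothetical gene IDs
stat  <- sort(rnorm(100), decreasing = TRUE)   # gene-level statistics, ranked
inset <- genes %in% paste0("g", c(2, 5, 9, 40)) # hypothetical gene set membership
alpha <- 1                                     # weight exponent (assumed role of gsea.alpha)

# Weighted fraction of set genes seen so far while walking down the ranking.
hit  <- cumsum(ifelse(inset, abs(stat)^alpha, 0)) / sum(abs(stat[inset])^alpha)
# Fraction of non-set genes seen so far.
miss <- cumsum(!inset) / sum(!inset)

# Running score and its maximum deviation from zero (the enrichment score).
running <- hit - miss
es <- running[which.max(abs(running))]
```

With alpha set to 0, every set gene contributes equally and the score reduces to an unweighted Kolmogorov-Smirnov-like statistic.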
If method is "logitreg" (logistic regression; see reference 2), x should have at least three columns. The first three columns are the same as those for "GSEA". Any additional columns may contain covariates.
For method "GSEA", P values and multiple testing correction will be based on permutation analyses. Permutations can also be used for the "logitreg" method (not recommended, as logistic regression is very slow).
The argument background specifies the size of the gene set background for the hypergeometric test. If background is an integer, it must be larger than the size of the union of query gene set and annotated gene set. Alternatively, the size of the gene set background will be determined by one of the following keywords:
query: the genes supplied by query.population;
annotation: the genes present in database go;
intersection: the intersection between query.population and go;
common: the same as intersection;
union: the union of query.population and go;
query.population should not be NULL if background is query, common or union.
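The background keywords can be pictured with base R set operations. The sketch below uses a made-up query population and annotation table, and simply computes the gene set that each keyword would refer to according to the descriptions above; it does not call GOtest and is not the package's internal logic.

```r
# Hypothetical inputs (assumptions for illustration only).
query.population <- c("A", "B", "C", "D", "E")
go <- data.frame(Gene     = c("C", "D", "E", "F", "G", "C"),
                 Category = c("X", "X", "Y", "Y", "Y", "Y"),
                 stringsAsFactors = FALSE)

# Genes present in the annotation, counted once each.
go.genes <- unique(go$Gene)

# Candidate backgrounds, one per keyword.
bg <- list(
  query        = unique(query.population),
  annotation   = go.genes,
  intersection = intersect(query.population, go.genes),
  union        = union(query.population, go.genes)
)
bg$common <- bg$intersection  # 'common' is documented as the same as 'intersection'

# Background size under each keyword.
sapply(bg, length)
```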
A data.frame with components:
System |
GO gene set system classifications |
Category |
GO gene set names |
Input |
Query gene set names |
Overlap.Size |
Size of the overlap between GO gene set and query gene set |
Input.Size |
Size of the query gene set |
Category.Size |
GO gene set size |
Background.Size |
Size of the background population for |
FE |
Fold enrichment of the overlap for |
ES |
|
log.odds |
|
SE |
|
Z |
|
Pvalue |
P value significance of the overlap |
P.adj |
Adjusted P value |
Overlap.Items |
Overlapping items |
The column names starting with Input and Category will be specified by options name.x and name.go.
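For orientation on the logistic regression approach of reference 2, the sketch below fits a single gene set in base R with glm, regressing simulated set membership on a gene-level statistic plus one covariate. The data, the variable names (stat, covar, inset) and the model formula are all assumptions, and it is likewise only an assumption that the log.odds, SE and Z columns correspond to the coefficient, standard error and z statistic shown here.

```r
set.seed(42)
n <- 500

# Simulated gene-level data: a statistic and one extra covariate (hypothetical).
d <- data.frame(stat = rnorm(n), covar = runif(n))

# Simulated membership: genes with larger statistics are more likely to be in the set.
d$inset <- rbinom(n, 1, plogis(-2 + 0.8 * d$stat))

# Logistic regression of set membership on the statistic, adjusting for the covariate.
fit <- glm(inset ~ stat + covar, family = binomial, data = d)

# Row for the statistic: Estimate (log odds), Std. Error, z value, Pr(>|z|).
res <- coef(summary(fit))["stat", ]
res
```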
1. Subramanian et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102(43): 15545-15550.
2. Sartor et al. (2009). LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data. Bioinformatics 25(2): 211-217.
msigdb.gsea, plotOverlap, p.adjust, curated.genesets, GSEA
###Example usage 1: the hypergeometric test with MSigDB GO/pathway annotations
#In this example, we will load 23 pre-installed functional gene sets
#curated by the MacArthur Lab (https://github.com/macarthur-lab/gene_lists)
#and then apply the hypergeometric test to evaluate their overlap with MSigDB GO/pathways.
library(GOtest)
MAGenes=curated.genesets(c('MacArthur'))
head(MAGenes)
#Some enrichment analysis methods, eg the hypergeometric test,
#require a gene universe or a population of gene background.
#Here we will use a pre-installed set of approved symbols for protein-coding genes by HGNC.
universe=curated.genesets(c('HGNC_universe'))$Gene
str(universe)
#For details about function curated.genesets, check help ?curated.genesets.
#Now let us run enrichment of MacArthur gene sets against the MSigDB canonical pathways.
result=msigdb.gsea(x=MAGenes, query.population=universe, genesets=c('c2.cp'),
background='query', name.x='MacArthur')
head(result)
###Example usage 2: weighted enrichment tests
#We will again make use of the MacArthur gene set and the gene universe
#of HGNC approved symbols, so make sure they have been loaded as in Example 1.
library(GOtest)
MAGenes=curated.genesets(c('MacArthur'))
universe=curated.genesets(c('HGNC_universe'))$Gene
(n=length(universe))
#In this example, we will try both the hypergeometric test and weighted enrichment tests,
#including GSEA and logistic regression, by generating a toy dataset through simulation
#of random gene-phenotype associations.
set.seed(123)
toy=data.frame(Gene=universe, Phenotype='Simulated', Z=rnorm(n,0,1), stringsAsFactors=FALSE)
#Select genes with absolute Z value larger than 3
#and separate them into up and down groups based on the sign of Z value,
#then run the hypergeometric test on both groups against the MacArthur gene sets:
toy.3=toy[abs(toy$Z)>3,]
toy.3$Direction=ifelse(toy.3$Z > 0, 'Up','Down')
fit1=GOtest(x=toy.3[,c('Gene','Direction')], go=MAGenes, query.population=universe,
background='query', name.x='Toy', name.go='MacArthur', method='hypergeometric')
#As expected, no significant enrichment identified:
head(fit1)
## Not run:
#Next, we are going to run weighted enrichment tests on the full test dataset
#by using GSEA or logistic regression. First, run GSEA:
fit2=GOtest(x=toy, go=MAGenes, name.x='Toy', name.go='MacArthur', method='GSEA')
head(fit2)
#Again there is no significant enrichment. Let us check the GSEA running
#enrichment score plot for the top 10 MacArthur terms:
plotGseaEnrTable(GseaTable=fit2[1:10,], x=toy, go=MAGenes)
#Run logistic regression:
fit3=GOtest(x=toy, go=MAGenes, name.x='Toy', name.go='MacArthur', method='logitreg')
head(fit3)
## End(Not run)