runHyperGO: Run Gene Ontology analysis based on hypergeometric test from...
In EMA: Easy Microarray Data Analysis


Run Gene Ontology analysis based on hypergeometric test from a probeset list


runHyperGO(list, pack.annot, categorySize = 1, verbose = TRUE,
name = "hyperGO", htmlreport = TRUE, txtreport = TRUE,
tabResult = FALSE, pvalue = 0.05)

`list`	character vector of probeset names
`pack.annot`	annotation package to use
`categorySize`	integer, minimum size of a category, by default 1
`verbose`	logical, if TRUE, results are displayed, by default TRUE
`name`	character, name for output files, by default "hyperGO"
`htmlreport`	logical, if TRUE, an HTML report is created, by default TRUE
`txtreport`	logical, if TRUE, a txt report is created, by default TRUE
`tabResult`	logical, if TRUE, a list with the results is created, by default FALSE
`pvalue`	numeric, cutoff for the hypergeometric test p-value, by default 0.05

The choice of the universe can have a significant impact on the results; this is discussed in detail in the vignette of the GOstats package. Here, we apply a non-specific filtering procedure that differs from the one proposed by Falcon and Gentleman. Since not all genes are expressed under all conditions in the data, the universe could be defined either with the expressed genes only or with all the genes on the array. In practice, we cannot distinguish genes that are biologically non-expressed from probesets of low quality. We therefore consider that non-expressed probesets, as well as those with little variation across samples, may be biologically relevant, and we first define the universe with all the genes on the array. We then remove only the probesets that have no Entrez Gene identifier in the annotation data or no GO annotation. Finally, the hypergeometric test is performed on the unique Entrez IDs of the gene list and the unique Entrez IDs of the universe.

The p-values in the output are not corrected for multiple testing. Because of the existing dependence structure (between genes, and between GO terms), any multiple testing correction is difficult to apply. Moreover, the most interesting gene sets are not necessarily the ones with the smallest p-values. Interesting nodes are typically those with a reasonable number of genes (10 or more) and small p-values.
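The universe construction and the test described above can be illustrated with base R alone. This is a minimal sketch, not the code used by runHyperGO: the annotation table, the probeset names and the GO labels are all hypothetical, and `phyper` stands in for the test that the package delegates to GOstats.

```r
## Toy annotation table: probeset -> Entrez ID -> GO term (all values hypothetical)
annot <- data.frame(
  probe  = c("p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"),
  entrez = c("10", "10", "20", "30", NA,   "40", "50", "60"),
  go     = c("GO:A", "GO:A", "GO:A", "GO:B", "GO:A", NA, "GO:B", "GO:A"),
  stringsAsFactors = FALSE
)

## Universe: all probesets of the array, minus those without an
## Entrez ID or without a GO annotation; then unique Entrez IDs
keep <- !is.na(annot$entrez) & !is.na(annot$go)
universe <- unique(annot$entrez[keep])

## Gene list: unique Entrez IDs of the selected probesets, restricted to the universe
selected_probes <- c("p1", "p2", "p3", "p5")
selected <- intersect(unique(annot$entrez[annot$probe %in% selected_probes]), universe)

## Unique Entrez IDs of the universe annotated to the tested category
in_category <- unique(annot$entrez[keep & annot$go == "GO:A"])

## Over-representation p-value: P(X >= k) under the hypergeometric distribution
k <- length(intersect(selected, in_category))  # selected genes in the category
m <- length(in_category)                       # category genes in the universe
n <- length(universe) - m                      # other genes in the universe
N <- length(selected)                          # size of the gene list
p_value <- phyper(k - 1, m, n, N, lower.tail = FALSE)
```

Probesets `p1` and `p2` map to the same Entrez ID and are counted once, which is why the test works on unique Entrez IDs rather than on probesets.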

runHyperGO needs packages GOstats and GO.db from Bioconductor.

A list of R objects (when `tabResult = TRUE`) and/or the txt and HTML reports. The list contains:

`BP`	data.frame with the results for Biological Process: GO ID, p-value, odds ratio, expected count, size and GO term
`MF`	Same for Molecular Function
`CC`	Same for Cellular Component
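Since the p-values are not corrected and the interesting nodes are typically those with 10 or more genes and small p-values, a result table can be filtered on both criteria. The sketch below uses a mock data.frame; the column names (`GOBPID`, `Pvalue`, `Size`, `Term`) are an assumption modelled on the summary tables of GOstats and may not match the columns actually returned by runHyperGO, so check `colnames()` on the real result first.

```r
## Mock result table; identifiers, values and column names are hypothetical
bp <- data.frame(
  GOBPID = c("GO:toy1", "GO:toy2", "GO:toy3", "GO:toy4"),
  Pvalue = c(0.001, 0.0004, 0.03, 0.2),
  Size   = c(25, 3, 12, 40),
  Term   = c("term A", "term B", "term C", "term D"),
  stringsAsFactors = FALSE
)

## Keep nodes with at least 10 genes and a p-value below the cutoff
min_size <- 10
cutoff   <- 0.05
interesting <- bp[bp$Size >= min_size & bp$Pvalue < cutoff, ]

## Sort the remaining nodes by increasing p-value
interesting <- interesting[order(interesting$Pvalue), ]
```

With a real result, the same filter would be applied to each of the `BP`, `MF` and `CC` tables in turn.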

Nicolas Servant, Eleonore Gravier, Pierre Gestraud, Cecile Laurent, Caroline Paccard, Anne Biton, Jonas Mandel, Bernard Asselain, Emmanuel Barillot, Philippe Hupe

hyperGTest, runHyperKEGG

## Not run: 
require(hgu133plus2.db)
data(marty)

## Probe list
probeList <- rownames(marty)[1:50]

## Hypergeometric test for GO pathway
res <- runHyperGO(probeList, htmlreport = FALSE, txtreport = FALSE,
    tabResult = TRUE, pack.annot = "hgu133plus2.db")

## End(Not run)