Over-representation Analysis with EGSEA Reporting Capabilities

Description

This is the main function for carrying out gene set enrichment analysis using over-representation analysis (ORA) only.

Usage

egsea.ora(entrezIDs, universe = NULL, logFC = NULL, title = NULL,
  gs.annots, symbolsMap = NULL, minSize = 2, display.top = 20,
  sort.by = "p.adj", egsea.dir = NULL, kegg.dir = NULL,
  logFC.cutoff = 0, sum.plot.axis = "p.adj", sum.plot.cutoff = NULL,
  vote.bin.width = 5, num.threads = 4, report = TRUE,
  print.base = FALSE, verbose = FALSE)

Arguments

entrezIDs

character, a vector of Entrez Gene IDs to be tested for ORA.

universe

character, a vector of Entrez Gene IDs to be used as the background list. If universe=NULL, the background list is created from the AnnotationDbi package.

logFC

double, a matrix or vector of log fold changes with the same length as entrezIDs. If logFC=NULL, a value of 1 is used for every gene; in that case, the regulation direction shown in heatmaps and pathway maps does not reflect the actual direction of gene regulation.

title

character, a short description of the experimental contrast.

gs.annots

list, a list of objects of class GSCollectionIndex, generated using one of the following functions: buildIdx, buildMSigDBIdx, buildKEGGIdx, buildGeneSetDBIdx, and buildCustomIdx.

symbolsMap

data frame, a K x 2 table that stores the gene symbol of each Entrez Gene ID. It is used for the heatmap visualization. The order of its rows should match that of entrezIDs. Default symbolsMap=NULL.

minSize

integer, the minimum size of a gene set to be included in the analysis. Default minSize=2.

display.top

integer, the number of top gene sets to be displayed in the EGSEA report. You can always access the list of all tested gene sets using the returned gsa list. Default is 20.

sort.by

character, determines how the analysis results are ordered in the stats table. It accepts "p.value", "p.adj" or "Significance". Default sort.by="p.adj".

egsea.dir

character, directory into which the analysis results are written out.

kegg.dir

character, the directory of KEGG pathway data file (.xml) and image file (.png). Default kegg.dir=paste0(egsea.dir, "/kegg-dir/").

logFC.cutoff

numeric, the logFC cut-off threshold used in the Significance Score and Regulation Direction calculations. Default logFC.cutoff=0.

sum.plot.axis

character, the x-axis of the summary plot. All the values accepted by the sort.by parameter can be used. Default sum.plot.axis="p.adj".

sum.plot.cutoff

numeric, cut-off threshold to filter the gene sets of the summary plots based on the values of the sum.plot.axis. Default sum.plot.cutoff=NULL.

vote.bin.width

numeric, the bin width of the vote ranking. Default vote.bin.width=5.

num.threads

numeric, number of CPU threads to be used. Default num.threads=4.

report

logical, whether to generate the EGSEA interactive report. Generating the report increases the run time. Default report=TRUE.

print.base

logical, whether to write out the results of the individual GSE methods. Default FALSE.

verbose

logical, whether to print out progress messages and warnings. Default verbose=FALSE.
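
The universe and minSize arguments together decide which genes and gene sets enter the test. The base-R sketch below is illustrative only: the helper prepare_inputs(), the gene set list and the Entrez IDs are hypothetical, and the assumption that minSize is checked after restricting each gene set to the universe is not stated on this page.

```r
# Hypothetical helper (not part of EGSEA): restrict the test genes and the
# gene sets to the background list, then drop gene sets smaller than minSize.
prepare_inputs <- function(entrezIDs, gsets, universe, minSize = 2) {
  universe <- unique(as.character(universe))
  dropped <- setdiff(entrezIDs, universe)
  if (length(dropped) > 0) {
    warning(length(dropped), " test gene(s) not in the universe were dropped")
  }
  # Assumption: gene set size is counted after intersecting with the universe
  gsets <- lapply(gsets, intersect, universe)
  list(entrezIDs = intersect(entrezIDs, universe),
       gsets = gsets[lengths(gsets) >= minSize])
}

# Illustrative Entrez IDs: every measured gene forms the background
measured <- c("7157", "672", "1956", "3845", "4609")
de.genes <- c("7157", "1956", "5290")
gsets <- list(setA = c("7157", "672", "1956"),
              setB = c("3845", "5290"),
              setC = c("4609", "7157"))
prep <- suppressWarnings(prepare_inputs(de.genes, gsets, measured, minSize = 2))
prep$entrezIDs
names(prep$gsets)
```

Here setB shrinks to a single gene once 5290 (absent from the background) is removed, so it falls below minSize=2 and is excluded.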
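
The logFC and logFC.cutoff descriptions imply two behaviours: a default fold change of 1 for every gene when logFC=NULL, and a threshold applied before up- and down-regulated genes are counted. The sketch below illustrates both with hypothetical helpers default_logFC() and set_direction(); the majority-vote direction rule is an assumption made for illustration and may differ from EGSEA's own Regulation Direction calculation.

```r
# Hypothetical helper mirroring the documented default: when logFC is NULL,
# every gene gets a fold change of 1, so every gene appears up-regulated.
default_logFC <- function(entrezIDs, logFC = NULL) {
  if (is.null(logFC)) {
    logFC <- rep(1, length(entrezIDs))
    names(logFC) <- entrezIDs
  }
  stopifnot(NROW(logFC) == length(entrezIDs))
  logFC
}

# Hypothetical direction rule (assumption, not EGSEA's formula): among set
# genes whose |logFC| exceeds the cut-off, compare up- and down-regulated counts.
set_direction <- function(set.genes, logFC, logFC.cutoff = 0) {
  fc <- logFC[intersect(set.genes, names(logFC))]
  fc <- fc[abs(fc) > logFC.cutoff]
  sign(sum(fc > 0) - sum(fc < 0))  # 1 = mostly up, -1 = mostly down, 0 = tie
}

# Illustrative fold changes keyed by Entrez ID
fc <- c("7157" = 2.1, "672" = -0.4, "1956" = 1.3, "3845" = -1.8)
genes <- names(fc)
set_direction(genes, fc, logFC.cutoff = 0)  # two up, two down: a tie
set_direction(genes, fc, logFC.cutoff = 1)  # 672 is filtered out
set_direction(genes, default_logFC(genes))  # every gene looks up-regulated
```

Raising the cut-off changes the outcome here because the weakly down-regulated gene no longer counts, which is why direction calls depend on logFC.cutoff.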
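
sort.by, sum.plot.axis and sum.plot.cutoff all act on columns of the results table. The sketch below uses a made-up data frame and hypothetical helpers sort_results() and filter_for_plot() to show one plausible reading: p-value columns are ordered and filtered smallest-first, while larger Significance scores are assumed to be better. It is not EGSEA's own code.

```r
# Made-up results table with the three columns accepted by sort.by
res <- data.frame(gene.set = c("A", "B", "C"),
                  p.value = c(0.01, 0.20, 0.001),
                  p.adj = c(0.03, 0.20, 0.003),
                  Significance = c(40, 5, 90),
                  stringsAsFactors = FALSE)

# Order rows by the chosen column (assumption: larger Significance is better)
sort_results <- function(res, sort.by = "p.adj") {
  sort.by <- match.arg(sort.by, c("p.value", "p.adj", "Significance"))
  res[order(res[[sort.by]], decreasing = (sort.by == "Significance")), ]
}

# Keep rows that pass the cut-off on the summary-plot axis; NULL keeps all rows
filter_for_plot <- function(res, sum.plot.axis = "p.adj", sum.plot.cutoff = NULL) {
  if (is.null(sum.plot.cutoff)) return(res)
  vals <- res[[sum.plot.axis]]
  keep <- if (sum.plot.axis == "Significance") {
    vals >= sum.plot.cutoff
  } else {
    vals <= sum.plot.cutoff
  }
  res[keep, ]
}

sort_results(res, "p.adj")$gene.set
nrow(filter_for_plot(res, "p.adj", 0.05))
```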
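
symbolsMap is a K x 2 table whose rows must follow the order of entrezIDs. The sketch below realigns a map that arrives in a different order using match(); the column names FeatureID and Symbols, and the IDs themselves, are illustrative assumptions rather than documented requirements.

```r
# Illustrative Entrez IDs (TP53, BRCA1, EGFR) in the order passed as entrezIDs
entrezIDs <- c("7157", "672", "1956")

# A map obtained elsewhere, in a different row order
raw.map <- data.frame(FeatureID = c("1956", "7157", "672"),
                      Symbols = c("EGFR", "TP53", "BRCA1"),
                      stringsAsFactors = FALSE)

# Reorder rows so that row i describes entrezIDs[i]
symbolsMap <- raw.map[match(entrezIDs, raw.map$FeatureID), ]
rownames(symbolsMap) <- NULL

# Guard against IDs missing from the map (match() would yield NA rows)
stopifnot(!anyNA(symbolsMap$FeatureID),
          identical(symbolsMap$FeatureID, entrezIDs))
symbolsMap$Symbols
```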

Details

This function takes a list of Entrez Gene IDs and uses the gene set collections from EGSEAdata, or a custom-built collection, to find gene sets that are over-represented in this list. It takes advantage of the existing EGSEA reporting capabilities and generates an interactive report for the ORA results. The results can be explored using the topSets function.
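
ORA is commonly formulated as a one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test) of whether a gene set contains more of the selected genes than expected by chance, given the background. The base-R sketch below shows that standard formulation for a single gene set with made-up IDs; the helper ora_pvalue() is hypothetical and is not taken from EGSEA's source, whose exact test may differ.

```r
# One-sided hypergeometric test for over-representation of one gene set
ora_pvalue <- function(de.genes, gene.set, universe) {
  universe <- unique(as.character(universe))
  de.genes <- intersect(de.genes, universe)
  gene.set <- intersect(gene.set, universe)
  x <- length(intersect(de.genes, gene.set))  # selected genes in the set
  m <- length(gene.set)                        # set genes in the background
  n <- length(universe) - m                    # background genes outside the set
  k <- length(de.genes)                        # total selected genes
  # P(X >= x) for X ~ Hypergeometric(m, n, k)
  phyper(x - 1, m, n, k, lower.tail = FALSE)
}

# Made-up example: 100 background genes, a 10-gene set and 10 selected
# genes, 5 of which fall in the set
universe <- as.character(1:100)
gene.set <- as.character(1:10)
de.genes <- as.character(c(1:5, 51:55))
p <- ora_pvalue(de.genes, gene.set, universe)

# The same p-value from a one-sided Fisher's exact test on the 2 x 2 table
tab <- matrix(c(5, 5, 5, 85), nrow = 2,
              dimnames = list(c("in.set", "not.in.set"),
                              c("selected", "not.selected")))
fisher.test(tab, alternative = "greater")$p.value
```

In a full analysis this test would be repeated for every gene set in gs.annots and the p-values adjusted for multiple testing, for example with p.adjust(method = "BH"); which adjustment produces the p.adj column is not stated on this page.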

Value

A list whose elements each contain two or three components storing the top gene sets and the detailed analysis results for each contrast, together with the comparative analysis results.

References

Monther Alhamdoosh, Milica Ng, Nicholas J. Wilson, Julie M. Sheridan, Huy Huynh, Michael J. Wilson and Matthew E. Ritchie. Combining multiple tools outperforms individual methods in gene set enrichment analyses.

See Also

topSets, buildIdx, buildMSigDBIdx, buildKEGGIdx, buildGeneSetDBIdx, and buildCustomIdx

Examples

# Example of egsea.ora
library(EGSEA)
library(EGSEAdata)
library(limma)

# Load the IL-13 example data shipped with EGSEAdata
data(il13.data)
voom.results = il13.data$voom
contrast = il13.data$contra

# Fit a linear model and extract the differentially expressed genes
vfit = lmFit(voom.results, voom.results$design)
vfit = contrasts.fit(vfit, contrast)
vfit = eBayes(vfit)
top.Table = topTable(vfit, coef=1, number=Inf, p.value=0.05, lfc=1)
deGenes = as.character(top.Table$FeatureID)
logFC = top.Table$logFC
names(logFC) = deGenes

# Build the gene set collection index (MSigDB skipped, KEGG metabolism excluded)
gs.annots = buildIdx(entrezIDs=deGenes, species="human",
                     msigdb.gsets="none",
                     kegg.updated=FALSE, kegg.exclude=c("Metabolism"))

# Set report = TRUE to generate the EGSEA interactive report
gsa = egsea.ora(entrezIDs=deGenes,
                universe=as.character(voom.results$genes[, 1]),
                logFC=logFC, title="X24IL13-X24",
                gs.annots=gs.annots,
                symbolsMap=top.Table[, c(1, 2)], display.top=5,
                egsea.dir="./il13-egsea-ora-report", num.threads=2,
                report=FALSE)
topSets(gsa)