egsea-index: Functions to create gene set collection indexes for EGSEA
In EGSEA: Ensemble of Gene Set Enrichment Analyses

buildIdx indexes the MSigDB, KEGG and GeneSetDB collections to be used for the EGSEA analysis.

buildKEGGIdx prepares the KEGG pathway collection to be used for the EGSEA analysis.

buildMSigDBIdx prepares the MSigDB gene set collections to be used for the EGSEA analysis.

buildGeneSetDBIdx prepares the GeneSetDB gene set collections to be used for the EGSEA analysis.

buildCustomIdx creates a gene set collection from a given list of gene sets to be used for the EGSEA analysis.

buildGMTIdx creates a gene set collection from a given GMT file to be used for the EGSEA analysis.

buildIdx(entrezIDs, species = "human", msigdb.gsets = "all",
  gsdb.gsets = "none", go.part = FALSE, kegg.updated = FALSE,
  kegg.exclude = c(), min.size = 1)

buildKEGGIdx(entrezIDs, species = "human", min.size = 1, updated = FALSE,
  exclude = c())

buildMSigDBIdx(entrezIDs, species = "Homo sapiens", geneSets = "all",
  go.part = FALSE, min.size = 1)

buildGeneSetDBIdx(entrezIDs, species, geneSets = "all", go.part = FALSE,
  min.size = 1)

buildCustomIdx(geneIDs, gsets, anno = NULL, label = "custom",
  name = "User-Defined Gene Sets", species = "Human", min.size = 1)

buildGMTIdx(geneIDs, gmt.file, anno.cols = 0, anno.col.names = NULL,
  label = "gmtcustom", name = "User-Defined GMT Gene Sets",
  species = "Human", min.size = 1)

`entrezIDs`	character, a vector that stores the Entrez Gene IDs tagged in your dataset. The order of the Entrez Gene IDs should match those of the count/expression matrix row names.
`species`	character, determines the organism of the selected gene sets: "human", "mouse" or "rat".
`msigdb.gsets`	character, a vector that determines which gene set collections should be used from MSigDB. It can take values from this list: "h", "c1", "c2", "c3", "c4", "c5", "c6", "c7". "h" and "c1" are human specific. If "all", all available gene set collections are loaded. If "none", MSigDB collections are excluded.
`gsdb.gsets`	character, a vector that determines which gene set collections are loaded from GeneSetDB. It takes "none", "all", "gsdbdis", "gsdbgo", "gsdbdrug", "gsdbpath" or "gsdbreg". "none" excludes the GeneSetDB collections and "all" includes all of them. "gsdbdis" loads the disease collection, "gsdbgo" the GO terms collection, "gsdbdrug" the drug/chemical collection, "gsdbpath" the pathways collection and "gsdbreg" the gene regulation collection.
`go.part`	logical, whether to partition the GO term collections into the three GO domains (CC, MF and BP) or to use the entire collection as a whole.
`kegg.updated`	logical, set to TRUE if you want to download the most recent KEGG pathways.
`kegg.exclude`	character, a vector used to exclude KEGG pathways of specific type(s): Disease, Metabolism, Signaling. If "all", none of the KEGG collections is included.
`min.size`	integer, the minimum number of genes required in a testing gene set.
`updated`	logical, set to TRUE if you want to download the most recent KEGG pathways.
`exclude`	character, a vector used to exclude KEGG pathways of a specific category. Accepted values are "Disease", "Metabolism", or "Signaling".
`geneSets`	character, a vector that determines which gene set collections should be used. For MSigDB, it can take values from this list: "all", "h", "c1", "c2", "c3", "c4", "c5", "c6", "c7". "c1" is human specific. For GeneSetDB, it takes "all", "gsdbdis", "gsdbgo", "gsdbdrug", "gsdbpath" or "gsdbreg". "gsdbdis" loads the disease collection, "gsdbgo" the GO terms collection, "gsdbdrug" the drug/chemical collection, "gsdbpath" the pathways collection and "gsdbreg" the gene regulation collection. If "all", all available gene set collections are loaded.
`geneIDs`	character, a vector that stores the Gene IDs tagged in your dataset. The order of the Gene IDs must match that of the count/expression matrix row names. Gene IDs can be in any annotation, e.g., Symbols, Ensembl, etc., as long as the parameter `gsets` uses the same Gene ID annotation.
`gsets`	list, a list of gene sets. Each gene set is a character vector of gene IDs (e.g., Entrez IDs) in the same annotation as `geneIDs`. The names of the list should match the GeneSet column in the `anno` argument (if it is provided).
`anno`	list, a data frame that stores a detailed annotation for each gene set. Some of its fields can be ID, GeneSet, PubMed, URLs, etc. The GeneSet field is mandatory and its values should match the names of `gsets`.
`label`	character, a unique id that identifies the collection of gene sets.
`name`	character, the collection name to be used in the EGSEA report.
`gmt.file`	character, the path and name of the GMT file
`anno.cols`	integer, number of columns in the GMT file that are used for annotation. These columns should be inserted immediately after the second column.
`anno.col.names`	character, vector of the names of the annotation columns.
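
The relationship between `geneIDs`, `gsets`, `anno` and `min.size` described above can be checked with base R before an index is built. The following is a minimal sketch on made-up toy data; the gene IDs, set names and annotation values are hypothetical, and the way `min.size` is applied here (counting only the genes of each set that occur in `geneIDs`) is an assumption, not a statement about the package internals.

```r
# Hypothetical toy data: gene IDs as they would appear in the row names
# of a count/expression matrix (any ID type, provided gsets uses the same).
geneIDs <- c("7157", "1956", "672", "675", "5728")

# gsets: a named list; each element is a character vector of gene IDs.
gsets <- list(
  setA = c("7157", "1956", "672"),
  setB = c("675", "9999"),   # "9999" is not in geneIDs
  setC = c("8888")           # no gene of this set is in geneIDs
)

# anno: a data frame whose mandatory GeneSet column mirrors names(gsets).
anno <- data.frame(
  ID = c("S1", "S2", "S3"),
  GeneSet = names(gsets),
  stringsAsFactors = FALSE
)

# Sanity checks worth running before building an index.
stopifnot(is.character(unlist(gsets)))
stopifnot(setequal(names(gsets), anno$GeneSet))

# Assumption: min.size is compared with the number of genes of each set
# that are present in geneIDs.
min.size <- 1
n.mapped <- sapply(gsets, function(g) sum(g %in% geneIDs))
kept <- names(gsets)[n.mapped >= min.size]
print(n.mapped)
print(kept)
```

Under that assumption, a set such as `setC` with no genes in the dataset would not survive `min.size = 1`, so it may be worth inspecting `n.mapped` when fewer sets than expected appear in a custom collection.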

buildIdx indexes the MSigDB, KEGG and GeneSetDB gene set collections, and loads gene set annotation.

buildKEGGIdx indexes the KEGG pathway gene sets and loads gene set annotation.

buildMSigDBIdx indexes the MSigDB gene sets and loads gene set annotation.

buildGeneSetDBIdx indexes the GeneSetDB gene sets and loads gene set annotation.

buildCustomIdx indexes newly created gene sets and attaches gene set annotation if provided.

buildGMTIdx indexes newly created gene sets and attaches gene set annotation if provided.
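
To make the GMT layout implied by `anno.cols` concrete, the sketch below writes a small GMT file with one annotation column placed immediately after the second column, then parses it with base R only. The file contents, set names and the `anno.cols` value are hypothetical, and the parsing is an illustration of the layout rather than the code the package uses.

```r
# Write a small, hypothetical GMT file with one extra annotation column.
# Layout per line (tab-separated), following the argument description:
#   name, description, <anno.cols annotation fields>, gene IDs...
gmt.file <- tempfile(fileext = ".gmt")
writeLines(c(
  paste("setA", "first toy set", "noteA", "7157", "1956", "672", sep = "\t"),
  paste("setB", "second toy set", "noteB", "675", "5728", sep = "\t")
), gmt.file)

anno.cols <- 1   # one annotation column follows the description

# Split each line on tabs, then drop the name, description and
# annotation fields to recover the gene IDs of each set.
fields <- strsplit(readLines(gmt.file), "\t", fixed = TRUE)
parsed.sets <- lapply(fields, function(f) f[-seq_len(2 + anno.cols)])
names(parsed.sets) <- sapply(fields, `[`, 1)

# The annotation fields sit between the description and the gene IDs.
parsed.anno <- sapply(fields, function(f) f[2 + seq_len(anno.cols)])
print(parsed.sets)
print(parsed.anno)
```

A file laid out this way would presumably be passed through the `gmt.file` argument together with `anno.cols = 1` and a one-element `anno.col.names` vector naming the extra column.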

buildIdx returns a list of gene set collection indexes, where each element of the list is an object of the class GSCollectionIndex.

buildKEGGIdx returns an object of the class GSCollectionIndex.

buildMSigDBIdx returns a list of gene set collection indexes, where each element of the list is an object of the class GSCollectionIndex.

buildGeneSetDBIdx returns a list of gene set collection indexes, where each element of the list is an object of the class GSCollectionIndex.

buildCustomIdx returns an object of the class GSCollectionIndex.

buildGMTIdx returns an object of the class GSCollectionIndex.

# example of buildIdx
library(EGSEAdata)
data(il13.data)
v = il13.data$voom
gs.annots = buildIdx(entrezIDs=rownames(v$E), species="human",
         msigdb.gsets = c("h", "c5"),
         go.part = TRUE,
         kegg.exclude = c("Metabolism"))
names(gs.annots)

# example of buildKEGGIdx
library(EGSEAdata)
data(il13.data)
v = il13.data$voom
gs.annots = buildKEGGIdx(entrezIDs=rownames(v$E), species="human")

# example of buildMSigDBIdx
library(EGSEAdata)
data(il13.data)
v = il13.data$voom
gs.annots = buildMSigDBIdx(entrezIDs=rownames(v$E), species="human",
         geneSets=c("h", "c2"))
names(gs.annots)

# example of buildGeneSetDBIdx
library(EGSEAdata)
data(il13.data)
v = il13.data$voom
gs.annots = buildGeneSetDBIdx(entrezIDs=rownames(v$E), species="human")
names(gs.annots)


# example of buildCustomIdx
library(EGSEAdata)
data(il13.data)
v = il13.data$voom
data(kegg.pathways)
gsets = kegg.pathways$human$kg.sets[1:50]
gs.annot = buildCustomIdx(geneIDs=rownames(v$E), gsets=gsets,
         species="human")
class(gs.annot)


# example of buildGMTIdx
library(EGSEAdata)
data(il13.data)
v = il13.data$voom
# gmt.file must hold the path to an existing GMT file before uncommenting
#gs.annot = buildGMTIdx(geneIDs=rownames(v$E), gmt.file=gmt.file,
#         species="human")
#class(gs.annot)