egsea-index: Functions to create gene set collection indexes for EGSEA


Description

buildIdx indexes the MSigDB, KEGG and GeneSetDB collections to be used for the EGSEA analysis.

buildKEGGIdx prepares the KEGG pathway collection to be used for the EGSEA analysis.

buildMSigDBIdx prepares the MSigDB gene set collections to be used for the EGSEA analysis.

buildGeneSetDBIdx prepares the GeneSetDB gene set collections to be used for the EGSEA analysis.

buildCustomIdx creates a gene set collection from a given list of gene sets to be used for the EGSEA analysis.

buildGMTIdx creates a gene set collection from a given GMT file to be used for the EGSEA analysis.

Usage

buildIdx(entrezIDs, species = "human", msigdb.gsets = "all",
  gsdb.gsets = "none", go.part = FALSE, kegg.updated = FALSE,
  kegg.exclude = c(), min.size = 1)

buildKEGGIdx(entrezIDs, species = "human", min.size = 1, updated = FALSE,
  exclude = c())

buildMSigDBIdx(entrezIDs, species = "Homo sapiens", geneSets = "all",
  go.part = FALSE, min.size = 1)

buildGeneSetDBIdx(entrezIDs, species, geneSets = "all", go.part = FALSE,
  min.size = 1)

buildCustomIdx(geneIDs, gsets, anno = NULL, label = "custom",
  name = "User-Defined Gene Sets", species = "Human", min.size = 1)

buildGMTIdx(geneIDs, gmt.file, anno.cols = 0, anno.col.names = NULL,
  label = "gmtcustom", name = "User-Defined GMT Gene Sets",
  species = "Human", min.size = 1)

Arguments

entrezIDs

character, a vector that stores the Entrez Gene IDs in your dataset. Their order should match that of the row names of the count/expression matrix.

species

character, determines the organism of the selected gene sets: "human", "mouse" or "rat".

msigdb.gsets

character, a vector that determines which gene set collections should be used from MSigDB. It can take values from this list: "h", "c1", "c2", "c3", "c4", "c5", "c6", "c7". "h" and "c1" are human specific. If "all", all available gene set collections are loaded. If "none", MSigDB collections are excluded.

gsdb.gsets

character, a vector that determines which gene set collections are loaded from GeneSetDB. It takes "none", "all", "gsdbdis", "gsdbgo", "gsdbdrug", "gsdbpath" or "gsdbreg". "none" excludes the GeneSetDB collections and "all" includes all of them. "gsdbdis" loads the disease collection, "gsdbgo" the GO terms collection, "gsdbdrug" the drug/chemical collection, "gsdbpath" the pathways collection and "gsdbreg" the gene regulation collection.

go.part

logical, whether to partition the GO term collections into the three GO domains (CC, MF and BP) or to use the entire collection as a whole.

kegg.updated

logical, set to TRUE if you want to download the most recent KEGG pathways.

kegg.exclude

character, a vector used to exclude KEGG pathways of specific type(s): "Disease", "Metabolism", "Signaling". If "all", none of the KEGG collections is included.

min.size

integer, the minimum number of genes required in a testing gene set.

updated

logical, set to TRUE if you want to download the most recent KEGG pathways.

exclude

character, vector used to exclude KEGG pathways of specific category. Accepted values are "Disease", "Metabolism", or "Signaling".

geneSets

character, a vector that determines which gene set collections should be used. For MSigDB, it can take values from this list: "all", "h", "c1", "c2", "c3", "c4", "c5", "c6", "c7". "c1" is human specific. For GeneSetDB, it takes "all", "gsdbdis", "gsdbgo", "gsdbdrug", "gsdbpath" or "gsdbreg". "gsdbdis" loads the disease collection, "gsdbgo" the GO terms collection, "gsdbdrug" the drug/chemical collection, "gsdbpath" the pathways collection and "gsdbreg" the gene regulation collection. If "all", all available gene set collections are loaded.

geneIDs

character, a vector that stores the Gene IDs in your dataset. The order of the Gene IDs must match that of the row names of the count/expression matrix. Gene IDs can be in any annotation, e.g., Symbols, Ensembl, etc., as long as the parameter gsets uses the same Gene ID annotation.

gsets

list, a list of gene sets. Each gene set is a character vector of gene IDs (e.g., Entrez IDs) in the same annotation as geneIDs. The names of the list should match the GeneSet column in the anno argument (if it is provided).

anno

list, a data frame that stores a detailed annotation for each gene set. Some of its fields can be ID, GeneSet, PubMed, URLs, etc. The GeneSet field is mandatory and its values should match the names of gsets.

label

character, a unique ID that identifies the collection of gene sets.

name

character, the collection name to be used in the EGSEA report.

gmt.file

character, the path and name of the GMT file

anno.cols

integer, number of columns in the GMT file that are used for annotation. These columns should be inserted immediately after the second column.

anno.col.names

character, vector of the names of the annotation columns.
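The GMT layout implied by anno.cols and anno.col.names can be sketched in base R as follows. This is only an illustration under stated assumptions: the set names, the gene IDs, and the "Source" annotation column are made up, and the small parser is not EGSEA's own reader. It simply shows that annotation columns sit immediately after the second column and that the gene IDs start after them.

```r
# Hypothetical GMT rows: set name, description, ONE annotation column, then gene IDs
gmt.lines <- c(
  paste(c("SET_A", "first toy set", "toy-source-1", "1", "2", "3"), collapse = "\t"),
  paste(c("SET_B", "second toy set", "toy-source-2", "2", "4"), collapse = "\t")
)
gmt.file <- tempfile(fileext = ".gmt")
writeLines(gmt.lines, gmt.file)

anno.cols <- 1               # one annotation column after column 2
anno.col.names <- "Source"   # hypothetical name for that column

# Illustrative parser (an assumption, not EGSEA's implementation)
fields <- strsplit(readLines(gmt.file), "\t", fixed = TRUE)

# Gene IDs are everything after the name, description and annotation columns
toy.gsets <- lapply(fields, function(f) f[-seq_len(2 + anno.cols)])
names(toy.gsets) <- vapply(fields, `[`, character(1), 1)

# Collect the per-set annotation in a data frame keyed by gene set name
toy.anno <- data.frame(
  GeneSet = names(toy.gsets),
  Description = vapply(fields, `[`, character(1), 2),
  stringsAsFactors = FALSE
)
toy.anno[[anno.col.names]] <- vapply(fields, `[`, character(1), 3)

print(toy.gsets)
print(toy.anno)
```

With a file laid out this way, the corresponding call would pass gmt.file, anno.cols = 1 and anno.col.names = "Source" to buildGMTIdx, together with the geneIDs of the dataset.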

Details

buildIdx indexes the MSigDB, KEGG and GeneSetDB gene set collections, and loads gene set annotation.

buildKEGGIdx indexes the KEGG pathway gene sets and loads gene set annotation.

buildMSigDBIdx indexes the MSigDB gene sets and loads gene set annotation.

buildGeneSetDBIdx indexes the GeneSetDB gene sets and loads gene set annotation.

buildCustomIdx indexes newly created gene sets and attaches gene set annotation if provided.

buildGMTIdx indexes the gene sets read from the GMT file and attaches gene set annotation if provided.
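One way to picture what "indexing" means here is the base-R sketch below. It assumes that an index maps each gene set to the row positions of its genes in the expression matrix (which would explain why the order of the IDs must match the row names) and that min.size is applied to the number of matched genes. Both points are assumptions for illustration, and all IDs and set names are made up; this is not EGSEA's code.

```r
# Toy identifiers, in the same order as the rows of a hypothetical expression matrix
geneIDs <- c("10", "20", "30", "40", "50")

# Hypothetical gene sets using the same ID type as geneIDs
gsets <- list(
  setA = c("20", "40", "99"),  # "99" is not in the dataset
  setB = c("50"),
  setC = c("77", "88")         # no gene of this set is in the dataset
)

# Optional annotation; the GeneSet column must match names(gsets)
anno <- data.frame(
  ID = c("custom1", "custom2", "custom3"),
  GeneSet = names(gsets),
  stringsAsFactors = FALSE
)
stopifnot(setequal(anno$GeneSet, names(gsets)))

# Assumed meaning of "index": row positions of each set's genes
idx <- lapply(gsets, function(s) which(geneIDs %in% s))

# Assumed effect of min.size: drop sets with too few matched genes
min.size <- 1
idx <- idx[lengths(idx) >= min.size]

print(idx)
```

In this toy case setC matches no rows and is dropped, while setA keeps only the two genes present in the dataset. The same gsets and anno objects have the shape that buildCustomIdx expects for its gsets and anno arguments.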

Value

buildIdx returns a list of gene set collection indexes, where each element of the list is an object of the class GSCollectionIndex.

buildKEGGIdx returns an object of the class GSCollectionIndex.

buildMSigDBIdx returns a list of gene set collection indexes, where each element of the list is an object of the class GSCollectionIndex.

buildGeneSetDBIdx returns a list of gene set collection indexes, where each element of the list is an object of the class GSCollectionIndex.

buildCustomIdx returns an object of the class GSCollectionIndex.

buildGMTIdx returns an object of the class GSCollectionIndex.

Examples

# example of buildIdx
library(EGSEA)      # provides the build*Idx functions
library(EGSEAdata)
data(il13.data)
v = il13.data$voom
gs.annots = buildIdx(entrezIDs=rownames(v$E), species="human",
                     msigdb.gsets = c("h", "c5"),
                     go.part = TRUE,
                     kegg.exclude = c("Metabolism"))
names(gs.annots)

# example of buildKEGGIdx
library(EGSEAdata)
data(il13.data)
v = il13.data$voom
gs.annots = buildKEGGIdx(entrezIDs=rownames(v$E), species="human")

# example of buildMSigDBIdx
library(EGSEAdata)
data(il13.data)
v = il13.data$voom
gs.annots = buildMSigDBIdx(entrezIDs=rownames(v$E), species="human",
                           geneSets=c("h", "c2"))
names(gs.annots)

# example of buildGeneSetDBIdx
library(EGSEAdata)
data(il13.data)
v = il13.data$voom
gs.annots = buildGeneSetDBIdx(entrezIDs=rownames(v$E), species="human")
names(gs.annots)


# example of buildCustomIdx
library(EGSEAdata) 
data(il13.data)
v = il13.data$voom
data(kegg.pathways)
gsets = kegg.pathways$human$kg.sets[1:50]
gs.annot = buildCustomIdx(geneIDs=rownames(v$E), gsets=gsets,
                          species="human")
class(gs.annot)


# example of buildGMTIdx
library(EGSEAdata) 
data(il13.data)
v = il13.data$voom
# gmt.file must hold the path to an existing GMT file before uncommenting
#gs.annot = buildGMTIdx(geneIDs=rownames(v$E), gmt.file=gmt.file,
#                       species="human")
#class(gs.annot)

EGSEA documentation built on Jan. 30, 2021, 2:01 a.m.