GeneSetDb-class: A container for geneset definitions.

Description

Please refer to the multiGSEA vignette (vignette("multiGSEA")), and the "The GeneSetDb Class" section in particular, for a more detailed description of the semantics of this central data object.

The GeneSetDb class serves the same purpose as the GSEABase::GeneSetCollection() class does: it acts as a centralized object to hold collections of gene sets. It exists because there were things I wanted to know about my gene set collections that weren't easily inferred from what is essentially a "list of GeneSets", which is what the GeneSetCollection class is.

Gene sets are internally represented by a data.table in "tidy" format, where we minimally require non-NA values for the following three character columns: collection, name, and feature_id.

The (collection, name) compound key is the primary key of a gene set. There will be as many entries with the same (collection, name) as there are genes/features in that set.
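
As a rough illustration of this layout, the sketch below builds a toy long-form table in base R and counts the rows per (collection, name) key. A plain data.frame stands in for the internal data.table, and all collection, set, and feature names are made up for the example.

```r
# Toy long-form gene set table; all identifiers are hypothetical.
gs <- data.frame(
  collection = c("c1", "c1", "c1", "c2", "c2"),
  name       = c("setA", "setA", "setB", "setA", "setA"),
  feature_id = c("g1", "g2", "g3", "g1", "g4"),
  stringsAsFactors = FALSE
)

# The three required columns must not contain NA values.
stopifnot(!any(is.na(gs[, c("collection", "name", "feature_id")])))

# Count rows per (collection, name) key: this is the size of each gene set.
set_sizes <- aggregate(feature_id ~ collection + name, data = gs, FUN = length)
names(set_sizes)[3] <- "n_features"
set_sizes
```

Note that the same set name ("setA") can appear in two collections without clashing, because the key is the (collection, name) pair rather than the name alone.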

The GeneSetDb tracks metadata about gene sets at the collection level. This means that we assume that all of the feature_ids used within a collection use the same type of feature identifier (such as a GSEABase::EntrezIdentifier()), were defined in the same organism, etc.

Please refer to the "GeneSetDb" section of the vignette for more details regarding the construction and querying of a GeneSetDb object.

Usage

GeneSetDb(x, featureIdMap = NULL, collectionName = NULL, ...)

Arguments

x

A GeneSetCollection, or a "two deep" list of either GeneSetCollections or lists of character vectors, which are the gene identifiers. The top level of the "two deep" list represents the different collections, and each of its elements is itself a named list, which represents the gene sets in the given collection.
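
The "two deep" shape can be sketched in base R as follows. This is only an illustration of the list structure described above; the identifiers are hypothetical, and the call to GeneSetDb() itself is left out so the snippet runs without the package.

```r
# Toy long-form definitions; identifiers are hypothetical.
gs <- data.frame(
  collection = c("c1", "c1", "c1", "c2", "c2"),
  name       = c("setA", "setA", "setB", "setC", "setC"),
  feature_id = c("g1", "g2", "g3", "g1", "g4"),
  stringsAsFactors = FALSE
)

# Top level: one element per collection.
# Second level: a named list of character vectors, one per gene set.
two_deep <- lapply(split(gs, gs$collection), function(d) {
  split(d$feature_id, d$name)
})

str(two_deep)
```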

featureIdMap

A data.frame with 2 character columns. The first column holds the IDs of the genes (features) used to identify the genes in the gene sets; the second column holds the IDs that these should be mapped to. Useful for testing probe-level microarray data against gene-level gene set information.
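
To make the idea of the map concrete, here is a minimal base R sketch with hypothetical gene and probe identifiers. The column names are assumptions made for the example, and the lookup below only illustrates the concept; it is not a description of how multiGSEA applies the map internally.

```r
# Hypothetical map: the first column holds the ids used in the gene sets,
# the second column holds the ids they should be mapped to.
id_map <- data.frame(
  gene_id  = c("g1", "g1", "g2", "g3"),
  probe_id = c("p1", "p2", "p3", "p4"),
  stringsAsFactors = FALSE
)

# A gene set defined with gene-level ids.
gene_set <- c("g1", "g3")

# Translate the gene set into the id space of the expression data.
# One gene may map to several probes (g1 maps to p1 and p2 here).
mapped <- id_map$probe_id[id_map$gene_id %in% gene_set]
mapped
```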

collectionName

If x represents a single collection, i.e. a single GeneSetCollection or a "one deep" list of gene sets (named by gene set), then this parameter provides the name for the collection. If x holds multiple collections, this can be a character vector of the same length with their names. In all cases, if a collection name can't be determined from this, then collections will be named anonymously. If a value is passed here, it will override any names stored in the list x.

Slots

table

The "gene set table": a data.table with gene set information, one row per gene set. Columns include collection, name, N, and n. The end user can add more columns to this data.table as desired. The actual feature IDs are computed on the fly by doing a db[J(group, id)] lookup.

db

A data.table to hold all of the original geneset id information that was used to construct this GeneSetDb.

featureIdMap

Maps the IDs used in the gene set lists to the IDs (rows) of the expression data the GSEA is run on.

collectionMetadata

A data.table to keep metadata about each individual gene set collection, e.g. the user might want to keep track of where the gene set definitions come from. It could also hold a function that parses the (collection, name) combination to generate a URL that points the user to more information about that gene set, etc.

GeneSetDb Construction

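The GeneSetDb() constructor is flexible enough to create a GeneSetDb object from a variety of formats that are commonly used in the Bioconductor ecosystem, such as a GeneSetCollection, a "long form" data.frame of gene set definitions, a named list of feature identifier vectors, or a list of such lists with one element per collection (see the Examples section).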

Interrogating a GeneSetDb

You might wonder what gene sets are defined in a GeneSetDb: see the geneSets() function.

Curious about what features are defined in your GeneSetDb? See the featureIds() function.

Want the details of a particular gene set? Try the geneSet() function. This will return a data.frame of the gene set definition. Calling geneSet() on a MultiGSEAResult() will return the same data.frame along with the differential expression statistics for the individual members of the geneSet across the contrast that was tested in the multiGSEA() call that created the MultiGSEAResult().

GeneSetDb manipulation

You can subset a GeneSetDb to include only some of the gene sets defined in it. To do this, you need to provide an indexing vector that is as long as length(gdb), i.e. the number of gene sets defined in the GeneSetDb. You can construct such a vector by performing your boolean logic over the geneSets(gdb) table.

Look at the Examples section to see how this works, where we take the MSigDB c7 collection (a.k.a. "ImmuneSigDB") and only keep gene sets that were defined in experiments from mouse.
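
The indexing logic can be sketched in base R without downloading MSigDB. In this sketch a toy data.frame stands in for the geneSets(gdb) table; the set names and organism values are made up, and only the construction of the logical vector mirrors what you would do with a real GeneSetDb.

```r
# Stand-in for the geneSets(gdb) table: one row per gene set.
# The names and organism values here are hypothetical.
gsets <- data.frame(
  collection = "c7",
  name       = c("sig1", "sig2", "sig3", "sig4"),
  organism   = c("Mus musculus", "Homo sapiens", "Mus musculus", "Homo sapiens"),
  stringsAsFactors = FALSE
)

# Logical index with exactly one entry per gene set.
keep <- gsets$organism == "Mus musculus"
stopifnot(length(keep) == nrow(gsets))

# With a real GeneSetDb this would be gdb[keep]; here we subset the table.
gsets_mm <- gsets[keep, ]
nrow(gsets_mm)
```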

See Also

?conversion

Examples

## exampleGeneSetDF provides gene set definitions in "long form". We show
## how this can easily be turned into a GeneSetDb from this form, or
## converted to other forms (list of features, or list of lists of
## features) to do the same.
gs.df <- exampleGeneSetDF()
gdb.df <- GeneSetDb(gs.df)

## list of ids
gs.df$key <- encode_gskey(gs.df)
gs.list <- split(gs.df$feature_id, gs.df$key)
gdb.list <- GeneSetDb(gs.list, collectionName='custom-sigs')

## A list of lists, where the top level list splits the collections.
## The name of the collection in the GeneSetDb is taken from this top level
## hierarchy
gs.lol <- as.list(gdb.df, nested=TRUE) ## examine this list of lists
gdb.lol <- GeneSetDb(gs.lol) ## note that collection is set properly

## GeneSetDb Interrogation
gsets <- geneSets(gdb.df)
nkcells <- geneSet(gdb.df, 'cellularity', 'NK cells')
fids <- featureIds(gdb.df)

# GeneSetDb Manipulation ....................................................
# Subset ImmuneSigDB down to only gene sets defined from mouse
# NOTE: This doesn't work with updated MSigDB collections
## Not run: 
idb <- getMSigGeneSetDb('c7', 'mouse')
igs <- geneSets(idb)
table(igs$organism)
## Homo sapiens Mus musculus
##         1888         2984
idb.mm <- idb[igs$organism == 'Mus musculus']
length(idb.mm) ## 2984

## End(Not run)

lianos/multiGSEA documentation built on Nov. 17, 2020, 1:26 p.m.