collectionMetadata: Gene Set Collection Metadata


Description

The design of the GeneSetDb is such that we assume that groups of gene sets are usually defined together and will therefore share similar metadata. These groups of gene sets will fall into the same "collection", and, therefore, metadata for particular gene sets are tracked at the collection level.

Examples of such metadata include the organism that a batch of gene sets was defined in, the type of feature identifiers that a collection of gene sets uses (e.g. GSEABase::EntrezIdentifier()), or a URL pattern, built from the collection,name compound key, that one can browse to in order to find out more about a gene set.

There are explicit helper functions that get and set this metadata, namely org(), featureIdType(), geneSetCollectionURLfunction(), and geneSetURL(). Arbitrary metadata can be stored at the collection level using the addCollectionMetadata() function. More details are provided below.
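To make the idea of collection-level metadata concrete, here is a minimal base R sketch. It is only an illustration: the cmeta data.frame, its column layout, the 'organism' key, and the get.meta() helper are all hypothetical and are not the internal representation of a GeneSetDb.

```r
# Hypothetical stand-in for collection-level metadata.
# This is NOT how a GeneSetDb stores its metadata internally.
cmeta <- data.frame(
  collection = c('H', 'H'),
  name       = c('organism', 'id_type'),
  value      = c('Homo_sapiens', 'EntrezIdentifier'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up one metadata variable for one collection.
# Returns NA if the collection,name pair is not present.
get.meta <- function(cmeta, collection, name) {
  hit <- cmeta$collection == collection & cmeta$name == name
  if (!any(hit)) return(NA_character_)
  cmeta$value[hit][1L]
}

organism <- get.meta(cmeta, 'H', 'organism')
missing  <- get.meta(cmeta, 'H', 'no_such_variable')
```

Every gene set in collection 'H' shares these values, which is why they only need to be stored once per collection.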

Usage

collectionMetadata(x, collection, name, ...)

geneSetURL(x, i, j, ...)

geneSetCollectionURLfunction(x, i, ...)

geneSetCollectionURLfunction(x, i) <- value

featureIdType(x, i, ...)

featureIdType(x, i) <- value

org(x, i, ...)

org(x, i) <- value

addCollectionMetadata(
  x,
  xcoll,
  xname,
  value,
  validate.value.fn = NULL,
  allow.add = TRUE
)

Arguments

x

GeneSetDb()

collection

The gene set collection to query

name

The name of the metadata variable to get the value for

...

not used yet

i

The collection part of the collection,name compound key that identifies the gene set

j

The name part of the collection,name compound key that identifies the gene set

value

The value of the metadata variable

xcoll

The collection name

xname

The name of the metadata variable

validate.value.fn

If a function is provided, it is run on value and must return TRUE for the addition to be made

allow.add

If FALSE, this xcoll,xname pair should already be in the GeneSetDb, and the call will fail if it is not

Value

geneSetURL() returns a character vector of URLs, one for each of the gene sets identified by i, j. NA is returned for gene sets i,j that are not found in x.

addCollectionMetadata() returns the updated GeneSetDb.

Gene Set URLs

A URL function can be defined per collection that takes the collection,name compound key and generates a URL for the gene set that the user can browse to for further information. For instance, the geneSetCollectionURLfunction() for the MSigDB collections is defined like so:

url.fn <- function(collection, name) {
  url <- 'http://www.broadinstitute.org/gsea/msigdb/cards/%s.html'
  sprintf(url, name)
}
gdb <- getMSigGeneSetDb('H')
geneSetCollectionURLfunction(gdb, 'H') <- url.fn

In this way, a call to geneSetURL(gdb, 'H', 'HALLMARK_ANGIOGENESIS') will return http://www.broadinstitute.org/gsea/msigdb/cards/HALLMARK_ANGIOGENESIS.html.

geneSetURL() is vectorized over i and j.
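A URL function written with sprintf() handles several gene sets at once, because sprintf() is itself vectorized over its arguments. The sketch below calls such a function directly in base R, without a GeneSetDb; it only shows the behavior of the standalone function and makes no claim about how geneSetURL() invokes it internally.

```r
# Same URL pattern as the MSigDB example; sprintf() recycles over `name`.
url.fn <- function(collection, name) {
  url <- 'http://www.broadinstitute.org/gsea/msigdb/cards/%s.html'
  sprintf(url, name)
}

# Two collection,name keys passed as parallel vectors.
urls <- url.fn(c('H', 'H'),
               c('HALLMARK_ADIPOGENESIS', 'HALLMARK_ANGIOGENESIS'))
```

Here urls is a character vector with one URL per gene set name. A URL function that is not vectorized (for example, one that branches with if () on name) would need to be wrapped, e.g. with mapply(), before being used this way.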

Feature ID Types

When defining a set of gene sets in a collection, the identifiers used must be of the same type. Most often you'll probably be working with Entrez identifiers, simply because that's what most of the annotations work with.

You declare the type of feature identifier that a collection uses like so:

gdb <- getMSigGeneSetDb('H')
featureIdType(gdb, 'H') <- GSEABase::ENSEMBLIdentifier()
## or, equivalently (but you don't want to use this)
gdb <- addCollectionMetadata(gdb, 'H', 'id_type', GSEABase::ENSEMBLIdentifier())

Organism

You will want to keep track of the organism in which the experiments used to define this collection of gene sets were run.

gdb <- getMSigGeneSetDb('H')
org(gdb, 'H') <- 'Homo_sapiens'

Adding arbitrary collectionMetadata

Adds arbitrary metadata to a gene set collection of a GeneSetDb.

Note that this is not a replacement method! You must catch the returned object to keep the one with the updated collectionMetadata. Although this function is exported, I imagine it being used mostly through predefined replacement methods that use it as a utility function, such as the replacement methods for org() and featureIdType().

gdb <- getMSigGeneSetDb('H')
gdb <- addCollectionMetadata(gdb, 'H', 'foo', 'bar')
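The validate.value.fn argument accepts a function that is run on value and must return TRUE for the addition to go ahead. The sketch below defines one such validator in base R; is.single.string() is a hypothetical name chosen for illustration and is not part of the package. The addCollectionMetadata() call is left as a comment because it needs the package and a GeneSetDb to run.

```r
# Hypothetical validator: accept only a single, non-missing, non-empty string.
is.single.string <- function(value) {
  is.character(value) && length(value) == 1L && !is.na(value) && nzchar(value)
}

ok.scalar  <- is.single.string('bar')
ok.vector  <- is.single.string(c('a', 'b'))
ok.missing <- is.single.string(NA_character_)

# With the package loaded, the validator would be passed along like so:
# gdb <- addCollectionMetadata(gdb, 'H', 'foo', 'bar',
#                              validate.value.fn = is.single.string)
```

Only the first check passes: the two-element vector and the missing value are both rejected.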

Examples

gdb <- getMSigGeneSetDb('H')

## Gene Set URLs
geneSetURL(gdb, 'H', 'HALLMARK_ADIPOGENESIS')
geneSetURL(gdb, c('H', 'H'),
           c('HALLMARK_ADIPOGENESIS', 'HALLMARK_ANGIOGENESIS'))

## Feature ID type
featureIdType(gdb, 'H')

## Organism
org(gdb, 'H')

## Arbitrary metadata
gdb <- addCollectionMetadata(gdb, 'H', 'foo', 'bar')
cmh <- collectionMetadata(gdb, 'H') ## print this to see

lianos/multiGSEA documentation built on Nov. 17, 2020, 1:26 p.m.