collectionMetadata: Gene Set Collection Metadata

collectionMetadata R Documentation

Gene Set Collection Metadata

Description

Associates key:value metadata with a gene set collection of a GeneSetDb().

Usage

collectionMetadata(x, collection, name, ...)

geneSetURL(x, i, j, ...)

featureIdType(x, i, ...)

featureIdType(x, i) <- value

## S4 method for signature 'GeneSetDb,missing,missing'
collectionMetadata(x, collection, name, as.dt = FALSE)

## S4 method for signature 'GeneSetDb,character,missing'
collectionMetadata(x, collection, name, as.dt = FALSE)

## S4 method for signature 'GeneSetDb,character,character'
collectionMetadata(x, collection, name, as.dt = FALSE)

## S4 method for signature 'GeneSetDb'
geneSetURL(x, i, j, ...)

## S4 replacement method for signature 'GeneSetDb'
featureIdType(x, i) <- value

## S4 method for signature 'GeneSetDb'
featureIdType(x, i, ...)

addCollectionMetadata(
  x,
  xcoll,
  xname,
  value,
  validate.value.fn = NULL,
  allow.add = TRUE
)

## S4 method for signature 'SparrowResult'
geneSetURL(x, i, j, ...)

Arguments

x

A GeneSetDb() (geneSetURL() also accepts a SparrowResult)

collection

The gene set collection to query

name

The name of the metadata variable to get the value for

...

not used yet

i, j

The collection,name compound key identifier of the gene set

value

The value of the metadata variable

as.dt

If FALSE (default), the tabular result of this function is returned as a data.frame. Set this to TRUE to return it as a data.table instead.

xcoll

The collection name

xname

The name of the metadata variable

validate.value.fn

If a function is provided, it is run on value and must return TRUE for the addition to be made

allow.add

If FALSE, the xcoll,xname entry is expected to exist in the GeneSetDb already, and the call fails if it does not

Details

The design of the GeneSetDb is such that we assume that groups of gene sets are usually defined together and will therefore share similar metadata. These groups of gene sets will fall into the same "collection", and, therefore, metadata for particular gene sets are tracked at the collection level.

Such metadata might include the organism that a batch of gene sets was defined in, the type of feature identifiers that a collection of gene sets uses (e.g., GSEABase::EntrezIdentifier()), or a URL pattern built from the collection,name compound key that one can browse to for more information about the gene set.

There are explicit helper functions that get and set the metadata mentioned above, namely featureIdType(), geneSetCollectionURLfunction(), and geneSetURL(). Arbitrary metadata can be stored at the collection level using the addCollectionMetadata() function. More details are provided below.

Value

geneSetURL() returns a character vector of URLs, one for each of the gene sets identified by i, j. NA is returned for gene sets i,j that are not found in x.

addCollectionMetadata() and the replacement methods return the updated GeneSetDb.

Methods (by class)

  • collectionMetadata(x = GeneSetDb, collection = missing, name = missing): Returns metadata for all collections

  • collectionMetadata(x = GeneSetDb, collection = character, name = missing): Returns all metadata for a specific collection

  • collectionMetadata(x = GeneSetDb, collection = character, name = character): Returns the name metadata value for a given collection.

  • geneSetURL(GeneSetDb): returns the URL for a geneset

  • featureIdType(GeneSetDb) <- value: sets the feature id type for a collection

  • featureIdType(GeneSetDb): retrieves the feature id type for a collection

  • geneSetURL(SparrowResult): returns the URL for a geneset from a SparrowResult object
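The three collectionMetadata() signatures listed above can be exercised side by side. This is a minimal sketch that assumes the sparrow package is installed and reuses exampleGeneSetDb() and the 'foo'/'bar' metadata entry from the Examples section; it only illustrates how the calls differ, not what the package stores internally.

```r
library(sparrow)

gdb <- exampleGeneSetDb()
# Add a throwaway entry so there is a known name to look up.
gdb <- addCollectionMetadata(gdb, 'c2', 'foo', 'bar')

# No collection, no name: metadata for all collections.
all.meta <- collectionMetadata(gdb)

# Collection only: all metadata for the 'c2' collection.
c2.meta <- collectionMetadata(gdb, 'c2')

# Collection and name: the single metadata value stored under 'foo'.
foo.value <- collectionMetadata(gdb, 'c2', 'foo')

# as.dt = TRUE keeps the tabular result as a data.table.
c2.dt <- collectionMetadata(gdb, 'c2', as.dt = TRUE)
```

Printing all.meta and c2.meta is the quickest way to see which metadata names a collection already carries.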

Gene Set URLs

A URL function can be defined per collection. It takes the collection,name compound key and generates a URL for the gene set that the user can browse to for further information. For instance, the geneSetCollectionURLfunction() for the MSigDB collections is defined like so:

url.fn <- function(collection, name) {
  url <- 'http://www.broadinstitute.org/gsea/msigdb/cards/%s.html'
  sprintf(url, name)
}
gdb <- getMSigGeneSetDb('H')
geneSetCollectionURLfunction(gdb, 'H') <- url.fn

In this way, a call to geneSetURL(gdb, 'H', 'HALLMARK_ANGIOGENESIS') will return http://www.broadinstitute.org/gsea/msigdb/cards/HALLMARK_ANGIOGENESIS.html.

geneSetURL() is vectorized over i and j.
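A URL function written in the style shown above copes with several gene sets at once, because sprintf() is itself vectorized. The sketch below runs in base R without sparrow and calls the function directly; how geneSetURL() invokes it internally is not described here, so treat this only as an illustration of the function's behavior on vector input. The second gene set name is an extra example that does not appear in the text above.

```r
# Same URL function pattern as in the MSigDB example.
url.fn <- function(collection, name) {
  url <- 'http://www.broadinstitute.org/gsea/msigdb/cards/%s.html'
  sprintf(url, name)
}

# Two collection,name keys passed as parallel vectors.
collections <- c('H', 'H')
gs.names <- c('HALLMARK_ANGIOGENESIS', 'HALLMARK_HYPOXIA')

# sprintf() recycles the pattern over every element of gs.names,
# so one call yields one URL per gene set.
urls <- url.fn(collections, gs.names)
print(urls)
```

A URL function that uses non-vectorized logic (for example, an if statement on name) would need to be adapted before it could be called on vectors like this.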

Feature ID Types

When defining a set of gene sets in a collection, the identifiers used must be of the same type. Most often you'll probably be working with Entrez identifiers, simply because that's what most of the annotations work with.

You declare the identifier type that a collection uses like so (this example uses Ensembl identifiers):

gdb <- getMSigGeneSetDb('H')
featureIdType(gdb, 'H') <- GSEABase::ENSEMBLIdentifier()
## or, equivalently (but you don't want to use this)
gdb <- addCollectionMetadata(gdb, 'H', 'id_type', GSEABase::ENSEMBLIdentifier())

Adding arbitrary collectionMetadata

Adds arbitrary metadata to a gene set collection of a GeneSetDb.

Note that this is not a replacement method! You must catch the returned object to keep the one with the updated collectionMetadata. Although this function is exported, I imagine it being used mostly through predefined replacement methods that use it as a utility function, such as featureIdType<-, geneSetCollectionURLfunction<-, etc.

gdb <- getMSigGeneSetDb('H')
gdb <- addCollectionMetadata(gdb, 'H', 'foo', 'bar')
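The validate.value.fn argument can guard against storing a malformed value. The following is a minimal sketch that assumes the sparrow package is installed and uses exampleGeneSetDb() from the Examples section; the validator is.single.string() and the metadata name 'curated_by' are hypothetical and exist only for this illustration.

```r
library(sparrow)

gdb <- exampleGeneSetDb()

# Hypothetical validator: accept only a single, non-missing string.
is.single.string <- function(value) {
  is.character(value) && length(value) == 1L && !is.na(value)
}

# 'curated_by' is a made-up metadata name used only for illustration.
# The validator is run on the value and must return TRUE for it to be added.
gdb <- addCollectionMetadata(
  gdb, 'c2', 'curated_by', 'analysis-team',
  validate.value.fn = is.single.string
)

# Look the value back up by its collection,name key.
collectionMetadata(gdb, 'c2', 'curated_by')
```

Because the result is assigned back to gdb, the updated metadata is kept; dropping the assignment would discard it.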

Examples

gdb <- exampleGeneSetDb()

# Gene Set URLs
geneSetURL(gdb, 'c2', 'BIOCARTA_AGPCR_PATHWAY')
geneSetURL(gdb, c('c2', 'c7'),
           c('BIOCARTA_AGPCR_PATHWAY', 'GSE14308_TH2_VS_TH1_UP'))

# feature id types
featureIdType(gdb, "c2") <- GSEABase::EntrezIdentifier()
featureIdType(gdb, "c2")

## Arbitrary metadata
gdb <- addCollectionMetadata(gdb, 'c2', 'foo', 'bar')
cmh <- collectionMetadata(gdb, 'c2', as.dt = TRUE) ## print this to see

lianos/sparrow documentation built on Feb. 5, 2024, 2:58 p.m.