plotMsigWordcloud: Compute and plot word frequencies for multiple MSigDB...

View source: R/plottingFunctions.R

plotMsigWordcloudR Documentation

Compute and plot word frequencies for multiple MSigDB collections

Description

Given a gene set collection, this function computes the word frequency of gene set names from the Molecular Signatures Database (MSigDB) collection (split by _). Word frequencies are also computed using short descriptions attached with each gene set object.

Usage

plotMsigWordcloud(
  msigGsc,
  groups,
  weight = NULL,
  measure = c("tfidf", "tf"),
  version = msigdb::getMsigdbVersions(),
  org = c("auto", "hs", "mm"),
  rmwords = getMsigExclusionList(),
  type = c("Name", "Short"),
  idf = NULL
)

Arguments

msigGsc

a GeneSetCollection object, containing gene sets from the MSigDB. The GSEABase::getBroadSets() function can be used to parse XML files downloaded from MSigDB.

groups

a named list, of character vectors or numeric indices specifying node groupings. Each element of the list represent a group and contains a character vector with node names.

weight

a named numeric vector, containing weights to apply to each gene-set. This can be -log10(FDR), -log10(p-value) or an enrichment score (ideally unsigned).

measure

a character, specifying how frequencies should be computed. "tf" uses term frequencies and "tfidf" (default) applies inverse document frequency weights to term frequencies.

version

a character, specifying the version of msigdb to use (see msigdb::getMsigdbVersions()).

org

a character, specifying the organism to use. This can either be "auto" (default), "hs" or "mm".

rmwords

a character vector, containing an exclusion list of words to discard from the analysis.

type

a character, specifying the source of text mining. Either gene set names (Name) or descriptions (Short) can be used.

idf

a list of named numeric vectors, specifying inverse document frequencies to use to penalise terms from gene-set names and short descriptions. This should be a vector of length 2 with names "Name" and "Short". Numeric vectors should contain weights and names should represent the term. Precomputed versions can be retrieved using the msigdb::getMsigdbIDF().

Value

a ggplot object.

Examples

data("hgsc")
groups <- list('g1' = names(hgsc)[1:25], 'g2' = names(hgsc)[26:50])
plotMsigWordcloud(hgsc, groups, rmwords = getMsigExclusionList())


DavisLaboratory/vissE documentation built on Jan. 31, 2024, 5:02 a.m.