reduceSimMatrix: reduceSimMatrix Reduce a set of GO terms based on their...

View source: R/rrvgo.R

reduceSimMatrix    R Documentation

reduceSimMatrix Reduce a set of GO terms based on their semantic similarity and scores.

Description

Reduce a set of GO terms based on their semantic similarity and scores.

Usage

reduceSimMatrix(
  simMatrix,
  scores = c("uniqueness", "size"),
  threshold = 0.7,
  orgdb,
  keytype = "ENTREZID",
  children = TRUE
)

Arguments

simMatrix

a (square) similarity matrix

scores

one of c("uniqueness", "size"), or a *named* vector with a score for each term, where higher values favor choosing the term as the cluster representative. The default, "uniqueness", uses a score reflecting how unique the term is. Note: to use p-values as scores, consider negative log-transforming them ('-log(p)') so that smaller p-values yield higher scores.

threshold

similarity threshold (0-1). Some guidance: Large (allowed similarity=0.9), Medium (0.7), Small (0.5), Tiny (0.4). Defaults to Medium (0.7).

orgdb

one of the org.* Bioconductor packages (either the package name or the orgdb object itself)

keytype

keytype passed to AnnotationDbi::keys to retrieve the GO terms associated with the gene IDs in your orgdb

children

when retrieving GO term size, include genes annotated to children terms (based on relationships in the GO DAG hierarchy). Defaults to TRUE.
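
As a minimal sketch of the named score vector described for the scores argument, the snippet below turns p-values into scores in base R. The term IDs and p-values are hypothetical placeholders made up for illustration. In practice they would come from your own enrichment results.

```r
# Hypothetical term IDs and p-values (placeholders, not real results)
go_ids  <- c("GO:0000001", "GO:0000002", "GO:0000003")
pvalues <- c(1e-8, 0.01, 0.5)

# Negative log10 transform: smaller p-values become larger scores
scores <- setNames(-log10(pvalues), go_ids)

# The most significant term now carries the highest score
best <- names(scores)[which.max(scores)]
print(scores)
print(best)
```

A vector built this way can be passed as the scores argument, provided its names match the row names of simMatrix.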

Details

Groups similar terms according to 'threshold' and decides which term remains as the representative of each group based on a score. If no named score vector is provided, the decision is based on the term "uniqueness" (the default) or the term "size".

Currently, rrvgo uses the similarity between pairs of terms to compute a distance matrix, defined as (1-simMatrix). The terms are then hierarchically clustered using complete linkage, and the tree is cut at the desired threshold, picking the term with the highest score as the representative of each group.
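
The procedure described above can be approximated in base R. This is an illustrative sketch, not the package's source code. The toy similarity matrix, the scores and the term names A to D are made up, and cutting the tree at a height equal to the threshold is an assumption about how "cut at the desired threshold" is applied.

```r
# Toy similarity matrix for four hypothetical terms (values are made up)
terms <- c("A", "B", "C", "D")
simMatrix <- matrix(c(1.0, 0.9, 0.2, 0.1,
                      0.9, 1.0, 0.3, 0.2,
                      0.2, 0.3, 1.0, 0.8,
                      0.1, 0.2, 0.8, 1.0),
                    nrow = 4, dimnames = list(terms, terms))
scores <- c(A = 1, B = 5, C = 3, D = 2)
threshold <- 0.7

# Distance = 1 - similarity, clustered with complete linkage
tree <- hclust(as.dist(1 - simMatrix), method = "complete")

# Assumption: the tree is cut at height = threshold on the distance scale
cluster <- cutree(tree, h = threshold)

# Representative of each cluster: the member with the highest score
representative <- vapply(split(names(cluster), cluster),
                         function(x) x[which.max(scores[x])],
                         character(1))
parent <- unname(representative[as.character(cluster)])

print(data.frame(term = names(cluster), cluster = unname(cluster), parent = parent))
```

With these toy values, A and B fall into one group represented by B, and C and D fall into another represented by C.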

Therefore, higher thresholds lead to fewer groups, and the threshold should be read as the minimum similarity between group representatives.
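
To illustrate how the threshold affects the number of groups, the sketch below cuts one toy tree at the four guidance values from the threshold argument. The similarity values are made up, and cutting at a height equal to the threshold is again an assumption rather than a statement about the package internals.

```r
# Toy similarity matrix for four hypothetical terms (values are made up)
terms <- c("A", "B", "C", "D")
simMatrix <- matrix(c(1.00, 0.85, 0.50, 0.20,
                      0.85, 1.00, 0.45, 0.15,
                      0.50, 0.45, 1.00, 0.30,
                      0.20, 0.15, 0.30, 1.00),
                    nrow = 4, dimnames = list(terms, terms))

# Complete-linkage clustering on distance = 1 - similarity
tree <- hclust(as.dist(1 - simMatrix), method = "complete")

# Guidance values from the threshold argument
thresholds <- c(Tiny = 0.4, Small = 0.5, Medium = 0.7, Large = 0.9)

# Number of groups obtained when cutting the tree at each threshold
nGroups <- sapply(thresholds, function(h) length(unique(cutree(tree, h = h))))
print(nGroups)
```

On this toy input the number of groups never increases as the threshold grows, which matches the statement that higher thresholds lead to fewer groups.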

Value

a data.frame identifying the different clusters of terms, the parent term representing the cluster, and some metrics of importance describing how unique and dispensable a term is.

Examples

library(rrvgo)
# Load the example enrichment results shipped with rrvgo
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo"))
simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel")
# Named score vector: -log10(q-value), so more significant terms score higher
scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID)
reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db")

imbforge/rrvgo documentation built on Oct. 24, 2024, 12:18 a.m.