reduceSimMatrix: Reduce a set of GO terms based on their...


View source: R/rrvgo.R

Description

Reduce a set of GO terms based on their semantic similarity and scores.

Usage

reduceSimMatrix(simMatrix, scores = NULL, threshold = 0.7, orgdb)

Arguments

simMatrix

a (square) similarity matrix

scores

*named* vector with scores (weights) assigned to each term. Higher is better. Can be NULL (the default), meaning no scores are provided; in this case, a default score based on set size is assigned, thus favoring larger sets. Note: if you use p-values as scores, consider negative log-transforming them first ('-log(p)'), so that smaller p-values yield higher scores.

threshold

similarity threshold (0-1). Some guidance: Large (allowed similarity=0.9), Medium (0.7), Small (0.5), Tiny (0.4). Defaults to Medium (0.7).

orgdb

one of the org.* Bioconductor packages (either the package name or the orgdb object itself)
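As a small illustration of the note on the scores argument, the sketch below turns p-values into a named score vector. The GO IDs and p-values are hypothetical placeholders, not real results, and -log10 is just one reasonable choice of log base.

```r
# Hypothetical p-values for three GO terms (illustrative values only)
pvals <- c("GO:0000001" = 0.001, "GO:0000002" = 0.05, "GO:0000003" = 1e-6)

# The negative log turns small p-values into large scores,
# so that "higher is better" holds
scores <- -log10(pvals)

# Names are carried over, which the *named* scores vector requires
names(scores)

# The most significant term now has the highest score
top_term <- names(which.max(scores))
```

Because the names are preserved by the transformation, the resulting vector can be matched against the row names of the similarity matrix.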

Details

Currently, rrvgo uses the similarity between pairs of terms to compute a distance matrix, defined as (1-simMatrix). The terms are then hierarchically clustered using complete linkage, and the tree is cut at the desired threshold, picking the term with the highest score as the representative of each group.

Therefore, higher thresholds lead to fewer groups. The threshold should be read as the expected similarity of terms within a group, though this is not entirely correct: terms with similarities below this threshold can still end up in the same group.
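The procedure described above can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the package's actual source: the term names, similarity values and scores are made up, and using the threshold directly as the cut height of the distance tree is an assumption of this sketch.

```r
# Toy similarity matrix for four hypothetical terms (made-up values)
terms <- c("A", "B", "C", "D")
simMatrix <- matrix(c(1.0, 0.9, 0.2, 0.1,
                      0.9, 1.0, 0.3, 0.2,
                      0.2, 0.3, 1.0, 0.8,
                      0.1, 0.2, 0.8, 1.0),
                    nrow = 4, dimnames = list(terms, terms))
scores <- c(A = 2, B = 5, C = 4, D = 1)

# Distance is defined as 1 - similarity; cluster with complete linkage
tree <- hclust(as.dist(1 - simMatrix), method = "complete")

# Cut the tree at the threshold (assumed here to be the cut height)
cluster <- cutree(tree, h = 0.7)

# Within each group, pick the highest-scoring term as the representative
representative <- tapply(terms, cluster, function(x) x[which.max(scores[x])])
reps <- as.vector(representative[as.character(cluster)])

data.frame(term = terms, cluster = unname(cluster), representative = reps)
```

In this toy example, A and B form one group represented by B, and C and D form another represented by C. Raising the cut height to 0.9 would merge all four terms into a single group, in line with higher thresholds producing fewer groups.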

Value

a data.frame with all terms and their "reducer" (NA if the term was not reduced)

Examples

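library(rrvgo)

# load the example GO enrichment results shipped with the package
go_analysis <- read.delim(system.file("extdata/example.txt", package="rrvgo"))
# compute pairwise semantic similarity between the GO terms
simMatrix <- calculateSimMatrix(go_analysis$ID, orgdb="org.Hs.eg.db", ont="BP", method="Rel")
# named scores: -log10 of the q-values, so that higher is better
scores <- setNames(-log10(go_analysis$qvalue), go_analysis$ID)
reducedTerms <- reduceSimMatrix(simMatrix, scores, threshold=0.7, orgdb="org.Hs.eg.db")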

rrvgo documentation built on Nov. 8, 2020, 6:17 p.m.