go_reduce: Reduce redundancy of human GO terms

View source: R/go_reduce.R


Description

This function reduces GO redundancy in two steps. First, it creates a semantic similarity matrix using GOSemSim::mgoSim(). It then passes this matrix to rrvgo::reduceSimMatrix(), which reduces the set of GO terms based on their semantic similarity and scores. If no scores are supplied, a default score based on set size is assigned.

Usage

go_reduce(
  pathway_df,
  orgdb = "org.Hs.eg.db",
  threshold = 0.7,
  scores = NULL,
  measure = "Wang"
)

Arguments

pathway_df

a data.frame or tibble object, with the following columns:

  • go_type: the sub-ontology the GO term relates to. Should be one of c("BP", "CC", "MF").

  • go_id: the gene ontology identifier (e.g. GO:0016209)
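The sketch below shows a minimal input with the two required columns and a simple check of the documented requirements. The GO IDs are illustrative, and check_pathway_df() is a hypothetical helper, not part of rutils.

```r
# Minimal input for go_reduce(): one row per GO term.
# The GO IDs are illustrative examples only.
pathway_df <- data.frame(
  go_type = c("BP", "BP", "CC", "MF"),
  go_id   = c("GO:0006955", "GO:0045087", "GO:0005739", "GO:0016209"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of rutils) that checks the documented
# requirements: both columns present, valid sub-ontology codes, and
# GO IDs of the form "GO:" followed by seven digits.
check_pathway_df <- function(df) {
  stopifnot(
    all(c("go_type", "go_id") %in% colnames(df)),
    all(df$go_type %in% c("BP", "CC", "MF")),
    all(grepl("^GO:[0-9]{7}$", df$go_id))
  )
  invisible(TRUE)
}

check_pathway_df(pathway_df)
```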

orgdb

character() vector indicating the name of the org.* Bioconductor annotation package to use (default "org.Hs.eg.db", for human).

threshold

numeric() vector. Similarity threshold (0-1) passed to rrvgo::reduceSimMatrix(). Default is 0.7. Higher values produce fewer, larger groups of terms. Some guidance:

  • For large term groupings, use threshold = 0.9

  • For medium term groupings, use threshold = 0.7

  • For small term groupings, use threshold = 0.5

  • For tiny term groupings, use threshold = 0.4
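The base-R sketch below illustrates how the threshold controls group size on a made-up similarity matrix for five hypothetical terms (A to E). It assumes the tree is cut at a height equal to threshold on the distance (1 - similarity) scale. It does not call rrvgo.

```r
# Made-up symmetric similarity matrix for five hypothetical terms.
terms <- c("A", "B", "C", "D", "E")
sim <- matrix(c(
  1.00, 0.95, 0.65, 0.20, 0.15,
  0.95, 1.00, 0.55, 0.20, 0.15,
  0.65, 0.55, 1.00, 0.35, 0.20,
  0.20, 0.20, 0.35, 1.00, 0.75,
  0.15, 0.15, 0.20, 0.75, 1.00
), nrow = 5, dimnames = list(terms, terms))

# Distance = 1 - similarity, clustered with complete linkage.
tree <- hclust(as.dist(1 - sim), method = "complete")

# Number of groups obtained when cutting at each suggested threshold.
n_groups <- sapply(c(0.4, 0.5, 0.7, 0.9), function(h) length(unique(cutree(tree, h = h))))
names(n_groups) <- c("tiny", "small", "medium", "large")
n_groups  # expected: tiny = 3, small = 2, medium = 2, large = 1
```

Raising the threshold never increases the number of groups, which matches the guidance above.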

scores

named numeric vector of scores (weights) for each term, named by GO ID. Higher is better. Default is NULL (no scores), in which case a default score based on set size is assigned, favoring larger sets. Note: if your scores are p-values, transform them with -log10(p) so that more significant terms receive higher scores.
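The transformation recommended above is sketched here. The p-values are hypothetical and the GO IDs are illustrative. Names are carried through, so the result can be passed as scores.

```r
# Hypothetical p-values, named by GO ID.
pvals <- c("GO:0006955" = 1e-8, "GO:0045087" = 1e-5, "GO:0002376" = 0.01)

# -log10 maps small (more significant) p-values to large scores,
# matching the "higher is better" convention for `scores`.
scores <- -log10(pvals)
scores
```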

measure

character() vector indicating the method used to calculate semantic similarity. Must be one of the methods supported by GOSemSim: c("Resnik", "Lin", "Rel", "Jiang", "Wang"). Default is "Wang".

Details

By default, semantic similarity is calculated using the "Wang" method, a graph-based strategy that computes semantic similarity from the topology of the GO graph. GOSemSim::mgoSim() also supports other measures (primarily information-content measures: "Resnik", "Lin", "Rel" and "Jiang"), but "Wang" is the default in GOSemSim and is therefore the default here. To use a different measure, set the measure argument; see the GOSemSim documentation for details of each method.
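To illustrate the idea behind the "Wang" measure, here is a minimal base-R sketch on a toy DAG. It is not GOSemSim's implementation, and in practice go_reduce() obtains similarities through GOSemSim::mgoSim(). The terms t1 to t5 are hypothetical. The sketch assumes the edge weights commonly used for this measure: 0.8 for is_a and 0.6 for part_of. Each term's "semantic value" is the sum of S-values over the term and its ancestors. Similarity is the summed S-values of shared terms divided by the two semantic values combined.

```r
# Toy GO-like DAG with hypothetical terms; each row is an edge child -> parent.
edges <- data.frame(
  child  = c("t2", "t3", "t4", "t4", "t5"),
  parent = c("t1", "t1", "t2", "t3", "t3"),
  type   = c("is_a", "is_a", "is_a", "part_of", "is_a"),
  stringsAsFactors = FALSE
)
# Assumed edge weights: 0.8 for is_a, 0.6 for part_of.
edges$w <- ifelse(edges$type == "is_a", 0.8, 0.6)

# S-values for `term`: S(term) = 1; each ancestor t gets the maximum of
# w(child -> t) * S(child) over its children inside the term's DAG.
s_values <- function(term, edges) {
  s <- setNames(1, term)
  repeat {
    changed <- FALSE
    for (i in seq_len(nrow(edges))) {
      ch <- edges$child[i]
      pa <- edges$parent[i]
      if (!is.na(s[ch])) {
        cand <- edges$w[i] * s[[ch]]
        if (is.na(s[pa]) || cand > s[[pa]]) {
          s[pa] <- cand
          changed <- TRUE
        }
      }
    }
    if (!changed) break
  }
  s
}

# Wang-style similarity: S-values of terms shared by both DAGs,
# divided by the sum of both semantic values.
wang_sim <- function(a, b, edges) {
  sa <- s_values(a, edges)
  sb <- s_values(b, edges)
  common <- intersect(names(sa), names(sb))
  sum(sa[common] + sb[common]) / (sum(sa) + sum(sb))
}

wang_sim("t4", "t5", edges)  # 2.68 / 5.48, roughly 0.489
```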

rrvgo::reduceSimMatrix() creates a distance matrix, defined as 1 - simMatrix. The terms are then hierarchically clustered using complete linkage (an agglomerative, or "bottom-up", clustering approach), and the tree is cut at a height equal to the chosen threshold. The term with the highest score in each group is used to represent that group.
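The base-R sketch below reproduces the procedure described above. It is not rrvgo's source code. The function name reduce_sim_matrix_sketch(), the four hypothetical terms, their similarities and their scores are all made up for illustration. The output columns mirror the parent_id and parent_sim_score columns described under Value.

```r
# Hypothetical sketch of the reduction step (not rrvgo's code).
reduce_sim_matrix_sketch <- function(sim, scores, threshold = 0.7) {
  # Distance = 1 - similarity, clustered with complete linkage.
  tree <- hclust(as.dist(1 - sim), method = "complete")
  # Cut the tree at height = threshold; one cluster id per term.
  cluster <- cutree(tree, h = threshold)
  # Parent of each cluster = the member with the highest score.
  parent_of_cluster <- sapply(split(names(cluster), cluster),
                              function(x) x[which.max(scores[x])])
  parent <- unname(parent_of_cluster[as.character(cluster)])
  data.frame(
    go_id = names(cluster),
    cluster = unname(cluster),
    parent_id = parent,
    # Similarity between each term and its parent.
    parent_sim_score = sim[cbind(names(cluster), parent)],
    stringsAsFactors = FALSE
  )
}

terms <- c("t1", "t2", "t3", "t4")
sim <- matrix(c(
  1.00, 0.85, 0.20, 0.10,
  0.85, 1.00, 0.25, 0.15,
  0.20, 0.25, 1.00, 0.80,
  0.10, 0.15, 0.80, 1.00
), nrow = 4, dimnames = list(terms, terms))
scores <- c(t1 = 2, t2 = 5, t3 = 4, t4 = 1)

res <- reduce_sim_matrix_sketch(sim, scores, threshold = 0.7)
res  # t1, t2 -> parent t2; t3, t4 -> parent t3
```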

Value

a tibble of the input pathway results, in which each GO term has been assigned to a "reduced" parent term. New columns:

  • parent_id: the GO ID of the parent term

  • parent_term: a description of the parent GO term

  • parent_sim_score: the similarity score between the child GO term and its parent term

See Also

go_plot() for plotting the output of go_reduce(), GOSemSim::mgoSim() for calculating semantic similarity, and rrvgo::reduceSimMatrix() for reducing the similarity matrix.

Other GO-related functions: go_plot()

Examples

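# Locate the example GO data shipped with rutils
file_path <-
    system.file(
        "testdata",
        "go_test_data.txt",
        package = "rutils",
        mustWork = TRUE
    )

# Read the tab-delimited file into a tibble
pathway_df <-
    readr::read_delim(file_path,
        delim = "\t"
    )

# Reduce redundancy; threshold = 0.9 produces large term groupings
go_reduce(
    pathway_df = pathway_df,
    orgdb = "org.Hs.eg.db",
    threshold = 0.9,
    scores = NULL,
    measure = "Wang"
)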

RHReynolds/rutils documentation built on March 26, 2022, 8:17 a.m.