computeGeneSetSimilarity: Compute the similarity/overlap between gene sets


View source: R/geneSetAnalysis.R

Description

Given a table mapping gene sets to their component genes, compute the similarity between each pair of gene sets based on how many genes they share. Currently, only one similarity metric is implemented: the ratio of the size of the intersection to the size of the union of a pair of gene sets (the Jaccard index).

Usage

computeGeneSetSimilarity(geneSets, similarityMetric = "overlap")

Arguments

geneSets

A data.table or data.frame with columns "gs_id" (gene set ID) and "ensembl_gene" (ENSEMBL gene ID), listing the genes in each gene set (one row per gene).

similarityMetric

(optional) The type of similarity metric to compute. Currently, the only option is "overlap", which divides the number of genes shared by a pair of gene sets by the total number of distinct genes across both sets (size(intersection(gs1, gs2)) / size(union(gs1, gs2))).
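
As a worked illustration of the "overlap" formula, the following base R sketch computes the score for a single pair of gene sets. The gene IDs are invented placeholders, and this is not the package's internal code, only the arithmetic the formula describes.

```r
# Hypothetical toy gene sets; the IDs are made up for illustration.
gs1 <- c("ENSG_A", "ENSG_B", "ENSG_C")
gs2 <- c("ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E")

shared <- intersect(gs1, gs2)    # genes present in both sets
combined <- union(gs1, gs2)      # distinct genes present in either set

# size(intersection) / size(union)
similarity <- length(shared) / length(combined)
similarity  # 2 shared genes out of 5 distinct genes: 0.4
```

A score of 0 means the two sets share no genes, and a score of 1 means they contain exactly the same genes.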

Value

A data.frame that lists the similarity score between each pair of gene sets. There will be three columns: "gs1" (gene set 1), "gs2" (gene set 2), and "similarity".
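
To show the shape of the returned table, here is a minimal base R sketch that builds a three-column result from a toy input. It is an assumption-laden approximation rather than the package's implementation: the input IDs are invented, and whether the real function lists each unordered pair once (as done here), lists both orderings, or includes self-pairs is not stated in this documentation.

```r
# Toy input in the documented shape: one row per (gene set, gene) pair.
# All IDs are invented for illustration.
geneSets <- data.frame(
  gs_id = c("GS1", "GS1", "GS1", "GS2", "GS2", "GS3"),
  ensembl_gene = c("g1", "g2", "g3", "g2", "g3", "g4"),
  stringsAsFactors = FALSE
)

# Split into a named list: gene set ID -> vector of gene IDs.
genesBySet <- split(geneSets$ensembl_gene, geneSets$gs_id)

# Enumerate each unordered pair of gene sets once (assumption, see above).
pairs <- combn(names(genesBySet), 2)

# Intersection-over-union score for each pair (one pair per column).
scores <- apply(pairs, 2, function(p) {
  a <- genesBySet[[p[1]]]
  b <- genesBySet[[p[2]]]
  length(intersect(a, b)) / length(union(a, b))
})

result <- data.frame(
  gs1 = pairs[1, ],
  gs2 = pairs[2, ],
  similarity = scores,
  stringsAsFactors = FALSE
)
result
```

In this toy input, GS1 and GS2 share two of three distinct genes, while GS3 shares none with either, so only the first pair has a non-zero score.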

Examples

targetGeneSets <- getGeneSets("ENSG00000000971", "GO:BP")
geneSets <- expandGeneSets(targetGeneSets$gs_id, "GO:BP")
computeGeneSetSimilarity(geneSets)

EvgeniyaGorobets/PGxVision documentation built on Dec. 17, 2021, 7:26 p.m.