findMsigClusters: Identify gene-set clusters from a gene-set overlap network

View source: R/genesetGroups.R

findMsigClusters    R Documentation

Description

This function identifies gene-set clusters from a gene-set overlap network produced using vissE. Any of the graph clustering algorithms from the igraph package can be used for clustering. The identified gene-set clusters are then sorted by their size and, if provided, a statistic of interest (clusters whose gene-sets have larger absolute values of the statistic are ranked higher).

Usage

findMsigClusters(
  ig,
  genesetStat = NULL,
  minSize = 2,
  alg = igraph::cluster_walktrap,
  algparams = list()
)

Arguments

ig

an igraph object, containing a network of gene set overlaps computed using computeMsigNetwork().

genesetStat

a named numeric vector, containing a statistic for each gene-set, to be used in cluster prioritisation. If NULL, clusters are prioritised based on their size alone (the number of gene-sets in them).

minSize

a numeric, stating the minimum size (number of gene-sets) of a cluster (default is 2).

alg

a function from the igraph package that should be used to perform graph clustering (default is igraph::cluster_walktrap). The function should return a communities object.

algparams

a list, specifying additional parameters that are to be passed to the graph clustering algorithm.
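
The following is a minimal sketch of how a clustering function, a parameter list and a minimum size could be combined using igraph directly. It is an illustration under stated assumptions, not necessarily how findMsigClusters() is implemented: the toy graph is made up, and forwarding the parameters with do.call() is an assumed mechanism.

```r
library(igraph)

# Toy network, made up for illustration: two triangles joined by one edge,
# plus a separate pair of nodes
g <- graph_from_literal(A-B, B-C, A-C, C-D, D-E, E-F, D-F, G-H)

alg <- cluster_walktrap       # any igraph function returning a communities object
algparams <- list(steps = 3)  # extra arguments forwarded to alg
minSize <- 3

# Call the clustering function on the graph with the extra parameters
# (assumed mechanism; the package may forward them differently)
comm <- do.call(alg, c(list(g), algparams))

# Convert the communities object into a list of member names per cluster,
# then drop clusters with fewer than minSize members
clusters <- communities(comm)
clusters <- clusters[lengths(clusters) >= minSize]
clusters
```

Any other igraph function that returns a communities object can be substituted for cluster_walktrap here, with algparams adjusted to match that function's arguments.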

Details

Gene-set clusters are identified using graph clustering and are prioritised based on a combination of cluster size and, optionally, a statistic of interest (e.g., enrichment scores). A product-of-ranks approach is used to prioritise clusters when gene-set statistics are available. In this approach, clusters are ranked based on their size (largest to smallest) and on the median absolute statistic of the gene-sets within them (largest to smallest). The product of these two ranks is computed and clusters are ordered by this product-of-ranks statistic (smallest to largest).

If statistics are missing for some gene-sets in the network, only cluster size is used for prioritisation, even when genesetStat is provided.
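
The prioritisation described above can be sketched in base R as follows. This is a simplified illustration, not the package's code: prioritiseClusters() is a hypothetical helper, the clusters and statistics are made up, ties are handled by rank()'s default averaging (an assumption), and missing statistics are checked only for gene-sets that appear in the clusters.

```r
# Hypothetical clusters and statistics, made up for illustration
clusters <- list(
  c1 = c("gsA", "gsB", "gsC", "gsD"),
  c2 = c("gsE", "gsF"),
  c3 = c("gsG", "gsH", "gsI")
)
genesetStat <- c(
  gsA = 1.0, gsB = -1.2, gsC = 0.8, gsD = 1.4,
  gsE = 2.5, gsF = -3.0,
  gsG = 0.2, gsH = -0.4, gsI = 0.1
)

# Hypothetical helper: order clusters by the product of their size rank
# and their median-absolute-statistic rank
prioritiseClusters <- function(clusters, genesetStat = NULL) {
  # rank 1 is the largest cluster
  sizeRank <- rank(-lengths(clusters))

  # use size alone when statistics are absent or incomplete
  allNames <- unlist(clusters, use.names = FALSE)
  if (is.null(genesetStat) ||
      !all(allNames %in% names(genesetStat)) ||
      anyNA(genesetStat[allNames])) {
    return(clusters[order(sizeRank)])
  }

  # rank 1 is the cluster with the largest median absolute statistic
  medStat <- sapply(clusters, function(gs) median(abs(genesetStat[gs])))
  statRank <- rank(-medStat)

  # smallest product of ranks comes first
  clusters[order(sizeRank * statRank)]
}

names(prioritiseClusters(clusters, genesetStat))       # "c1" "c2" "c3"
names(prioritiseClusters(clusters))                    # "c1" "c3" "c2"
names(prioritiseClusters(clusters, genesetStat[-1]))   # "c1" "c3" "c2"
```

In this toy data the small cluster c2 moves ahead of c3 once statistics are used, because its median absolute statistic is the largest. Dropping one statistic reverts the order to size alone.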

Value

a list, containing the gene-sets that belong to each cluster. Items in the list are ordered by priority.

Examples

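library(vissE)

# load the example data and build the gene-set overlap network
data(hgsc)
ovlap <- computeMsigOverlap(hgsc, thresh = 0.25)
ig <- computeMsigNetwork(ovlap, hgsc)
# identify clusters, prioritised by size only (genesetStat = NULL)
findMsigClusters(ig)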
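
A further sketch shows the remaining arguments in use. The statistics here are random numbers generated purely for illustration, not real results, and the example assumes that the vertex names of the network are the gene-set names. The steps argument is passed through to igraph::cluster_walktrap.

```r
if (requireNamespace("vissE", quietly = TRUE)) {
  library(vissE)

  data(hgsc)
  ovlap <- computeMsigOverlap(hgsc, thresh = 0.25)
  ig <- computeMsigNetwork(ovlap, hgsc)

  # Simulated statistics (random numbers, not real results), named by the
  # network's vertex names, which are assumed to be the gene-set names
  set.seed(1)
  gsNames <- igraph::V(ig)$name
  simStat <- setNames(rnorm(length(gsNames)), gsNames)

  grps <- findMsigClusters(
    ig,
    genesetStat = simStat,
    minSize = 3,
    alg = igraph::cluster_walktrap,
    algparams = list(steps = 6)
  )

  # number of gene-sets in each cluster, in priority order
  lengths(grps)
}
```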

DavisLaboratory/vissE documentation built on Jan. 31, 2024, 5:02 a.m.