In bedapub/ribiosGSEA: Gene-Set Enrichment Analysis Tools in Ribios

kmeansGeneset

R Documentation

Cluster gene-sets by enrichment profiles with k-means clustering, and select representative gene-sets by gene-set composition

Description

Cluster gene-sets by enrichment profiles with k-means clustering, and select representative gene-sets by gene-set composition

Usage

kmeansGeneset(
  enrichProfMatrix,
  genesetGenes,
  optK = pmin(25, floor(nrow(enrichProfMatrix)/2)),
  iter.max = 15,
  nstart = 50,
  thrCumJaccardIndex = 0.5,
  maxRepPerCluster = 10,
  metaClusterColumns = 1:ncol(enrichProfMatrix)
)
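The default value of `optK` in the Usage line caps the number of clusters at 25, and at half the number of gene-sets rounded down. The short sketch below evaluates that expression for a few gene-set counts. `defaultOptK` is a hypothetical helper written for illustration and is not part of ribiosGSEA.

```r
# Hypothetical helper: the default optK expression from the Usage line,
# with nrow(enrichProfMatrix) replaced by a plain count of gene-sets
defaultOptK <- function(nGenesets) pmin(25, floor(nGenesets / 2))

defaultOptK(c(7, 20, 80))  # returns c(3, 10, 25)
```

With 20 gene-sets, as in the Examples section, the default would therefore be 10 clusters.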

Arguments

`enrichProfMatrix`	A numeric matrix of gene-set enrichment profiles. Each row represents one gene-set and each column represents one enrichment profile, for instance a contrast in a differential gene expression analysis. The values represent the enrichment of gene-sets; for instance, enrichment scores or absolute log10-transformed p-values can be used. The row names are the gene-set names.
`genesetGenes`	A list of character vectors, each element holding the genes of one gene-set in `enrichProfMatrix`. The names of the list must exactly match the row names of `enrichProfMatrix`, i.e. the gene-set names in the same order.
`optK`	Integer, the number of initial clusters of gene-sets. By default, the smaller of 25 and half the number of gene-sets (rounded down). Because one or more gene-sets may be selected from each gene-set cluster, the total number of selected gene-sets is equal to or larger than `optK`.
`iter.max`	Integer, the maximum number of iterations allowed. This parameter is passed to `kmeans`.
`nstart`	Integer, the number of random sets of initial cluster centers to try; `kmeans` keeps the solution with the lowest total within-cluster sum of squares. This parameter is passed to `kmeans`.
`thrCumJaccardIndex`	Numeric between 0 and 1, the threshold of the cumulative Jaccard index. The larger the value, the more gene-sets are selected from each cluster.
`maxRepPerCluster`	Integer, the maximum number of representative gene-sets per cluster. If NULL or NA, no limit is set.
`metaClusterColumns`	Columns used to cluster the gene-set clusters by their average enrichment profiles. By default, all columns are used.

Details

This function performs k-means clustering of the enrichment profiles of gene-sets. Within each cluster, we first identify the union of unique genes covered by any gene-set in the cluster, and then calculate the Jaccard index between the genes of each gene-set and this union. Because every gene-set is a subset of the union, this Jaccard index equals the size of the gene-set divided by the size of the union. Gene-sets are sorted in descending order of the Jaccard index, and the cumulative Jaccard index is calculated. Among the sorted gene-sets, those up to the position where the cumulative Jaccard index exceeds `thrCumJaccardIndex` are selected, excluding redundant gene-sets. The gene-set clusters are ordered by their average profiles, so that similar clusters are placed next to each other.
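The procedure described in Details can be sketched in base R. This is a hypothetical reconstruction for illustration, not the ribiosGSEA source: the helper names `selectRepresentatives` and `kmeansGenesetSketch` are made up, and four points the text leaves open are assumptions. The cumulative Jaccard index is taken to be the size of the union of the top-k gene-sets divided by the size of the cluster union. Selection stops at the first gene-set whose cumulative index reaches the threshold. A gene-set counts as "redundant" if it adds no gene not already covered. Clusters are ordered by hierarchical clustering (`hclust`) of the k-means centers.

```r
## Hypothetical sketch; NOT the ribiosGSEA implementation.
## Assumptions: cumulative Jaccard index = |union of top-k sets| / |cluster union|;
## selection stops where it first reaches thr; "redundant" = adds no new genes;
## clusters are ordered by hclust() on the k-means centers.

selectRepresentatives <- function(genes, thr = 0.5, maxRep = 10) {
  # genes: named list of character vectors, the gene-sets of one cluster
  unionGenes <- unique(unlist(genes, use.names = FALSE))
  # Jaccard index of each gene-set against the union (equals |set| / |union|)
  ji <- vapply(genes, function(g) length(intersect(g, unionGenes)) /
                 length(union(g, unionGenes)), numeric(1))
  ord <- order(ji, decreasing = TRUE)
  covered <- character(0)
  cumJi <- numeric(length(ord))
  addsNew <- logical(length(ord))
  for (k in seq_along(ord)) {
    g <- genes[[ord[k]]]
    addsNew[k] <- length(setdiff(g, covered)) > 0  # FALSE means redundant
    covered <- union(covered, g)
    cumJi[k] <- length(covered) / length(unionGenes)
  }
  nSel <- which(cumJi >= thr)[1]  # first position reaching the threshold
  isRep <- seq_along(ord) <= nSel & addsNew
  # optional cap on the number of representatives (NULL or NA means no cap)
  if (!is.null(maxRep) && !is.na(maxRep)) isRep <- isRep & cumsum(isRep) <= maxRep
  data.frame(GenesetName = names(genes)[ord], JaccardIndex = unname(ji[ord]),
             CumJaccardIndex = cumJi, IsRepresentative = isRep,
             stringsAsFactors = FALSE)
}

kmeansGenesetSketch <- function(enrichProfMatrix, genesetGenes,
                                optK = pmin(25, floor(nrow(enrichProfMatrix) / 2)),
                                iter.max = 15, nstart = 50, thr = 0.5, maxRep = 10,
                                metaClusterColumns = seq_len(ncol(enrichProfMatrix))) {
  # names must match the row names exactly, in the same order
  stopifnot(identical(names(genesetGenes), rownames(enrichProfMatrix)))
  km <- kmeans(enrichProfMatrix, centers = optK, iter.max = iter.max, nstart = nstart)
  # k-means centers are the average profiles of the clusters
  centers <- km$centers[, metaClusterColumns, drop = FALSE]
  clOrder <- if (nrow(centers) > 1) hclust(dist(centers))$order else 1L
  res <- lapply(seq_along(clOrder), function(i) {
    members <- which(km$cluster == clOrder[i])
    df <- selectRepresentatives(genesetGenes[members], thr, maxRep)
    data.frame(GenesetCluster = i,
               GenesetInd = match(df$GenesetName, rownames(enrichProfMatrix)),
               df, stringsAsFactors = FALSE)
  })
  do.call(rbind, res)
}
```

With the objects from the Examples section, `kmeansGenesetSketch(profMat, gsGenes, optK = 5)` returns one row per gene-set, with columns named after those of `genesetClusterData`. Because the interpretation of the cumulative index and of redundancy is assumed, its selections need not match those of `kmeansGeneset`.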

Value

A list:

`kmeans`	The result object returned by `kmeans`.
`genesetClusterData`	A data.frame with the following columns: GenesetCluster, GenesetInd, GenesetName, JaccardIndex, CumJaccardIndex, IsRepresentative.
`repGenesets`	Character vector, the names of the gene-sets selected as representatives of each gene-set cluster.
`gsCompOverlapSelInd`	Factor vector indicating the gene-set cluster represented by each representative gene-set.
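Assuming `IsRepresentative` is a logical column, the representative gene-sets of each cluster can be regrouped from `genesetClusterData` with base R's `split()`. The data frame below is a made-up stand-in with the documented column names, not real output of `kmeansGeneset`.

```r
# Toy stand-in for the genesetClusterData element (values are invented)
gcd <- data.frame(GenesetCluster = c(1, 1, 2, 2, 2),
                  GenesetInd = c(3, 1, 2, 5, 4),
                  GenesetName = c("gsC", "gsA", "gsB", "gsE", "gsD"),
                  JaccardIndex = c(0.8, 0.5, 0.6, 0.6, 0.2),
                  CumJaccardIndex = c(0.8, 0.9, 0.6, 0.7, 0.75),
                  IsRepresentative = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                  stringsAsFactors = FALSE)

# Keep only representative rows, then group their names by cluster
repByCluster <- with(gcd, split(GenesetName[IsRepresentative],
                                GenesetCluster[IsRepresentative]))
repByCluster  # list: "1" -> "gsC", "2" -> c("gsB", "gsE")
```

Flattening this list with `unlist(repByCluster, use.names = FALSE)` gives a character vector of representative names.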

Examples

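library(ribiosGSEA)
set.seed(1887)
## 20 gene-sets x 5 contrasts of random enrichment values
profMat <- matrix(rnorm(100), nrow=20, 
    dimnames=list(sprintf("geneset%d", 1:20), sprintf("contrast%d", 1:5)))
## each gene-set: up to 10 unique genes drawn from A-Z
gsGenes <- lapply(seq_len(nrow(profMat)), function(x) 
    unique(sample(LETTERS, 10, replace=TRUE)))
names(gsGenes) <- rownames(profMat)
res <- kmeansGeneset(profMat, gsGenes, optK=5)
res$repGenesets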

bedapub/ribiosGSEA documentation built on March 30, 2023, 3:26 p.m.