View source: R/kmeansGenesets.R
kmeansGeneset | R Documentation |
Cluster gene-sets by enrichment profiles with k-means clustering, and select representative gene-sets by gene-set composition
kmeansGeneset(
enrichProfMatrix,
genesetGenes,
optK = pmin(25, floor(nrow(enrichProfMatrix)/2)),
iter.max = 15,
nstart = 50,
thrCumJaccardIndex = 0.5,
maxRepPerCluster = 10,
metaClusterColumns = 1:ncol(enrichProfMatrix)
)
enrichProfMatrix |
A numeric matrix representing gene-set enrichment profiles. Each row represents one gene-set and each column represents one enrichment profile, for instance a contrast in a differential gene expression analysis. The values of the matrix represent the enrichment of gene-sets; for instance, enrichment scores or absolute log10-transformed p-values can be used. The row names are gene-set names. |
genesetGenes |
A list of character strings, each element being genes of a gene-set in the |
optK |
Integer, the number of initial clusters of gene-sets. Because one or more gene-sets may be selected from each gene-set cluster, the number of finally selected gene-sets is equal to or larger than |
iter.max |
Integer, the maximum number of iterations allowed. This parameter is passed to |
nstart |
Integer, how many random sets should be chosen to initialize cluster centers. This parameter is passed to |
thrCumJaccardIndex |
Numeric, between 0 and 1, the threshold of the cumulative Jaccard Index. The larger the value, the more gene-sets are selected from each cluster.
maxRepPerCluster |
Integer, maximum number of representative genesets per cluster. If NULL or NA, no limit is set. |
metaClusterColumns |
Columns used to cluster the gene-set clusters by their average enrichment profile. By default, all columns are used. The gene-set clusters are ordered by their average profiles, so that similar clusters are near each other. |
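As a small worked example of the default for optK shown in the usage above, the expression pmin(25, floor(nrow(enrichProfMatrix)/2)) can be evaluated for a few matrix sizes. The helper name defaultOptK is hypothetical and is introduced here only for illustration; it is not part of the package.

```r
# Hypothetical helper that mirrors the default expression for optK
# given in the usage section; nGenesets stands in for nrow(enrichProfMatrix).
defaultOptK <- function(nGenesets) {
  pmin(25, floor(nGenesets / 2))
}

defaultOptK(20)   # 10: half the number of gene-sets
defaultOptK(200)  # 25: capped at 25 clusters
```

In other words, the default uses half the number of gene-sets as the initial number of clusters, but never more than 25.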
A list:
kmeans Result object returned by kmeans.
genesetClusterData A data.frame with the following columns: GenesetCluster, GenesetInd, GenesetName, JaccardIndex, CumJaccardIndex, IsRepresentative.
repGenesets Character vector, names of the gene-sets that are selected as representative gene-sets from each gene-set cluster.
gsCompOverlapSelInd Factor vector, indicating the gene-set clusters represented by each representative gene-set.
set.seed(1887)
profMat <- matrix(rnorm(100), nrow=20,
dimnames=list(sprintf("geneset%d", 1:20), sprintf("contrast%d", 1:5)))
gsGenes <- lapply(1:nrow(profMat), function(x)
unique(sample(LETTERS, 10, replace=TRUE)))
names(gsGenes) <- rownames(profMat)
kmeansGeneset(profMat, gsGenes, optK=5)
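The two ingredients named in the description, k-means clustering of enrichment profiles and overlap of gene-set composition, can be sketched with base R alone. This is a rough illustration under stated assumptions, not the implementation of kmeansGeneset: the helper jaccard and the objects km, clusters, and jacMat are hypothetical names, and the rule by which the function accumulates Jaccard indices to pick representatives is not reproduced here.

```r
# Same simulated input as in the example above
set.seed(1887)
profMat <- matrix(rnorm(100), nrow = 20,
                  dimnames = list(sprintf("geneset%d", 1:20),
                                  sprintf("contrast%d", 1:5)))
gsGenes <- lapply(seq_len(nrow(profMat)), function(x)
  unique(sample(LETTERS, 10, replace = TRUE)))
names(gsGenes) <- rownames(profMat)

# Cluster gene-sets (rows) by their enrichment profiles with stats::kmeans.
# centers, iter.max and nstart play the roles of optK, iter.max and nstart.
km <- stats::kmeans(profMat, centers = 5, iter.max = 15, nstart = 50)
clusters <- split(names(km$cluster), km$cluster)

# Jaccard index of two gene-sets: shared genes divided by all genes in either set
# (hypothetical helper, defined here for illustration)
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Pairwise Jaccard indices between all gene-sets
n <- length(gsGenes)
jacMat <- matrix(0, nrow = n, ncol = n,
                 dimnames = list(names(gsGenes), names(gsGenes)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    jacMat[i, j] <- jaccard(gsGenes[[i]], gsGenes[[j]])
  }
}

# Overlap in gene composition among the members of the first cluster
members <- clusters[[1]]
jacMat[members, members, drop = FALSE]
```

Gene-sets within a cluster that have a high Jaccard index share most of their genes, which is the kind of redundancy that selecting representatives is meant to reduce.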