gapStats: Unbiased estimate of the number of cell or gene clusters...


Description

Takes an ExpressionSet object and uses the gap statistic to calculate the optimal number of kmeans, pam, or hierarchical clusters for the samples (cells) or genes.

Usage

gapStats(cellData, gene_clust = FALSE, fun = "kmeans", max_clust = 25,
  boot = 100, plot = TRUE, save = FALSE, print = TRUE)

Arguments

cellData

ExpressionSet object created with readCells (and preferably transformed with prepCells). It is also helpful to first run reduceGenes_var and reduceGenes_pca.

gene_clust

Boolean specifying whether the gap statistic should be calculated for the samples or genes. TRUE calculates for the genes, FALSE (the default) for the cells.

fun

Character string specifying whether the gap statistic should be calculated for kmeans, pam, or hierarchical clustering. Possible values are "kmeans", "pam", or "hclust".

max_clust

Integer specifying the maximum possible number of clusters in the dataset. Set this higher than the expected number of clusters.

boot

Integer specifying the number of bootstrap iterations to perform when calculating the gap statistic.

plot

Boolean specifying whether a plot of the gap values vs the number of clusters should be produced.

save

Boolean specifying whether the plot should be saved.

print

Boolean specifying whether the optimal number of clusters should be printed in the terminal window.

Value

The optimal number of clusters calculated from the gap statistic with the given parameters. A new column is added to pData indicating the optimal number of cell or gene clusters for the chosen clustering method.
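
To make the arguments above concrete, here is a minimal sketch of a gap statistic calculation on simulated data, using clusGap and maxSE from the cluster package. It is an illustration only: whether gapStats calls clusGap internally is an assumption, the simulated matrix x is a hypothetical stand-in for expression data, and the choice of the "Tibs2001SEmax" selection rule is ours, not something this page specifies. K.max plays the role of max_clust and B the role of boot.

```r
# Sketch only: illustrates the gap statistic with cluster::clusGap on
# simulated data. This is not taken from the gapStats source.
library(cluster)

set.seed(1)

# Hypothetical stand-in for an expression matrix: 60 "cells" (rows) by
# 5 features, drawn from three well-separated groups of 20 cells each.
centers <- c(0, 6, 12)
x <- do.call(rbind, lapply(centers, function(m) {
  matrix(rnorm(20 * 5, mean = m, sd = 0.5), ncol = 5)
}))

# K.max corresponds to max_clust and B to boot. FUNcluster = kmeans matches
# fun = "kmeans"; nstart is passed through to kmeans.
gap <- clusGap(x, FUNcluster = kmeans, K.max = 8, B = 50, nstart = 10,
               verbose = FALSE)

# Pick the number of clusters from the gap values and their standard errors
# (selection rule chosen here for illustration).
k_opt <- maxSE(f = gap$Tab[, "gap"], SE.f = gap$Tab[, "SE.sim"],
               method = "Tibs2001SEmax")
print(k_opt)
```

The table gap$Tab holds one row per candidate number of clusters, so plotting its "gap" column against 1 to K.max gives the kind of curve that plot = TRUE is described as producing.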


joeburns06/hocuspocus documentation built on May 19, 2019, 2:59 p.m.