mbkmeans | R Documentation |
This is an implementation of the mini-batch k-means algorithm of Sculley (2010) for large single-cell sequencing data. For SingleCellExperiment objects, the dimensionality reduction results stored in the reducedDim() slot are used as input by default.
mbkmeans(x, ...)

## S4 method for signature 'SummarizedExperiment'
mbkmeans(x, whichAssay = 1, ...)

## S4 method for signature 'SingleCellExperiment'
mbkmeans(x, reduceMethod = "PCA", whichAssay = 1, ...)

## S4 method for signature 'LinearEmbeddingMatrix'
mbkmeans(x, ...)

## S4 method for signature 'ANY'
mbkmeans(
  x,
  clusters,
  batch_size = min(500, NCOL(x)),
  max_iters = 100,
  num_init = 1,
  init_fraction = batch_size/NCOL(x),
  initializer = "kmeans++",
  compute_labels = TRUE,
  calc_wcss = FALSE,
  early_stop_iter = 10,
  verbose = FALSE,
  CENTROIDS = NULL,
  tol = 1e-04,
  BPPARAM = BiocParallel::SerialParam(),
  ...
)
x |
The object on which to run mini-batch k-means. It can be a matrix-like object (e.g., matrix, Matrix, DelayedMatrix, HDF5Matrix) with genes in the rows and samples in the columns. Specialized methods are defined for SummarizedExperiment and SingleCellExperiment. |
... |
passed to 'blockApply'. |
whichAssay |
The assay to use as input to mini-batch k-means. If x is a SingleCellExperiment, this is ignored unless reduceMethod = NA. |
reduceMethod |
Name of dimensionality reduction results to use as input to mini-batch k-means. Set to NA to use the full matrix. |
clusters |
the number of clusters |
batch_size |
the size of the mini batches. By default, it is the smaller of the number of observations and 500. |
max_iters |
the maximum number of clustering iterations |
num_init |
number of times the algorithm will be run with different centroid seeds |
init_fraction |
proportion of data to use for the initialization centroids (applies if initializer is kmeans++). Should be a number between 0.0 and 1.0. By default, it uses the relative batch size. |
initializer |
the method of initialization. One of kmeans++ and random. See details for more information |
compute_labels |
logical indicating whether to compute the final cluster labels. |
calc_wcss |
logical indicating whether the per-cluster WCSS is computed. Ignored if 'compute_labels = FALSE'. |
early_stop_iter |
the number of iterations to continue after the best within-cluster sum of squared error has been calculated |
verbose |
either TRUE or FALSE, indicating whether progress is printed during clustering |
CENTROIDS |
a matrix of initial cluster centroids. The number of rows of the CENTROIDS matrix should equal the number of clusters, and the number of columns should equal the number of columns of the data |
tol |
a numeric convergence tolerance. If, at an iteration with 1 < iteration < max_iters, the squared norm of the centroids is smaller than 'tol', k-means is considered to have converged |
BPPARAM |
See the 'BiocParallel' package. Only the label assignment is done in parallel. |
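As a small illustration of how the defaults of batch_size and init_fraction resolve, the following base R sketch evaluates the same expressions that appear in the usage signature. The simulated matrix and its dimensions are assumptions chosen only for illustration.

```r
# Simulated input: 20 features (rows) by 2000 samples (columns).
set.seed(1)
x <- matrix(rnorm(20 * 2000), nrow = 20)

# Same expressions as the defaults in the usage signature.
batch_size <- min(500, NCOL(x))         # capped at 500 samples
init_fraction <- batch_size / NCOL(x)   # relative batch size

# With fewer than 500 samples, one batch covers the whole data set.
x_small <- x[, 1:100]
batch_size_small <- min(500, NCOL(x_small))
init_fraction_small <- batch_size_small / NCOL(x_small)
```

For large data sets the default batch is therefore a fixed 500 samples, and init_fraction shrinks as the number of samples grows.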
The implementation is largely based on the MiniBatchKmeans function of the ClusterR package. The contribution of this package is to provide support for on-disk data representations such as HDF5, through the use of DelayedMatrix and HDF5Matrix objects, as well as for sparse data representation through the classes of the Matrix package. We also provide high-level methods for objects of class SummarizedExperiment, SingleCellExperiment, and LinearEmbeddingMatrix.
This function performs k-means clustering using mini batches.
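To make the mini-batch idea concrete, here is a minimal base R sketch of the update scheme described by Sculley (2010): draw a random batch, assign each batch sample to its nearest centroid, then move each centroid toward its assigned samples with a per-centroid learning rate. It is an illustration only, not the package's code; the function name mini_batch_kmeans_sketch and its return fields are hypothetical, and it uses random initialization and a fixed number of iterations for brevity.

```r
# Minimal mini-batch k-means sketch (after Sculley, 2010); NOT the package's code.
# x: numeric matrix with features in rows and samples in columns.
mini_batch_kmeans_sketch <- function(x, clusters,
                                     batch_size = min(500, NCOL(x)),
                                     max_iters = 100) {
  n <- NCOL(x)
  # Random initialization: pick `clusters` distinct samples as centroids.
  centroids <- t(x[, sample.int(n, clusters), drop = FALSE])  # clusters x features
  counts <- numeric(clusters)  # number of updates applied to each centroid

  # Index of the closest centroid for each column of `pts`.
  nearest <- function(pts) {
    d <- sapply(seq_len(clusters),
                function(k) colSums((pts - centroids[k, ])^2))
    max.col(-matrix(d, ncol = clusters), ties.method = "first")
  }

  for (iter in seq_len(max_iters)) {
    idx <- sample.int(n, batch_size)
    batch <- x[, idx, drop = FALSE]
    nearest_idx <- nearest(batch)  # cache assignments before updating
    for (j in seq_along(nearest_idx)) {
      k <- nearest_idx[j]
      counts[k] <- counts[k] + 1
      eta <- 1 / counts[k]         # per-centroid learning rate
      centroids[k, ] <- (1 - eta) * centroids[k, ] + eta * batch[, j]
    }
  }
  list(centroids = centroids, labels = nearest(x))
}

# Usage on simulated data: two groups of 100 samples, 5 features each.
set.seed(42)
x <- cbind(matrix(rnorm(500, mean = 0), nrow = 5),
           matrix(rnorm(500, mean = 10), nrow = 5))
fit <- mini_batch_kmeans_sketch(x, clusters = 2, batch_size = 50, max_iters = 20)
table(fit$labels)
```

Because each iteration touches only batch_size samples, the cost per iteration does not depend on the total number of samples, which is what makes the approach attractive for large or on-disk data.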
kmeans++: kmeans++ initialization. References: http://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf and http://stackoverflow.com/questions/5466323/how-exactly-does-k-means-work
random: random selection of data rows as initial centroids
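The two initializers can be sketched in base R as follows. This is a hedged illustration of the general kmeans++ seeding idea (each new centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far), not the package's internal code. The function names are hypothetical, and the sketch assumes the data are not all identical.

```r
# Sketch of k-means++ seeding; NOT the package's internal code.
# x: numeric matrix with features in rows and samples in columns.
kmeanspp_init_sketch <- function(x, clusters) {
  n <- NCOL(x)
  chosen <- integer(clusters)
  chosen[1] <- sample.int(n, 1)  # first centroid: uniform at random
  # Squared distance from every sample to its nearest chosen centroid so far.
  d2 <- colSums((x - x[, chosen[1]])^2)
  for (k in seq_len(clusters)[-1]) {
    # Next centroid: sampled with probability proportional to d2.
    chosen[k] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, colSums((x - x[, chosen[k]])^2))
  }
  t(x[, chosen, drop = FALSE])   # clusters x features
}

# Sketch of the "random" initializer: distinct samples chosen uniformly.
random_init_sketch <- function(x, clusters) {
  t(x[, sample.int(NCOL(x), clusters), drop = FALSE])
}

set.seed(1)
x <- matrix(rnorm(5 * 50), nrow = 5)
init_pp <- kmeanspp_init_sketch(x, clusters = 3)
init_random <- random_init_sketch(x, clusters = 3)
```

Compared with uniform random selection, the distance-weighted sampling tends to spread the initial centroids across the data.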
A list with the following attributes: centroids, WCSS_per_cluster, best_initialization, iters_per_initialization.
Lampros Mouselimis and Yuwei Ni
Sculley. Web-Scale K-Means Clustering. WWW 2010, April 26–30, 2010, Raleigh, North Carolina, USA. ACM 978-1-60558-799-8/10/04.
https://github.com/mlampros/ClusterR
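library(mbkmeans)

# SummarizedExperiment input: clusters the first assay
library(SummarizedExperiment)
se <- SummarizedExperiment(matrix(rnorm(100), ncol = 10))
mbkmeans(se, clusters = 2)

# SingleCellExperiment input: reduceMethod = NA uses the full matrix
library(SingleCellExperiment)
sce <- SingleCellExperiment(matrix(rnorm(100), ncol = 10))
mbkmeans(sce, clusters = 2, reduceMethod = NA)

# Plain matrix input
x <- matrix(rnorm(100), ncol = 10)
mbkmeans(x, clusters = 3)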