MiniBatchKMeans | R Documentation |
This is a wrapper around the Python class sklearn.cluster.MiniBatchKMeans.
rgudhi::PythonClass
-> rgudhi::SKLearnClass
-> rgudhi::BaseClustering
-> MiniBatchKMeans
new()
The MiniBatchKMeans class constructor.
MiniBatchKMeans$new(
  n_clusters = 2L,
  init = c("k-means++", "random"),
  n_init = 10L,
  max_iter = 300L,
  tol = 1e-04,
  verbose = 0L,
  random_state = NULL,
  batch_size = 1024L,
  compute_labels = TRUE,
  max_no_improvement = 10L,
  init_size = NULL,
  reassignment_ratio = 0.01
)
n_clusters
An integer value specifying the number of clusters to form as well as the number of centroids to generate. Defaults to 2L.
init
Either a string or a numeric matrix of shape \mathrm{n\_clusters} \times \mathrm{n\_features} specifying the method for initialization. If a string, choices are:
"k-means++": selects initial cluster centroids using sampling based on an empirical probability distribution of the points' contribution to the overall inertia. This technique speeds up convergence and is theoretically proven to be \mathcal{O}(\log(k))-optimal. See the description of n_init for more details;
"random": chooses n_clusters observations (rows) at random from the data for the initial centroids.
Defaults to "k-means++".
n_init
An integer value specifying the number of times the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia. Defaults to 10L.
max_iter
An integer value specifying the maximum number of iterations of the k-means algorithm for a single run. Defaults to 300L.
tol
A numeric value specifying the relative tolerance, with respect to the Frobenius norm of the difference in the cluster centers of two consecutive iterations, used to declare convergence. Defaults to 1e-4.
verbose
An integer value specifying the level of verbosity. Defaults to 0L, which disables verbose output.
random_state
An integer value specifying the initial seed of the random number generator. Defaults to NULL, which uses the current timestamp.
batch_size
An integer value specifying the size of the mini-batches. For faster computations, you can set batch_size to a value greater than 256 times the number of cores to enable parallelism on all cores. Defaults to 1024L.
compute_labels
A boolean value specifying whether to compute label assignment and inertia for the complete dataset once the mini-batch optimization has converged in fit. Defaults to TRUE.
max_no_improvement
An integer value specifying the number of consecutive mini-batches that do not yield an improvement on the smoothed inertia before stopping the algorithm early. To disable convergence detection based on inertia, set max_no_improvement to NULL. Defaults to 10L.
init_size
An integer value specifying the number of samples to randomly sample for speeding up the initialization (sometimes at the expense of accuracy): the algorithm is initialized by running a batch KMeans on a random subset of the data. This needs to be larger than n_clusters. If NULL, the heuristic is init_size = 3 * batch_size if 3 * batch_size < n_clusters, else init_size = 3 * n_clusters. Defaults to NULL.
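The default init_size heuristic described above can be sketched as follows (a minimal Python illustration; the function name is ours, not part of any API):

```python
def default_init_size(batch_size, n_clusters):
    """Default init_size when the argument is left as NULL.

    Mirrors the rule stated above: 3 * batch_size if
    3 * batch_size < n_clusters, else 3 * n_clusters.
    """
    if 3 * batch_size < n_clusters:
        return 3 * batch_size
    return 3 * n_clusters

# With the constructor defaults (batch_size = 1024, n_clusters = 2),
# 3 * 1024 = 3072 is not smaller than 2, so init_size = 3 * 2 = 6.
print(default_init_size(1024, 2))
```

Note that with the default arguments the second branch applies, so the initialization subset is simply three samples per cluster.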
reassignment_ratio
A numeric value specifying the fraction of the maximum number of counts for a center to be reassigned. A higher value means that low-count centers are more easily reassigned, so the model will take longer to converge but should converge to a better clustering. However, too high a value may cause convergence issues, especially with a small batch size. Defaults to 0.01.
An object of class MiniBatchKMeans.
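Because this class is a thin wrapper, the equivalent call to the underlying Python class sklearn.cluster.MiniBatchKMeans, using the same defaults as the constructor above, can be sketched as follows (the toy data and seed are ours, for illustration only):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Toy data: two well-separated blobs of 50 points each (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
               rng.normal(5.0, 0.1, (50, 2))])

# Mirrors the defaults of the R constructor above
model = MiniBatchKMeans(
    n_clusters=2,
    init="k-means++",
    n_init=10,
    max_iter=300,
    tol=1e-4,
    batch_size=1024,
    compute_labels=True,
    max_no_improvement=10,
    reassignment_ratio=0.01,
    random_state=0,
)
labels = model.fit_predict(X)
```

On data this well separated, each blob receives a single cluster label; the R wrapper forwards its arguments to this Python constructor.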
clone()
The objects of this class are cloneable with this method.
MiniBatchKMeans$clone(deep = FALSE)
deep
Whether to make a deep clone.
# Instantiate a MiniBatchKMeans clusterer with default parameters
cl <- MiniBatchKMeans$new()