KMeans | R Documentation |
This is a wrapper around the Python class sklearn.cluster.KMeans.
rgudhi::PythonClass
-> rgudhi::SKLearnClass
-> rgudhi::BaseClustering
-> KMeans
new()
The KMeans class constructor.
KMeans$new(
  n_clusters = 2L,
  init = c("k-means++", "random"),
  n_init = 10L,
  max_iter = 300L,
  tol = 1e-04,
  verbose = 0L,
  random_state = NULL,
  copy_x = TRUE,
  algorithm = c("lloyd", "elkan")
)
n_clusters
An integer value specifying the number of clusters to
form as well as the number of centroids to generate. Defaults to 2L.
init
Either a string or a numeric matrix of shape
$\mathrm{n_{clusters}} \times \mathrm{n_{features}}$ specifying the
method for initialization. If a string, choices are:
"k-means++": selects initial cluster centroids using sampling based
on an empirical probability distribution of the points’ contribution to
the overall inertia. This technique speeds up convergence, and is
theoretically proven to be $\mathcal{O}(\log(k))$-optimal. See the
description of n_init for more details;
"random": chooses n_clusters observations (rows) at random from
data for the initial centroids.
Defaults to "k-means++".
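As an illustration of the sampling idea behind "k-means++", here is a minimal base R sketch of the classical D^2 seeding scheme. The helper kmeanspp_init is hypothetical and is not part of rgudhi or scikit-learn, whose actual implementation may differ in its details. The sketch assumes the data contain at least n_clusters distinct rows.

```r
# Classical k-means++ seeding (D^2 sampling) in base R.
# `kmeanspp_init` is a hypothetical helper, not part of rgudhi.
kmeanspp_init <- function(x, n_clusters) {
  n <- nrow(x)
  idx <- integer(n_clusters)
  # First centroid: one observation drawn uniformly at random.
  idx[1] <- sample.int(n, 1)
  # Squared distance from every point to its nearest chosen centroid.
  d2 <- rowSums(sweep(x, 2, x[idx[1], ])^2)
  for (k in seq_len(n_clusters)[-1]) {
    # Draw the next centroid with probability proportional to d2,
    # i.e. to each point's current contribution to the inertia.
    idx[k] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[idx[k], ])^2))
  }
  x[idx, , drop = FALSE]
}

set.seed(1234)
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 5), ncol = 2)
)
centers <- kmeanspp_init(x, n_clusters = 2L)
dim(centers)
```

The result is a numeric matrix with one row per cluster and one column per feature, which is the shape described above for a matrix-valued init.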
n_init
An integer value specifying the number of times the k-means
algorithm will be run with different centroid seeds. The final results
will be the best output of n_init consecutive runs in terms of
inertia. Defaults to 10L.
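The "best of n_init runs" rule can be sketched with stats::kmeans from base R, used here as a stand-in for the wrapped Python class (an assumption made so that the example runs without a Python backend). The inertia corresponds to the tot.withinss component of the fitted object.

```r
# Keep the best of several k-means runs, judged by inertia.
# Uses stats::kmeans from base R as a stand-in, not the rgudhi wrapper.
set.seed(42)
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 5), ncol = 2)
)
n_init <- 10L
runs <- lapply(seq_len(n_init), function(i) {
  # Each call draws its own random initial centroids.
  stats::kmeans(x, centers = 2L, iter.max = 300L, algorithm = "Lloyd")
})
# tot.withinss is the total within-cluster sum of squares (the inertia).
inertias <- vapply(runs, function(fit) fit$tot.withinss, numeric(1))
best <- runs[[which.min(inertias)]]
```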
max_iter
An integer value specifying the maximum number of
iterations of the k-means algorithm for a single run. Defaults to
300L.
tol
A numeric value specifying the relative tolerance with regard
to the Frobenius norm of the difference in the cluster centers of two
consecutive iterations to declare convergence. Defaults to 1e-04.
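How max_iter and tol interact can be sketched with a small Lloyd-style loop in base R. The helper lloyd is hypothetical. For simplicity it compares the Frobenius norm of the centroid shift to tol directly, whereas the wrapped implementation treats tol as a relative tolerance and may scale it differently.

```r
# Lloyd iterations with a stopping rule on the shift of the centroids.
# `lloyd` is a hypothetical helper; the scaling of `tol` is an assumption.
lloyd <- function(x, centers, max_iter = 300L, tol = 1e-4) {
  k <- nrow(centers)
  for (iter in seq_len(max_iter)) {
    # Squared distances from each observation (row) to each centroid (column).
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    labels <- max.col(-d2, ties.method = "first")
    new_centers <- centers
    for (j in seq_len(k)) {
      members <- labels == j
      # Keep the old centroid if a cluster happens to be empty.
      if (any(members)) new_centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
    # Frobenius norm of the difference between consecutive centroid matrices.
    shift <- norm(new_centers - centers, type = "F")
    centers <- new_centers
    # Stop early once the centroids have (almost) stopped moving.
    if (shift <= tol) break
  }
  list(centers = centers, labels = labels, n_iter = iter, shift = shift)
}

set.seed(7)
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 5), ncol = 2)
)
fit <- lloyd(x, centers = x[c(1, 51), ])
fit$n_iter
```

The loop ends either when the shift falls to tol or below, or after max_iter iterations, whichever comes first.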
verbose
An integer value specifying the level of verbosity.
Defaults to 0L, which disables verbose output.
random_state
An integer value specifying the initial seed of the
random number generator. Defaults to NULL, which uses the current
timestamp.
copy_x
A boolean value specifying whether to work on a copy of the
original data rather than modify it in place. When pre-computing
distances, it is more numerically accurate to center the data first. If
copy_x is TRUE, then the original data is not modified. If copy_x is
FALSE, the original data is modified and put back before the function
returns, but small numerical differences may be introduced by
subtracting and then adding the data mean. Note that if the original
data is not C-contiguous, a copy will be made even if copy_x is FALSE.
If the original data is sparse, but not in CSR format, a copy will be
made even if copy_x is FALSE. Defaults to TRUE.
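The "small numerical differences" mentioned for copy_x = FALSE come from floating-point rounding when the mean is subtracted and then added back. A minimal base R sketch, with illustrative values chosen here (they are not taken from the package):

```r
# Centering and un-centering is not guaranteed to be exact in floating point.
x <- c(0.1, 0.2, 0.3, 1e8)
m <- mean(x)
restored <- (x - m) + m
# Largest absolute discrepancy after the round trip; it may be non-zero.
max(abs(restored - x))
# The values still agree up to numerical tolerance.
isTRUE(all.equal(restored, x))
```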
algorithm
A string specifying the k-means algorithm to use. The
classical EM-style algorithm is "lloyd". The "elkan" variation can
be more efficient on some datasets with well-defined clusters, by using
the triangle inequality. However, it is more memory-intensive due to the
allocation of an extra array of shape
$\mathrm{n_{samples}} \times \mathrm{n_{clusters}}$. Defaults to "lloyd".
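The triangle-inequality argument behind "elkan" can be sketched in base R: if the distance between two centroids is at least twice the distance from a point to its current centroid, the other centroid cannot be closer, so that distance need not be computed. The helpers euclid and can_skip are hypothetical names used for illustration only. The full algorithm also maintains per-point, per-centroid distance bounds, which is what the extra array is used for.

```r
# Triangle-inequality pruning, the idea behind Elkan-style k-means.
euclid <- function(a, b) sqrt(sum((a - b)^2))

# TRUE when the other centroid cannot be closer to the point than its
# current centroid, so the point-to-other distance can be skipped.
can_skip <- function(d_x_current, d_current_other) {
  d_current_other >= 2 * d_x_current
}

x  <- c(0.5, 0)  # an observation
c1 <- c(0, 0)    # its current centroid
c2 <- c(4, 0)    # a candidate centroid

d_x_c1  <- euclid(x, c1)
d_c1_c2 <- euclid(c1, c2)
can_skip(d_x_c1, d_c1_c2)

# Sanity check of the bound: c2 is indeed no closer to x than c1.
euclid(x, c2) >= d_x_c1
```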
An object of class KMeans.
clone()
The objects of this class are cloneable with this method.
KMeans$clone(deep = FALSE)
deep
Whether to make a deep clone.
library(rgudhi)
cl <- KMeans$new()