MbkmeansParam-class: Mini-batch k-means clustering

MbkmeansParam-class R Documentation

Mini-batch k-means clustering

Description

Run the mini-batch k-means algorithm, via the mbkmeans function, with the specified number of centers within clusterRows. This sacrifices some accuracy for speed compared to the standard k-means algorithm. Note that this requires installation of the mbkmeans package.

Usage

MbkmeansParam(
  centers,
  batch_size = NULL,
  max_iters = 100,
  num_init = 1,
  init_fraction = NULL,
  initializer = "kmeans++",
  calc_wcss = FALSE,
  early_stop_iter = 10,
  tol = 1e-04,
  BPPARAM = SerialParam()
)

## S4 method for signature 'ANY,MbkmeansParam'
clusterRows(x, BLUSPARAM, full = FALSE)

Arguments

centers

An integer scalar specifying the number of centers. Alternatively, a function that takes the number of observations and returns the number of centers.

batch_size, max_iters, num_init, init_fraction, initializer, calc_wcss, early_stop_iter, tol, BPPARAM

Further arguments to pass to mbkmeans.

x

A numeric matrix-like object where rows represent observations and columns represent variables.

BLUSPARAM

An MbkmeansParam object.

full

Logical scalar indicating whether the full mini-batch k-means statistics should be returned.

Details

This class usually requires the user to specify the number of clusters beforehand. However, we can also allow the number of clusters to vary as a function of the number of observations. The latter is occasionally useful, e.g., to allow the clustering to automatically become more granular for large datasets.
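As a minimal sketch of the function form of centers, the example below lets the number of centers grow with the number of observations. The square-root rule and the name sqrt_centers are illustrative assumptions, not defaults or recommendations from the package, and the bluster and mbkmeans packages are assumed to be installed.

```r
library(bluster)

# Illustrative rule (an assumption, not a package default): scale the
# number of centers with the square root of the number of observations.
sqrt_centers <- function(n) ceiling(sqrt(n))

mat <- as.matrix(iris[, 1:4])
clusters <- clusterRows(mat, MbkmeansParam(centers = sqrt_centers))

# One cluster assignment is returned per row of the input.
length(clusters) == nrow(mat)
```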

To inspect or modify an existing MbkmeansParam object x, users can call x[[i]] to get a parameter or x[[i]] <- value to set it, where i is the name of any argument used in the constructor.
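The getter and setter can be sketched as follows. The variable name param and the replacement values are arbitrary choices for illustration; an integer literal is used for centers on the assumption that the count is stored as an integer.

```r
library(bluster)

param <- MbkmeansParam(centers = 3)

# Inspect a parameter by the name of its constructor argument.
param[["centers"]]

# Replace parameters on the existing object.
param[["centers"]] <- 5L   # integer literal: assumes an integer-valued count
param[["tol"]] <- 1e-3

param[["centers"]]
```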

For batch_size and init_fraction, a value of NULL means that the defaults from the mbkmeans function signature are used. These defaults are data-dependent, so they cannot be fixed when the MbkmeansParam object is constructed; instead, they are resolved within the clusterRows method.

Value

The MbkmeansParam constructor returns an MbkmeansParam object with the specified parameters.

The clusterRows method will return a factor of length equal to nrow(x) containing the cluster assignments. If full=TRUE, a list is returned with clusters (the factor, as above) and objects (a list containing mbkmeans, the direct output of mbkmeans).
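A short sketch of the full=TRUE return value, using only the clusters and objects elements described above. The contents of the mbkmeans element are whatever mbkmeans itself returns, so the example just prints its top-level structure rather than assuming particular field names.

```r
library(bluster)

mat <- as.matrix(iris[, 1:4])
out <- clusterRows(mat, MbkmeansParam(centers = 3), full = TRUE)

# The cluster assignments, as returned when full=FALSE.
table(out$clusters)

# The direct output of mbkmeans, shown one level deep.
str(out$objects$mbkmeans, max.level = 1)
```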

Author(s)

Stephanie Hicks

See Also

mbkmeans from the mbkmeans package, which actually does all the heavy lifting.

KmeansParam, for dispatch to the standard k-means algorithm.

Examples

clusterRows(iris[,1:4], MbkmeansParam(centers=3))
clusterRows(iris[,1:4], MbkmeansParam(centers=3, batch_size=10))
clusterRows(iris[,1:4], MbkmeansParam(centers=3, init_fraction=0.5))

LTLA/bluster documentation built on Aug. 20, 2023, 5:39 a.m.