batchkmeans: Generic K-means Clustering
In yasomi: Yet Another Self Organising Map Implementation


Generic function to perform K-means clustering on some data.

batchkmeans(data, ncenters, init = c("prototypes", "random", "cluster"), prototypes, weights, max.iter, verbose = FALSE, keepdata = TRUE, ...)

`data`	the data to cluster. The acceptable data types depend on the available methods; see details
`ncenters`	the number of clusters
`init`	the initialisation method (see details)
`prototypes`	Initial values for the prototypes (the exact representation of the prototypes depends on the data type). If missing, initial prototypes are chosen via the method specified by the `init` parameter (see details)
`weights`	optional weights for the data points
`max.iter`	maximal number of iterations of the algorithm
`verbose`	switch for tracing the clustering process
`keepdata`	if `TRUE`, the original data are returned as part of the result object
`...`	additional arguments to be passed to methods

In yasomi, the batchkmeans generic function is implemented by two methods, which provide K-means for two distinct data representations:

the default method batchkmeans.default is used when data is a matrix or a data frame: it provides a standard (batch) K-means implementation;
when the dataset is given as a kernel matrix (data is an object of class "kernelmatrix", see as.kernelmatrix), the method batchkmeans.kernelmatrix implements the (batch) kernel K-means algorithm. In this case, data is assumed to contain all pairwise evaluations of a positive semi-definite kernel function, and the batch K-means clustering is performed (implicitly) in the kernel-induced feature space.
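The kernel trick behind batchkmeans.kernelmatrix can be illustrated with a small base R sketch. The function kernel_kmeans_sketch below is hypothetical and is not the yasomi code; it only relies on the identity that the squared feature-space distance between phi(x_i) and the centre of mass of cluster C_k is K_ii - (2/|C_k|) sum_{j in C_k} K_ij + (1/|C_k|^2) sum_{j,l in C_k} K_jl, so prototypes never have to be represented explicitly. For simplicity it starts from a given assignment rather than from one of the init methods.

```r
# Hypothetical sketch of batch kernel K-means (not the yasomi implementation).
# K: n x n positive semi-definite kernel matrix.
# classif: initial assignment of the n points to clusters 1..ncenters.
kernel_kmeans_sketch <- function(K, classif, ncenters, max.iter = 100) {
  n <- nrow(K)
  for (iter in seq_len(max.iter)) {
    # Squared feature-space distance from every point to every cluster centre
    d <- matrix(Inf, n, ncenters)
    for (k in seq_len(ncenters)) {
      members <- which(classif == k)
      if (length(members) == 0) next  # empty cluster: distance stays Inf
      m <- length(members)
      d[, k] <- diag(K) -
        2 * rowSums(K[, members, drop = FALSE]) / m +
        sum(K[members, members]) / m^2
    }
    new_classif <- max.col(-d, ties.method = "first")  # closest centre
    if (all(new_classif == classif)) break  # assignments stable: converged
    classif <- new_classif
  }
  list(classif = classif, distances = d)
}

X <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
K <- tcrossprod(X)  # linear kernel X %*% t(X)
res <- kernel_kmeans_sketch(K, classif = c(1, 2, 1, 2, 1, 2), ncenters = 2)
res$classif  # 1 1 1 2 2 2
```

With the linear kernel tcrossprod(X), the feature-space distances coincide with ordinary squared Euclidean distances, so the example recovers the partition that a standard batch K-means started from the same assignment would find.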

If the initial value of prototypes is not provided, it is obtained by one of the following methods, selected by the init parameter:

"prototypes": the standard method randomly chooses a subset of the data of the requested size (with repetition if ncenters is larger than the number of data points). If the weights parameter is given, the probability of choosing a data point is proportional to its weight.
"random": for standard Euclidean data, the "random" method generates prototypes randomly and uniformly in the hypercube spanned by the data. For dissimilarity data or kernel data, it generates prototypes via random convex combinations of the data points. In all cases, the optional weights are not taken into account by this method.
"cluster": the clustering initialisation method builds a random partition of the data into balanced clusters and uses the centres of mass of those clusters as initial prototypes. The optional weights are not taken into account when balancing the clusters.
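For a plain numeric matrix, the three strategies can be sketched in base R as follows. init_prototypes_sketch is a hypothetical helper, not the yasomi code; it covers only the Euclidean case (the random convex combinations used for dissimilarity and kernel data are not shown) and assumes ncenters does not exceed the number of rows when init = "cluster".

```r
# Hypothetical sketch of the three initialisation strategies for a numeric
# data matrix X (rows = observations). Not the yasomi implementation.
init_prototypes_sketch <- function(X, ncenters,
                                   init = c("prototypes", "random", "cluster"),
                                   weights = NULL) {
  init <- match.arg(init)
  X <- as.matrix(X)
  n <- nrow(X)
  switch(init,
    # Random subset of the data, with repetition only if ncenters > n;
    # sampling probabilities proportional to weights when given
    prototypes = X[sample.int(n, ncenters, replace = ncenters > n,
                              prob = weights), , drop = FALSE],
    # Uniform draws in the bounding hypercube of the data (weights ignored)
    random = {
      lo <- apply(X, 2, min)
      hi <- apply(X, 2, max)
      matrix(runif(ncenters * ncol(X),
                   min = rep(lo, each = ncenters),
                   max = rep(hi, each = ncenters)),
             nrow = ncenters)
    },
    # Balanced random partition; prototypes = centres of mass (weights ignored)
    cluster = {
      part <- sample(rep_len(seq_len(ncenters), n))
      do.call(rbind, lapply(seq_len(ncenters), function(k)
        colMeans(X[part == k, , drop = FALSE])))
    }
  )
}

set.seed(42)
P <- init_prototypes_sketch(iris[, 1:4], ncenters = 3, init = "cluster")
dim(P)  # 3 4
```

Filling the "random" matrix with rep(lo, each = ncenters) matches R's column-major layout, so every column j is drawn within the range of the j-th variable.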

An object of class "batchkmeans", a list with components including

`prototypes`	a representation of the prototypes that depends on the actual method
`classif`	an integer vector indicating to which cluster each observation has been assigned
`errors`	a vector containing the evolution of the quantisation error during the fitting process
`data`	the original data if the function is called with `keepdata = TRUE`
`weights`	the weights of the data points, if the function is called with `keepdata = TRUE` and `weights` is given

The list will generally contain additional components specific to each implementation. The returned object will also generally have another class more specific than "batchkmeans".
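To make these components concrete, here is a minimal base R sketch of weighted batch (Lloyd) K-means for a numeric matrix. batchkmeans_sketch and its "batchkmeans_sketch" class are hypothetical, and the error recorded at each iteration is assumed to be the weighted mean squared distance to the assigned prototype, which may differ from the exact quantisation error that yasomi reports.

```r
# Hypothetical sketch of weighted batch K-means (not the yasomi code).
# ASSUMPTION: the recorded error is the weighted mean squared distance of
# each point to its assigned prototype.
batchkmeans_sketch <- function(X, prototypes, weights = NULL,
                               max.iter = 100, keepdata = TRUE) {
  X <- as.matrix(X)
  prototypes <- as.matrix(prototypes)
  w <- if (is.null(weights)) rep(1, nrow(X)) else weights
  classif <- integer(0)
  errors <- numeric(0)
  for (iter in seq_len(max.iter)) {
    # Assignment step: squared Euclidean distances (n x ncenters)
    d <- outer(rowSums(X^2), rowSums(prototypes^2), "+") -
      2 * X %*% t(prototypes)
    new_classif <- max.col(-d, ties.method = "first")
    errors <- c(errors,
                sum(w * d[cbind(seq_len(nrow(X)), new_classif)]) / sum(w))
    if (identical(new_classif, classif)) break  # no change: converged
    classif <- new_classif
    # Update step: weighted centre of mass of each non-empty cluster
    # (prototypes of empty clusters are left unchanged)
    for (k in unique(classif)) {
      idx <- classif == k
      prototypes[k, ] <- colSums(w[idx] * X[idx, , drop = FALSE]) / sum(w[idx])
    }
  }
  res <- list(prototypes = prototypes, classif = classif, errors = errors)
  if (keepdata) {
    res$data <- X
    if (!is.null(weights)) res$weights <- weights  # only when weights given
  }
  class(res) <- c("batchkmeans_sketch", "batchkmeans")
  res
}

X <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
fit <- batchkmeans_sketch(X, prototypes = X[1:2, ])
fit$classif  # 1 1 1 2 2 2
```

Because each assignment and update step can only lower the weighted squared error, the errors vector of this sketch is non-increasing, and the more specific class is listed before "batchkmeans" so that S3 dispatch tries it first.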

Fabrice Rossi

See batchsom for Self-Organising Maps, which provide both clustering and visualisation.

yasomi documentation built on May 2, 2019, 5:59 p.m.