BatchKMeans: BatchKMeans
In Displayr/flipCluster: Cluster analysis

View source: R/kmeans.R

Description

Uses the batch k-means method to form clusters.

Usage

BatchKMeans(x, centers, weights, iter.max, n.starts, seed = 1223)

Arguments

`x`	A `data.frame`.
`centers`	Either the number of clusters (e.g., 2), or a set of initial cluster centers.
`weights`	An optional vector of sampling weights, or the name of a variable in `data`. It may not be an expression.
`iter.max`	The maximum number of iterations of the algorithm.
`n.starts`	The number of times to run the whole algorithm.
`seed`	The seed for the random number generator.

Details

The batch method works by selecting initial cluster centers, allocating each observation to the closest cluster, recomputing the cluster centers, and repeating these steps until either the residual sum of squares stops reducing or iter.max is exceeded.
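The loop described above can be sketched in base R as follows. This is a minimal, unweighted illustration that assumes complete numeric data and squared Euclidean distance; the function name `batch_kmeans_sketch` and its return structure are hypothetical and are not taken from the flipCluster source.

```r
# Minimal batch k-means sketch (illustrative; not the flipCluster implementation).
batch_kmeans_sketch <- function(x, centers, iter.max = 10) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  k <- nrow(centers)
  previous.rss <- Inf
  rss <- NA_real_
  cluster <- integer(nrow(x))
  for (iteration in seq_len(iter.max)) {
    # Squared Euclidean distance from every observation to every center.
    d <- matrix(sapply(seq_len(k), function(j)
      rowSums(sweep(x, 2, centers[j, ])^2)), nrow = nrow(x))
    # Allocate each observation to the closest center.
    cluster <- max.col(-d, ties.method = "first")
    rss <- sum(d[cbind(seq_len(nrow(x)), cluster)])
    # Stop once the residual sum of squares no longer reduces.
    if (rss >= previous.rss) break
    previous.rss <- rss
    # Recompute each center as the mean of its members; empty clusters keep their center.
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers, rss = rss)
}
```

Passing two rows of the data as `centers` is enough to try the sketch on a small matrix.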

Relative to traditional k-means algorithms such as Hartigan and Wong (1979), this algorithm has two novel features: (1) It handles weights. (2) It classifies cases that have incomplete data.

The algorithm begins by assigning cases to clusters as follows: (1) Cases with missing values are removed. (2) If the data is weighted, a new 'bootstrapped' sample is created via resampling. (3) The Hartigan-Wong algorithm is applied to the bootstrapped sample. (4) Each case in the data set (including those with partially missing data) is assigned to the closest cluster center.
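The four initialization steps can be sketched roughly as below. Several details are assumptions rather than facts from the source: the bootstrap is drawn with `sample.int` using the weights as selection probabilities, and the distance for a case with partially missing data is computed over its observed variables only. The name `initial_assignment_sketch` is hypothetical.

```r
# Illustrative sketch of the initial assignment (not the flipCluster implementation).
initial_assignment_sketch <- function(x, k, weights = NULL, seed = 1223) {
  set.seed(seed)
  x <- as.matrix(x)
  # (1) Remove cases with missing values.
  complete <- stats::complete.cases(x)
  x.complete <- x[complete, , drop = FALSE]
  # (2) If weighted, resample the complete cases in proportion to their weights
  #     (assumed form of the 'bootstrapped' sample).
  if (!is.null(weights)) {
    rows <- sample.int(nrow(x.complete), replace = TRUE, prob = weights[complete])
    x.complete <- x.complete[rows, , drop = FALSE]
  }
  # (3) Apply Hartigan-Wong k-means to the (possibly resampled) complete cases.
  fit <- stats::kmeans(x.complete, centers = k, algorithm = "Hartigan-Wong")
  # (4) Assign every case to the closest center. Assumption: missing variables
  #     are skipped, so distance uses only the variables observed for that case.
  d <- matrix(sapply(seq_len(k), function(j)
    rowSums(sweep(x, 2, fit$centers[j, ])^2, na.rm = TRUE)), nrow = nrow(x))
  list(cluster = max.col(-d, ties.method = "first"), centers = fit$centers)
}
```

Under this assumption a case with every variable missing is at distance zero from all centers and falls into the first cluster, so real code would need to treat such cases separately.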

The algorithm then repeatedly: (1) Recomputes each cluster center as the weighted mean of the data for the cases assigned to that cluster. (2) Assigns each case to the closest cluster center. This proceeds until cluster membership stabilizes or the maximum number of iterations (iter.max) is reached.
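The center update in step (1) might look like the following sketch. It assumes that a missing value is simply left out of the weighted mean for that variable, which the source does not state; `weighted_centers_sketch` is a hypothetical helper name.

```r
# Illustrative weighted center update (not the flipCluster implementation).
weighted_centers_sketch <- function(x, cluster, weights, k) {
  x <- as.matrix(x)
  centers <- matrix(NA_real_, nrow = k, ncol = ncol(x),
                    dimnames = list(NULL, colnames(x)))
  for (j in seq_len(k)) {
    members <- cluster == j
    for (v in seq_len(ncol(x))) {
      # Assumption: only cases with an observed value contribute to this variable.
      observed <- members & !is.na(x[, v])
      if (any(observed))
        centers[j, v] <- stats::weighted.mean(x[observed, v], weights[observed])
    }
  }
  centers
}
```

A cluster with no observed values for a variable keeps `NA` for that coordinate in this sketch.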

Where n.starts is greater than 1, or fewer than 100 cases remain after removing cases with incomplete data, the remaining start points are selected by: (1) identifying unique cases, and (2) sampling without replacement from among the unique cases.
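Selecting start points from the unique cases can be sketched as follows. The helper name `start_points_sketch` and the error raised when there are too few unique cases are assumptions made for illustration.

```r
# Illustrative start-point selection (not the flipCluster implementation).
start_points_sketch <- function(x, k, seed = 1223) {
  set.seed(seed)
  x <- as.matrix(x)
  # (1) Identify the unique cases among those with complete data.
  unique.cases <- unique(x[stats::complete.cases(x), , drop = FALSE])
  if (nrow(unique.cases) < k)
    stop("Fewer unique complete cases than clusters.")
  # (2) Sample k of them without replacement to use as initial centers.
  unique.cases[sample.int(nrow(unique.cases), k, replace = FALSE), , drop = FALSE]
}
```

Sampling from unique cases guarantees that no two initial centers coincide, which would otherwise leave a cluster empty after the first allocation.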

Displayr/flipCluster documentation built on June 2, 2025, 11:49 a.m.