KMeans: 'KMeans'
In NumbersInternational/flipCluster: Cluster analysis

View source: R/kmeans.R

KMeans

R Documentation

`KMeans`

Description

KMeans Cluster Analysis.

Usage

KMeans(
  data = NULL,
  centers = 2,
  centers.names = NULL,
  subset = NULL,
  weights = NULL,
  missing = "Use partial data",
  iter.max = 100,
  n.starts = 10,
  algorithm = "Batch",
  output = "Means",
  profile.var = NULL,
  seed = 1223,
  binary = FALSE,
  show.labels = FALSE,
  max.nchar.subtitle = 200,
  verbose = FALSE,
  ...
)
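
A minimal sketch of a call, assuming the flipCluster package has been installed and using the built-in `iris` data purely for illustration. The argument names come from the usage above; the object names `dat` and `fit` are hypothetical, and the call is skipped when the package is not available.

```r
# Illustrative data: the four numeric columns of the built-in iris data set.
dat <- iris[, 1:4]

fit <- NULL
# Only run the call if flipCluster is available (assumption: installed separately).
if (requireNamespace("flipCluster", quietly = TRUE)) {
  fit <- flipCluster::KMeans(
    data = dat,
    centers = 3,         # number of clusters to find
    iter.max = 100,
    n.starts = 10,
    algorithm = "Batch",
    seed = 1223
  )
}
```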

Arguments

`data`	A `data.frame`.
`centers`	Either the number of clusters (e.g., 2), or a set of initial cluster centers. Where the number of clusters is specified, or the algorithm is 'Bagging', a random selection of rows of data is chosen as the initial start points.
`centers.names`	An optional comma-separated list that will be used to name the predicted clusters.
`subset`	An optional vector specifying a subset of observations to be used in the fitting process, or the name of a variable in `data`. It may not be an expression. `subset` may not
`weights`	An optional vector of sampling weights, or the name of a variable in `data`. It may not be an expression.
`missing`	How missing data is to be treated in the analysis. Options: `"Error if missing data"`, `"Exclude cases with missing data"`, `"Use partial data"` (the default), and `"Imputation (replace missing values with estimates)"`.
`iter.max`	The number of iterations of the algorithm to run.
`n.starts`	The number of times the algorithm should be run, each time with a different set of start points.
`algorithm`	One of `"Hartigan-Wong"`, `"Forgy"`, `"Lloyd"`, `"MacQueen"`, `"Batch"`, or `"Bagging"`.
`output`	The default is `"Means"`. A table that is better for exporting is `"Means table"`.
`profile.var`	An optional list of variables which will be compared against the KMeans predicted cluster.
`seed`	The random number seed used in imputation.
`binary`	Makes categorical variables into indicator variables (otherwise their values are used).
`show.labels`	Shows the variable labels, as opposed to the variable names, in the outputs, where a variable's label is an attribute (e.g., `attr(foo, "label")`).
`max.nchar.subtitle`	Maximum number of characters in the subtitle. This is used to determine the number of significant profiling variables to show.
`verbose`	Whether or not to show the verbose outputs from `bclust`. Defaults to `FALSE`.
`...`	Additional arguments to `bclust` and `SegmentComparisonTable`.
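
To illustrate what `centers`, `iter.max`, `n.starts`, and `algorithm` control, the sketch below uses base R's `stats::kmeans`, whose `centers`, `iter.max`, `nstart`, and `algorithm` arguments play comparable roles. It is an analogy only: whether `KMeans` passes its arguments to `kmeans` in this way is an assumption, and the `"Batch"` and `"Bagging"` options are not available in `stats::kmeans`.

```r
# Base R only: this is stats::kmeans, not flipCluster::KMeans.
set.seed(1223)
x <- scale(iris[, 1:4])

# A single number for `centers`: random rows are used as start points.
# nstart = 10 repeats this from 10 random starts and keeps the solution
# with the lowest total within-cluster sum of squares.
fit <- kmeans(x, centers = 3, iter.max = 100, nstart = 10,
              algorithm = "Hartigan-Wong")

# A matrix for `centers`: the rows are used as the initial cluster centers.
init <- x[c(1, 51, 101), ]
fit.init <- kmeans(x, centers = init, iter.max = 100, algorithm = "Lloyd")

# Cluster sizes from the first fit.
table(fit$cluster)
```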

Details

"Bagging" uses bagging in an attempt to find replicable custers. By default, 10 bootstrap samples are created (using weights if provided), and k-mean cluster analysis is used to find 20 clusters in each of these samples, and the complete-link hiearchical clustering algorithm is then used to form the final clusters (Leisch 1999). See bclust to see the names and descriptions of additional parameters. After running bclust, cases are assigned to the most similar cluster.

References

Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768-769.

Hartigan, J. A. and Wong, M. A. (1979) A K-means clustering algorithm. Applied Statistics 28, 100-108.

Leisch, Friedrich (1999) Bagged clustering. Working Paper 51, SFB "Adaptive Information Systems and Modeling in Economics and Management Science", August 1999. http://epub.wu.ac.at/1272/1/document.pdf

Lloyd, S. P. (1957, 1982) Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory 28, 128-137.

MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281-297. Berkeley, CA: University of California Press.

NumbersInternational/flipCluster documentation built on June 9, 2025, 8:42 a.m.