KMeans: 'KMeans'

View source: R/kmeans.R

KMeans R Documentation

KMeans

Description

KMeans Cluster Analysis.

Usage

KMeans(
  data = NULL,
  centers = 2,
  centers.names = NULL,
  subset = NULL,
  weights = NULL,
  missing = "Use partial data",
  iter.max = 100,
  n.starts = 10,
  algorithm = "Batch",
  output = "Means",
  profile.var = NULL,
  seed = 1223,
  binary = FALSE,
  show.labels = FALSE,
  max.nchar.subtitle = 200,
  verbose = FALSE,
  ...
)

Arguments

data

A data.frame.

centers

Either the number of clusters (e.g., 2), or a set of initial cluster centers. Where the number of clusters is specified, or the algorithm is 'Bagging', a random selection of rows of data is chosen as the initial start points.

centers.names

An optional comma-separated list that will be used to name the predicted clusters.

subset

An optional vector specifying a subset of observations to be used in the fitting process, or the name of a variable in data. It may not be an expression. subset may not

weights

An optional vector of sampling weights, or the name of a variable in data. It may not be an expression.

missing

How missing data is to be treated in the cluster analysis. Options: "Error if missing data", "Exclude cases with missing data", "Use partial data" (the default), and "Imputation (replace missing values with estimates)".

iter.max

The maximum number of iterations of the algorithm to run.

n.starts

The number of times the algorithm should be run, each time with a different set of start points.

algorithm

One of "Hartigan-Wong", "Forgy", "Lloyd", "MacQueen", "Batch", or "Bagging".

output

The default is "Means". A table that is better for exporting is "Means table".

profile.var

An optional list of variables which will be compared against the KMeans predicted cluster.

seed

The random number seed used in imputation.

binary

Makes categorical variables into indicator variables (otherwise their values are used).

show.labels

Shows the variable labels, as opposed to the variable names, in the outputs, where a variable's label is an attribute (e.g., attr(foo, "label")).

max.nchar.subtitle

Maximum number of characters in the subtitle. This is used to determine the number of significant profiling variables to show.

verbose

Whether or not to show the verbose outputs of bclust. Defaults to FALSE.

...

Additional arguments to bclust and SegmentComparisonTable.
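As a rough illustration of what iter.max and n.starts control, the sketch below uses base R's stats::kmeans rather than KMeans itself. It runs the algorithm several times from different random start points and keeps the run with the smallest total within-cluster sum of squares. This is not flipCluster's implementation; the iris data, the choice of three clusters, and the variable names are assumptions made for the example.

```r
# Illustrative only: base R stats::kmeans, not flipCluster's internals.
x <- as.matrix(iris[, 1:4])
n.starts <- 10   # mirrors the n.starts argument described above
iter.max <- 100  # upper limit on iterations within each run

set.seed(1223)
fits <- lapply(seq_len(n.starts), function(i) {
  # Each call draws a fresh random set of rows as the start points.
  kmeans(x, centers = 3, iter.max = iter.max, algorithm = "Hartigan-Wong")
})

# Keep the run with the smallest total within-cluster sum of squares.
wss <- vapply(fits, function(f) f$tot.withinss, numeric(1))
best <- fits[[which.min(wss)]]
table(best$cluster)
```

Running from several start points reduces the chance of reporting a poor local optimum, at the cost of proportionally more computation.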

Details

"Bagging" uses bagging in an attempt to find replicable clusters. By default, 10 bootstrap samples are created (using weights if provided), k-means cluster analysis is used to find 20 clusters in each of these samples, and the complete-link hierarchical clustering algorithm is then used to form the final clusters (Leisch 1999). See bclust for the names and descriptions of additional parameters. After running bclust, cases are assigned to the most similar cluster.
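The bagging procedure can be sketched in base R as follows. This is a simplified illustration of the general idea in Leisch (1999), not the code used by KMeans or bclust: it ignores weights, the final number of clusters k is a hypothetical choice, and cases are assigned to the final cluster of their nearest pooled center, which is one reasonable reading of "most similar cluster".

```r
# A base-R sketch of bagged clustering; not flipCluster's or bclust's code.
set.seed(1223)
x <- as.matrix(iris[, 1:4])
n.boot <- 10        # number of bootstrap samples
base.centers <- 20  # clusters found in each bootstrap sample
k <- 3              # final number of clusters (hypothetical choice)

# 1. Run k-means on each bootstrap sample and pool the resulting centers.
pooled <- do.call(rbind, lapply(seq_len(n.boot), function(b) {
  rows <- sample(nrow(x), replace = TRUE)
  kmeans(x[rows, ], centers = base.centers, iter.max = 100)$centers
}))

# 2. Complete-link hierarchical clustering of the pooled centers,
#    cut into k groups.
tree <- hclust(dist(pooled), method = "complete")
center.group <- cutree(tree, k = k)

# 3. Assign each case to the group of its nearest pooled center.
nearest <- apply(x, 1, function(row) {
  which.min(colSums((t(pooled) - row)^2))
})
cluster <- unname(center.group[nearest])
table(cluster)
```

Pooling many small clusters before merging them hierarchically is what makes the final solution less dependent on any single set of start points.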

References

Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768-769.

Hartigan, J. A. and Wong, M. A. (1979) A K-means clustering algorithm. Applied Statistics 28, 100-108.

Leisch, Friedrich (1999) Bagged clustering. Working Paper 51, SFB "Adaptive Information Systems and Modeling in Economics and Management Science", August 1999. http://epub.wu.ac.at/1272/1/document.pdf

Lloyd, S. P. (1957, 1982) Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory 28, 128-137.

MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281-297. Berkeley, CA: University of California Press.


NumbersInternational/flipCluster documentation built on Feb. 26, 2024, 5:34 a.m.