KMeans | R Documentation |
KMeans
KMeans Cluster Analysis.
KMeans(
data = NULL,
centers = 2,
centers.names = NULL,
subset = NULL,
weights = NULL,
missing = "Use partial data",
iter.max = 100,
n.starts = 10,
algorithm = "Batch",
output = "Means",
profile.var = NULL,
seed = 1223,
binary = FALSE,
show.labels = FALSE,
max.nchar.subtitle = 200,
verbose = FALSE,
...
)
data |
A |
centers |
Either the number of clusters (e.g., 2), or a set of initial cluster centers. Where the number of clusters is specified, or the algorithm is 'Bagging', a random selection of rows of data is chosen as the initial start points. |
centers.names |
An optional comma-separated list that will be used to name the predicted clusters. |
subset |
An optional vector specifying a subset of observations to be
used in the fitting process, or the name of a variable in |
weights |
An optional vector of sampling weights, or the
name of a variable in |
missing |
How missing data is to be treated in the cluster analysis. Options:
|
iter.max |
The maximum number of iterations of the algorithm to run. |
n.starts |
The number of times the algorithm should be run, each time with a different set of start points. |
algorithm |
One of |
output |
The default is |
profile.var |
An optional list of variables which will be compared against the KMeans predicted cluster. |
seed |
The random number seed used in imputation. |
binary |
Makes categorical variables into indicator variables (otherwise their values are used). |
show.labels |
Shows the variable labels, as opposed to the variable names, in the outputs, where a variable's label is an attribute (e.g., attr(foo, "label")). |
max.nchar.subtitle |
Maximum number of characters in the subtitle. This is used to determine the number of significant profiling variables to show. |
verbose |
Whether or not to show the verbose outputs to |
... |
Additional arguments to |
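To illustrate how centers, n.starts, iter.max and seed interact, here is a minimal sketch using base R's stats::kmeans rather than KMeans itself. It assumes, without confirmation from this page, that these arguments play roles similar to centers, nstart and iter.max in stats::kmeans. The iris data set is used purely for illustration.

```r
# Sketch with base R's stats::kmeans, NOT the KMeans function documented here.
x <- scale(iris[, 1:4])          # numeric variables, standardized

set.seed(1223)                   # fixing the seed makes the random starts reproducible

# centers given as a number: rows of x are drawn at random as start points,
# and the best solution from 10 random starts is kept.
fit.k <- kmeans(x, centers = 3, nstart = 10, iter.max = 100)

# centers given as a matrix: its rows are used as the initial cluster centers.
init <- x[c(1, 51, 101), ]
fit.init <- kmeans(x, centers = init, iter.max = 100)

table(fit.k$cluster)             # cluster sizes
fit.k$tot.withinss               # total within-cluster sum of squares
```

Supplying explicit initial centers removes the random start, so repeated runs give the same solution without relying on the seed.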
"Bagging" uses bagging in an attempt to find replicable clusters.
By default, 10 bootstrap samples are created (using weights if provided), k-means
cluster analysis is used to find 20 clusters in each of these samples, and the complete-link
hierarchical clustering algorithm is then used to form the final clusters (Leisch 1999).
See bclust for the names and descriptions of additional parameters.
After running bclust, cases are assigned to the most similar cluster.
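The bagging procedure described above can be sketched in base R as follows. This is an illustration of the idea only, not the implementation used by bclust or KMeans. The names n.boot, base.k and final.k are hypothetical, the weights are omitted, and "most similar cluster" is interpreted here as the final cluster containing the nearest pooled center, which is an assumption.

```r
# Illustrative sketch of bagged clustering; not the bclust implementation.
set.seed(1223)
x <- scale(iris[, 1:4])
n <- nrow(x)
n.boot <- 10        # number of bootstrap samples
base.k <- 20        # clusters found in each bootstrap sample
final.k <- 3        # number of final clusters (hypothetical choice)

# 1. Run k-means on each bootstrap sample and pool the resulting centers.
pooled <- do.call(rbind, lapply(seq_len(n.boot), function(i) {
    rows <- sample(n, n, replace = TRUE)
    kmeans(x[rows, ], centers = base.k, iter.max = 100)$centers
}))

# 2. Group the pooled centers with complete-link hierarchical clustering.
tree <- hclust(dist(pooled), method = "complete")
center.group <- unname(cutree(tree, k = final.k))

# 3. Assign each case to the final cluster of its nearest pooled center
#    (squared Euclidean distance between every case and every center).
sq.dist <- outer(rowSums(x^2), rowSums(pooled^2), "+") - 2 * x %*% t(pooled)
cluster <- center.group[apply(sq.dist, 1, which.min)]
table(cluster)
```

If sampling weights were available, one way to use them in step 1 would be to pass them as the prob argument of sample.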
Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768-769.
Hartigan, J. A. and Wong, M. A. (1979) A K-means clustering algorithm. Applied Statistics 28, 100-108.
Leisch, Friedrich (1999) Bagged clustering. Working Paper 51, SFB "Adaptive Information Systems and Modeling in Economics and Management Science", August 1999. http://epub.wu.ac.at/1272/1/document.pdf
Lloyd, S. P. (1957, 1982) Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory 28, 128-137.
MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281-297. Berkeley, CA: University of California Press.