| KMeans | R Documentation |
KMeans
KMeans Cluster Analysis.
KMeans(
  data = NULL,
  centers = 2,
  centers.names = NULL,
  subset = NULL,
  weights = NULL,
  missing = "Use partial data",
  iter.max = 100,
  n.starts = 10,
  algorithm = "Batch",
  output = "Means",
  profile.var = NULL,
  seed = 1223,
  binary = FALSE,
  show.labels = FALSE,
  max.nchar.subtitle = 200,
  verbose = FALSE,
  ...
)
data |
A |
centers |
Either the number of clusters (e.g., 2), or a set of initial cluster centers. Where the number of clusters is specified, or the algorithm is 'Bagging', a random selection of rows of data is chosen as the initial start points. |
centers.names |
An optional comma-separated list that will be used to name the predicted clusters. |
subset |
An optional vector specifying a subset of observations to be used in the fitting process, or the name of a variable in |
weights |
An optional vector of sampling weights, or the name of a variable in |
missing |
How missing data is to be treated in the cluster analysis. Options:
|
iter.max |
The maximum number of iterations of the algorithm to run. |
n.starts |
The number of times the algorithm should be run, each time with a different set of start points. |
algorithm |
One of |
output |
The default is |
profile.var |
An optional list of variables which will be compared against the KMeans predicted cluster. |
seed |
The random number seed used in imputation. |
binary |
Makes categorical variables into indicator variables (otherwise their values are used). |
show.labels |
Shows the variable labels, as opposed to the variable names, in the outputs, where a variable's label is an attribute (e.g., attr(foo, "label")). |
max.nchar.subtitle |
Maximum number of characters in the subtitle. This is used to determine the number of significant profiling variables to show. |
verbose |
Whether or not to show the verbose outputs to |
... |
Additional arguments to |
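The centers, iter.max, and n.starts arguments above describe a multi-start strategy: the algorithm is run several times from different random start points and the best solution is kept. The following is a minimal sketch of that idea using base R's stats::kmeans rather than KMeans itself. The toy data and the mapping of n.starts to the nstart argument of kmeans are assumptions made for illustration, not a description of how KMeans is implemented.

```r
# Sketch only: uses stats::kmeans from base R, not KMeans() itself.
set.seed(1223)  # same value as the documented default seed, used here for reproducibility

# Hypothetical toy data: two well-separated groups of 50 cases in two dimensions.
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2)
)

# nstart = 10 tries ten random sets of start points and keeps the best run;
# iter.max caps the number of iterations within each run.
fit <- kmeans(x, centers = 2, iter.max = 100, nstart = 10)

table(fit$cluster)  # cluster sizes
fit$tot.withinss    # total within-cluster sum of squares of the best run
```

Increasing the number of starts reduces the chance of reporting a poor local optimum, at the cost of proportionally more computation.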
"Bagging" uses bagging in an attempt to find replicable clusters.
By default, 10 bootstrap samples are created (using weights if provided), and k-means
cluster analysis is used to find 20 clusters in each of these samples. The complete-link
hierarchical clustering algorithm is then used to form the final clusters (Leisch 1999).
See bclust for the names and descriptions of additional parameters.
After running bclust, cases are assigned to the most similar cluster.
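The bagging procedure described above can be sketched in base R as follows. This is an illustration of the general idea only, not the bclust implementation: the toy data, the choice of three final clusters, and the nearest-center assignment rule are assumptions, and sampling weights are ignored.

```r
# Sketch of the bagged-clustering idea in base R; not the bclust implementation.
set.seed(1223)

# Hypothetical toy data: three well-separated groups of 100 cases each.
x <- rbind(
  matrix(rnorm(200, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(200, mean = 6, sd = 0.3), ncol = 2),
  cbind(rnorm(100, mean = 0, sd = 0.3), rnorm(100, mean = 6, sd = 0.3))
)
n <- nrow(x)
n.boot <- 10   # bootstrap samples (the default described above)
base.k <- 20   # k-means clusters found in each bootstrap sample
final.k <- 3   # hypothetical number of final clusters

# 1. Run k-means on each bootstrap sample and pool the resulting centers.
pooled <- do.call(rbind, lapply(seq_len(n.boot), function(b) {
  idx <- sample(n, n, replace = TRUE)
  kmeans(x[idx, ], centers = base.k, iter.max = 100)$centers
}))

# 2. Complete-link hierarchical clustering of the pooled centers.
tree <- hclust(dist(pooled), method = "complete")
center.cluster <- cutree(tree, k = final.k)

# 3. Assign each case to the final cluster of its nearest pooled center.
nearest <- apply(x, 1, function(row) {
  which.min(colSums((t(pooled) - row)^2))
})
cluster <- unname(center.cluster[nearest])

table(cluster)  # final cluster sizes
```

Pooling centers across bootstrap samples before the hierarchical step is what makes the final clusters less sensitive to any single sample or set of start points.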
Forgy, E. W. (1965). Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768-769.
Hartigan, J. A. and Wong, M. A. (1979). A K-means clustering algorithm. Applied Statistics 28, 100-108.
Leisch, Friedrich (1999). Bagged clustering. Working Paper 51, SFB "Adaptive Information Systems and Modeling in Economics and Management Science", August 1999. http://epub.wu.ac.at/1272/1/document.pdf
Lloyd, S. P. (1957, 1982). Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory 28, 128-137.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281-297. Berkeley, CA: University of California Press.