kmc: K-Means Classification
In schiffner/locClass: Collection of Local Classification Methods


Classification based on K-means clustering within the classes.

kmc(x, ...)

## S3 method for class 'formula'
kmc(formula, data, ..., subset, na.action)

## S3 method for class 'data.frame'
kmc(x, ...)

## S3 method for class 'matrix'
kmc(x, grouping, ..., subset, na.action = na.fail)

## Default S3 method:
kmc(x, grouping, K = 2, wf = c("biweight", "cauchy",
  "cosine", "epanechnikov", "exponential", "gaussian", "optcosine",
  "rectangular", "triangular"), bw, k, nn.only = TRUE, nstart = 1, ...)

update.kmc(object, wf = c("biweight", "cauchy", "cosine", "epanechnikov",
  "exponential", "gaussian", "optcosine", "rectangular", "triangular"), bw, k,
  nn.only, ...)

`x`	(Required if no `formula` is given as principal argument.) A `matrix` or `data.frame` or `Matrix` containing the explanatory variables.
`formula`	A `formula` of the form `groups ~ x1 + x2 + ...`, that is, the response is the grouping `factor` and the right-hand side specifies the (usually non-`factor`) discriminators.
`data`	A `data.frame` from which variables specified in `formula` are to be taken.
`subset`	An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
`na.action`	A function to specify the action to be taken if NAs are found. The default action is first the `na.action` setting of `options` and second `na.fail` if that is unset. An alternative is `na.omit`, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)
`grouping`	(Required if no `formula` is given as principal argument.) A `factor` specifying the class membership for each observation.
`K`	The number of prototypes per class, either a single number or a vector of length equal to the number of classes. If a vector is given, its entries must be in the same order as the levels of `grouping`. Default is `K = 2`.
`wf`	A window function which is used to calculate weights that are introduced into the fitting process. Either a character string or a function, e.g. `wf = function(x) exp(-x)`. For details see the documentation for `wfs`.
`bw`	(Required only if `wf` is a string.) The bandwidth parameter of the window function. (See `wfs`.)
`k`	(Required only if `wf` is a string.) The number of nearest neighbors of the decision boundary to be used in the fitting process. (See `wfs`.)
`nn.only`	(Required only if `wf` is a string indicating a window function with infinite support and if `k` is specified.) Should only the `k` nearest neighbors or all observations receive positive weights? (See `wfs`.)
`nstart`	The number of random starts of the K-means algorithm. See `kmeans`.
`object`	An object of class `"kmc"`.
`...`	Further arguments to be passed to `kmeans`.

Prototype methods represent the training data by a set of points in feature space. Each prototype has an associated class label, and classification of a query point x is made to the class of the closest prototype.

To use K-means clustering for the classification of labeled data, K-means is applied to the training data of each class separately. The number of prototypes per class is specified by K and defaults to 2.
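The per-class clustering and the nearest-prototype rule described above can be sketched in base R. This is only an illustration under simplifying assumptions, not the locClass implementation: `fit_prototypes()` and `predict_nearest()` are hypothetical helper names, and the `iris` data stand in for any labeled training set.

```r
# Minimal sketch of per-class K-means prototypes (base R only; not the
# locClass implementation). fit_prototypes() and predict_nearest() are
# hypothetical helper names introduced for illustration.
fit_prototypes <- function(x, grouping, K = 2, nstart = 1) {
  x <- as.matrix(x)
  lev <- levels(grouping)
  K <- rep(K, length.out = length(lev))  # one K per class, in level order
  centers <- lapply(seq_along(lev), function(i) {
    xi <- x[grouping == lev[i], , drop = FALSE]
    stats::kmeans(xi, centers = K[i], nstart = nstart)$centers
  })
  list(x = do.call(rbind, centers),
       grouping = factor(rep(lev, times = K), levels = lev))
}

predict_nearest <- function(fit, newdata) {
  newdata <- as.matrix(newdata)
  # squared Euclidean distance from every query row to every prototype
  d2 <- apply(fit$x, 1, function(p) rowSums(sweep(newdata, 2, p)^2))
  d2 <- matrix(d2, nrow = nrow(newdata))
  # label of the closest prototype for each query row
  fit$grouping[apply(d2, 1, which.min)]
}

set.seed(1)
fit <- fit_prototypes(iris[, 1:4], iris$Species, K = c(2, 3, 2))
pred <- predict_nearest(fit, iris[, 1:4])
table(predicted = pred, actual = iris$Species)
```

The returned list mirrors the `x` (prototype matrix) and `grouping` (prototype labels) components of a "kmc" object, but contains nothing else.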

Usually, a test observation is assigned the class label of its closest prototype. It is also possible to use more than one prototype and to weight the influence of the prototypes on the classification according to their distances from the observation to be classified. This is controlled by the arguments wf, k, bw and nn.only (see wfs).
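The idea of distance-weighted prototypes can be illustrated with a small base R sketch. It assumes that each prototype contributes a weight `wf(distance)` and that the weights are summed per class and normalized; how kmc combines and normalizes the weights internally is not stated here, so treat this as one plausible reading. `weighted_vote()` and the toy prototypes are hypothetical.

```r
# Sketch: distance-weighted voting over prototypes (illustrative only).
# weighted_vote() is a hypothetical helper; wf maps distances to weights.
weighted_vote <- function(protos, labels, query, wf = function(d) exp(-d)) {
  d <- sqrt(colSums((t(protos) - query)^2))  # Euclidean distance to each prototype
  w <- wf(d)                                 # closer prototypes get larger weights
  scores <- tapply(w, labels, sum)           # summed weight per class
  scores / sum(scores)                       # normalize so the scores sum to 1
}

# Toy prototypes: two for class "a", one for class "b"
protos <- rbind(c(0, 0), c(1, 0), c(4, 4))
labels <- factor(c("a", "a", "b"))

res <- weighted_vote(protos, labels, query = c(3, 3))
res
names(which.max(res))
```

With a window function that gives positive weight only to the single nearest prototype, this scheme reduces to the usual nearest-prototype rule.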

The name of the window function (wf) can be specified as a character string. In this case the window function is generated internally in kmc. Currently supported are "biweight", "cauchy", "cosine", "epanechnikov", "exponential", "gaussian", "optcosine", "rectangular" and "triangular".

Moreover, it is possible to generate the window functions mentioned above in advance (see wfs) and pass them to kmc.

Any other function implementing a window function can also be used as the wf argument. This allows users to try their own window functions. See the help on wfs for details.

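It may be useful to scale the data first: K-means is based on Euclidean distances, so variables measured on larger scales would otherwise dominate the clustering.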
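One common way to scale is to standardize the training data and reuse the same centering and scaling values for new observations. The sketch below uses base R's `scale()`; the train/test split and the variable names are illustrative assumptions, not part of the kmc interface.

```r
# Sketch: standardize training data and reuse the same centering/scaling
# for new observations (base R; the split and names are illustrative).
set.seed(1)
idx <- sample(nrow(iris), 100)
train <- as.matrix(iris[idx, 1:4])
test  <- as.matrix(iris[-idx, 1:4])

train_sc <- scale(train)  # column means 0, standard deviations 1
test_sc <- scale(test,
                 center = attr(train_sc, "scaled:center"),
                 scale  = attr(train_sc, "scaled:scale"))

round(colMeans(train_sc), 10)
apply(train_sc, 2, sd)
```

The scaled training matrix would then be passed to the classifier in place of the raw data, and `test_sc` used at prediction time.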

If the predictor variables include factors, the formula interface must be used in order to get a correct model matrix.
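To see why factors need the formula interface, it helps to look at what a formula does to a factor predictor. The base R sketch below uses `model.matrix()` on a made-up data frame; it shows the general mechanism only and makes no claim about how kmc handles the intercept column internally.

```r
# Sketch: how a formula expands a factor predictor into indicator columns.
# The data frame below is made up for illustration.
df <- data.frame(
  y  = factor(c("a", "b", "a", "b")),
  x1 = c(1.0, 2.5, 0.3, 4.1),
  f  = factor(c("low", "high", "low", "mid"))
)

mm <- model.matrix(y ~ x1 + f, data = df)
colnames(mm)
# "(Intercept)" "x1" "flow" "fmid"
```

The factor `f` is replaced by numeric indicator columns, which is what distance-based methods such as K-means require. Passing a `data.frame` with factor columns directly as `x` skips this expansion.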

An object of class "kmc" containing the following components:

`counts`	The number of observations per class.
`x`	A `matrix` of prototypes.
`grouping`	A `factor` specifying the class membership for each prototype.
`lev`	The class labels (the levels of `grouping`).
`N`	The number of training observations.
`K`	The (used) number of prototypes per class.
`call`	The (matched) function call.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, New York, 2001.

predict.kmc, kmeans.

# generate waveform data
library(mlbench)
data.train <- as.data.frame(mlbench.waveform(300))

# 3 centers per class
object <- kmc(classes ~ ., data = data.train, K = 3, wf = "rectangular", k = 1)
object <- kmc(data.train[,-22], data.train$classes, K = 3, wf = "rectangular", k = 1)

# 2 centers in class 1, 3 centers in class 2, 4 centers in class 3 
object <- kmc(classes ~ ., data = data.train, K = c(2,3,4), wf = "rectangular", k = 1)
object <- kmc(data.train[,-22], data.train$classes, K = c(2,3,4), wf = "rectangular", k = 1)