kmc: K-Means Classification


View source: R/kmc.R

Description

Classification based on K-means clustering within the classes.

Usage

kmc(x, ...)

## S3 method for class 'formula'
kmc(formula, data, ..., subset, na.action)

## S3 method for class 'data.frame'
kmc(x, ...)

## S3 method for class 'matrix'
kmc(x, grouping, ..., subset, na.action = na.fail)

## Default S3 method:
kmc(x, grouping, K = 2, wf = c("biweight", "cauchy",
  "cosine", "epanechnikov", "exponential", "gaussian", "optcosine",
  "rectangular", "triangular"), bw, k, nn.only = TRUE, nstart = 1, ...)

update.kmc(object, wf = c("biweight", "cauchy", "cosine", "epanechnikov",
  "exponential", "gaussian", "optcosine", "rectangular", "triangular"), bw, k,
  nn.only, ...)

Arguments

x

(Required if no formula is given as principal argument.) A matrix, data.frame or Matrix containing the explanatory variables.

formula

A formula of the form groups ~ x1 + x2 + ..., that is, the response is the grouping factor and the right hand side specifies the (usually non-factor) discriminators.

data

A data.frame from which variables specified in formula are to be taken.

subset

An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)

na.action

A function to specify the action to be taken if NAs are found. The default action is first the na.action setting of options and second na.fail if that is unset. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)

grouping

(Required if no formula is given as principal argument.) A factor specifying the class membership for each observation.

K

The number of prototypes per class, either a single number or a vector with one entry per class. If a vector is given, its entries must be in the same order as the levels of grouping. Default is K = 2.

wf

A window function which is used to calculate weights that are introduced into the fitting process. Either a character string or a function, e.g. wf = function(x) exp(-x). For details see the documentation for wfs.

bw

(Required only if wf is a string.) The bandwidth parameter of the window function. (See wfs.)

k

(Required only if wf is a string.) The number of nearest neighbors of the decision boundary to be used in the fitting process. (See wfs.)

nn.only

(Required only if wf is a string indicating a window function with infinite support and if k is specified.) Should only the k nearest neighbors or all observations receive positive weights? (See wfs.)

nstart

The number of random starts of the K-means algorithm. See kmeans.

object

An object of class "kmc".

...

Further arguments to be passed to kmeans.

Details

Prototype methods represent the training data by a set of points in feature space. Each prototype has an associated class label, and classification of a query point x is made to the class of the closest prototype.
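The nearest-prototype rule can be sketched in a few lines of base R. The helper nearest_prototype below is hypothetical and is not part of locClass; it assumes Euclidean distance and simply returns the class of the closest prototype for each query point.

```r
# Hypothetical helper (not part of locClass): assign each query point to the
# class of its closest prototype, using Euclidean distance.
nearest_prototype <- function(newx, prototypes, proto_class) {
  newx <- as.matrix(newx)
  prototypes <- as.matrix(prototypes)
  # squared Euclidean distances: rows = query points, columns = prototypes
  d2 <- outer(rowSums(newx^2), rowSums(prototypes^2), "+") -
    2 * newx %*% t(prototypes)
  # column index of the closest prototype for each query point
  idx <- max.col(-d2, ties.method = "first")
  proto_class[idx]
}

protos <- rbind(c(0, 0), c(5, 5))
cls <- factor(c("a", "b"))
res <- nearest_prototype(rbind(c(1, 0), c(4, 6)), protos, cls)
```

Here res is a factor with values "a" and "b": the point (1, 0) lies closest to the prototype (0, 0), and the point (4, 6) lies closest to (5, 5).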

To use K-means clustering for the classification of labeled data, K-means clustering is applied to the training data of each class separately. The number of prototypes per class is specified by K and defaults to 2.
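The per-class fitting step can be illustrated with stats::kmeans. The helper class_prototypes below is a hypothetical sketch, not the locClass implementation: it runs kmeans on the observations of each class, stacks the resulting centers into one prototype matrix and records the class label of each prototype. A single K is recycled to all classes; otherwise one value per level of grouping is expected.

```r
# Hypothetical helper (not the locClass code): K-means within each class,
# with the cluster centers of all classes collected as labeled prototypes.
class_prototypes <- function(x, grouping, K = 2, nstart = 1) {
  x <- as.matrix(x)
  grouping <- as.factor(grouping)
  lev <- levels(grouping)
  # a single K applies to every class; a vector gives one K per level
  if (length(K) == 1) K <- rep(K, length(lev))
  stopifnot(length(K) == length(lev))
  centers <- vector("list", length(lev))
  for (i in seq_along(lev)) {
    xi <- x[grouping == lev[i], , drop = FALSE]
    centers[[i]] <- kmeans(xi, centers = K[i], nstart = nstart)$centers
  }
  list(x = do.call(rbind, centers),
       grouping = factor(rep(lev, times = K), levels = lev),
       K = K)
}

set.seed(1)
x <- rbind(matrix(rnorm(40), ncol = 2), matrix(rnorm(40, mean = 4), ncol = 2))
y <- factor(rep(c("A", "B"), each = 20))
fit <- class_prototypes(x, y, K = c(2, 3))
```

With K = c(2, 3) the result contains five prototypes in two dimensions, the first two labeled "A" and the remaining three labeled "B", mirroring the x, grouping and K components listed under Value.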

Usually, the class label of the prototype closest to a test observation is predicted. However, it is also possible to use more than one prototype and to weight the influence of the prototypes on the classification according to their distances from the observation to be classified. This is controlled by the arguments wf, k, bw and nn.only (see wfs).
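The idea of a distance-weighted vote among prototypes can be sketched as follows. This is a hypothetical illustration, not the code of predict.kmc: the helper weighted_prototype_vote, the use of Euclidean distance and the default window function(d) exp(-d) (taken from the wf example in the Arguments section) are all assumptions. Only the k closest prototypes receive a positive weight, and the class with the largest total weight wins.

```r
# Hypothetical sketch (not locClass's predict.kmc): classify one observation
# by a weighted vote of its k nearest prototypes. Weights are obtained by
# applying a window function to the Euclidean distances.
weighted_prototype_vote <- function(obs, prototypes, proto_class, k,
                                    wf = function(d) exp(-d)) {
  prototypes <- as.matrix(prototypes)
  # distance from obs to every prototype (obs is recycled down each column)
  d <- sqrt(colSums((t(prototypes) - obs)^2))
  nn <- order(d)[seq_len(k)]           # indices of the k closest prototypes
  w <- numeric(length(d))
  w[nn] <- wf(d[nn])                   # all other prototypes get weight 0
  votes <- tapply(w, proto_class, sum) # total weight per class
  names(which.max(votes))
}

protos <- rbind(c(0, 0), c(0, 1), c(0.4, 0), c(10, 10))
cls <- factor(c("A", "A", "B", "B"))
pred_k1 <- weighted_prototype_vote(c(0.3, 0), protos, cls, k = 1)
pred_k3 <- weighted_prototype_vote(c(0.3, 0), protos, cls, k = 3)
```

With k = 1 the sketch reduces to the nearest-prototype rule and predicts "B", since the prototype (0.4, 0) is closest. With k = 3 the two nearby "A" prototypes together outweigh the single "B" prototype, so the prediction changes to "A".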

The name of the window function (wf) can be specified as a character string. In this case the window function is generated internally in kmc. Currently supported are "biweight", "cauchy", "cosine", "epanechnikov", "exponential", "gaussian", "optcosine", "rectangular" and "triangular".

Moreover, it is possible to generate the window functions mentioned above in advance (see wfs) and pass them to kmc.

Any other function implementing a window function can also be used as the wf argument. This allows users to try their own window functions. See the help on wfs for details.

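Since K-means clustering is based on Euclidean distances, variables measured on larger scales dominate the clustering. It may therefore be useful to scale the data first.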
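A minimal way to do this in base R is scale(). The sketch below standardizes the training data and then applies the training centers and scales to new data, so that test observations are measured on the same scale as the prototypes. The objects train and test are made-up example data.

```r
# Standardize training data; reuse the training centers/scales for new data
# so that training and test observations live on the same scale.
set.seed(2)
train <- matrix(c(rnorm(50, sd = 1), rnorm(50, sd = 100)), ncol = 2)
test  <- matrix(c(rnorm(5, sd = 1), rnorm(5, sd = 100)), ncol = 2)

train_s <- scale(train)
test_s  <- scale(test,
                 center = attr(train_s, "scaled:center"),
                 scale  = attr(train_s, "scaled:scale"))
```

After scaling, both columns of train_s have mean 0 and standard deviation 1, so neither variable dominates the distances. The scaled matrix could then be passed to kmc in place of the raw data.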

If the predictor variables include factors, the formula interface must be used in order to get a correct model matrix.
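To see why factors need the formula interface, it helps to look at how R formula methods typically expand factors into numeric indicator columns with model.matrix. The sketch below only shows this general mechanism; whether kmc keeps or drops the intercept column is not stated here and should not be inferred from it. The data frame df is made-up example data.

```r
# A factor predictor cannot enter a distance computation directly;
# model.matrix expands it into 0/1 indicator (dummy) columns.
df <- data.frame(y   = factor(c("a", "b", "a", "b")),
                 x1  = c(1.2, 0.5, 3.1, 2.2),
                 col = factor(c("red", "green", "blue", "red")))
mm <- model.matrix(y ~ x1 + col, data = df)
cols <- colnames(mm)
```

Under the default treatment contrasts, "blue" (the first level) serves as the baseline, so mm contains the columns "(Intercept)", "x1", "colgreen" and "colred".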

Value

An object of class "kmc" containing the following components:

counts

The number of observations per class.

x

A matrix of prototypes.

grouping

A factor specifying the class membership for each prototype.

lev

The class labels (the levels of grouping).

N

The number of training observations.

K

The (used) number of prototypes per class.

call

The (matched) function call.

References

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, New York, 2001.

See Also

predict.kmc, kmeans.

Examples

# generate waveform data
library(mlbench)
data.train <- as.data.frame(mlbench.waveform(300))

# 3 centers per class
# formula interface, then the equivalent default interface
# (column 22 of data.train holds the class labels)
object <- kmc(classes ~ ., data = data.train, K = 3, wf = "rectangular", k = 1)
object <- kmc(data.train[,-22], data.train$classes, K = 3, wf = "rectangular", k = 1)

# 2 centers in class 1, 3 centers in class 2, 4 centers in class 3
object <- kmc(classes ~ ., data = data.train, K = c(2,3,4), wf = "rectangular", k = 1)
object <- kmc(data.train[,-22], data.train$classes, K = c(2,3,4), wf = "rectangular", k = 1)

schiffner/locClass documentation built on May 29, 2019, 3:39 p.m.