kknn: Weighted k-Nearest Neighbor Classifier


Description

Performs k-nearest neighbor classification of a test set using a training set. For each row of the test set, the k nearest training set vectors (according to Minkowski distance) are found, and the classification is done via the maximum of summed kernel densities. In addition, ordinal and continuous response variables can be predicted.

Usage

kknn(formula = formula(train), train, test, na.action = na.omit(), 
	k = 7, distance = 2, kernel = "optimal", ykernel = NULL, scale=TRUE,
	contrasts = c('unordered' = "contr.dummy", ordered = "contr.ordinal"))
kknn.dist(learn, valid, k = 10, distance = 2)     

Arguments

formula

A formula object.

train

Matrix or data frame of training set cases.

test

Matrix or data frame of test set cases.

learn

Matrix or data frame of training set cases.

valid

Matrix or data frame of test set cases.

na.action

A function which indicates what should happen when the data contain 'NA's.

k

Number of neighbors considered.

distance

Parameter of Minkowski distance.

kernel

Kernel to use. Possible choices are "rectangular" (which is standard unweighted knn), "triangular", "epanechnikov" (or beta(2,2)), "biweight" (or beta(3,3)), "triweight" (or beta(4,4)), "cos", "inv", "gaussian", "rank" and "optimal".

ykernel

Window width of a y-kernel, used especially for the prediction of ordinal classes.

scale

Logical; if TRUE, variables are scaled to have equal standard deviation.

contrasts

A vector containing the 'unordered' and 'ordered' contrasts to use.
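
The beta-family kernels named under the kernel argument can be written out directly. The following is an illustrative sketch in base R (unnormalised, i.e. up to the constants of the corresponding beta densities), not the package internals:

```r
# Beta-family kernels as weight functions of |x| on [0, 1] (unnormalised):
epanechnikov <- function(x) pmax(1 - x^2, 0)      # beta(2,2)
biweight     <- function(x) pmax(1 - x^2, 0)^2    # beta(3,3)
triweight    <- function(x) pmax(1 - x^2, 0)^3    # beta(4,4)

epanechnikov(0.5)   # 0.75
biweight(2)         # 0: points beyond the window get zero weight
```

Each kernel is monotonically decreasing in |x| and vanishes outside the window, so nearer neighbors always receive at least as much weight as farther ones.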

Details

This nearest neighbor method extends knn in several directions. First, it can be used not only for classification but also for regression and ordinal classification. Second, it uses kernel functions to weight the neighbors according to their distances. In fact, not only kernel functions but any monotonically decreasing function f(x), for all x > 0, will work fine.
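
The kernel-weighted vote can be sketched in a few lines of base R. This is an illustrative toy example with made-up distances and classes, not the kknn internals:

```r
# Distances of the k = 4 nearest training neighbors (already normalised
# to lie in [0, 1]) and their classes:
d  <- c(0.2, 0.5, 0.9, 1.0)
cl <- c("a", "a", "b", "b")

# Any monotonically decreasing f(x) for x > 0 can serve as the kernel;
# here the triangular kernel w(x) = max(1 - x, 0):
w <- pmax(1 - d, 0)

# The predicted class is the one with the maximum summed kernel weight:
votes <- tapply(w, cl, sum)
names(which.max(votes))   # "a": 0.8 + 0.5 = 1.3 beats 0.1 + 0 = 0.1
```

With kernel = "rectangular" the weights are all equal, which reduces this vote to standard unweighted knn.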

The number of neighbours used for the "optimal" kernel should be [ (2(d+4)/(d+2))^(d/(d+4)) k ], where k is the number that would be used for unweighted knn classification (i.e. kernel = "rectangular") and d is the number of predictor variables. This factor (2(d+4)/(d+2))^(d/(d+4)) lies between 1.2 and 2 (see Samworth (2012) for more details).
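
The factor quoted above is easy to evaluate. A direct transcription in base R (the function name opt_factor is ours, purely for illustration):

```r
# Samworth's multiplier for the "optimal" kernel as a function of the
# dimension d (number of predictor variables):
opt_factor <- function(d) (2 * (d + 4) / (d + 2))^(d / (d + 4))

opt_factor(4)              # ~1.63 for d = 4 features (e.g. the iris data)
round(opt_factor(4) * 7)   # so the default k = 7 would suggest k = 11 here
```

As the Details note, the multiplier stays between 1.2 and 2 for any dimension, so the "optimal" kernel always wants somewhat more neighbors than unweighted knn.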

Value

kknn returns a list object of class 'kknn' including the components

fitted.values

Vector of predictions.

CL

Matrix of classes of the k nearest neighbors.

W

Matrix of weights of the k nearest neighbors.

D

Matrix of distances of the k nearest neighbors.

C

Matrix of indices of the k nearest neighbors.

prob

Matrix of predicted class probabilities.

response

Type of response variable, one of continuous, nominal or ordinal.

distance

Parameter of Minkowski distance.

call

The matched call.

terms

The 'terms' object used.

Author(s)

Klaus P. Schliep klaus.schliep@gmail.com
Klaus Hechenbichler

References

Hechenbichler K. and Schliep K.P. (2004) Weighted k-Nearest-Neighbor Techniques and Ordinal Classification, Discussion Paper 399, SFB 386, Ludwig-Maximilians University Munich (http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper399.ps)

Hechenbichler K. (2005) Ensemble-Techniken und ordinale Klassifikation, PhD-thesis

Samworth, R.J. (2012) Optimal weighted nearest neighbour classifiers. Annals of Statistics, 40, 2733-2763. (available from http://www.statslab.cam.ac.uk/~rjs57/Research.html)

See Also

train.kknn, simulation, knn and knn1

Examples

library(kknn)

data(iris)
m <- dim(iris)[1]
val <- sample(1:m, size = round(m/3), replace = FALSE, 
	prob = rep(1/m, m)) 
iris.learn <- iris[-val,]
iris.valid <- iris[val,]
iris.kknn <- kknn(Species~., iris.learn, iris.valid, distance = 1,
	kernel = "triangular")
summary(iris.kknn)
fit <- fitted(iris.kknn)
table(iris.valid$Species, fit)
pcol <- as.character(as.numeric(iris.valid$Species))
pairs(iris.valid[1:4], pch = pcol, col = c("green3", "red")
	[(iris.valid$Species != fit)+1])

data(ionosphere)
ionosphere.learn <- ionosphere[1:200,]
ionosphere.valid <- ionosphere[-c(1:200),]
fit.kknn <- kknn(class ~ ., ionosphere.learn, ionosphere.valid)
table(ionosphere.valid$class, fit.kknn$fit)
(fit.train1 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15, 
	kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 1))
table(predict(fit.train1, ionosphere.valid), ionosphere.valid$class)
(fit.train2 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15, 
	kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 2))
table(predict(fit.train2, ionosphere.valid), ionosphere.valid$class)

Example output

Call:
kknn(formula = Species ~ ., train = iris.learn, test = iris.valid,     distance = 1, kernel = "triangular")

Response: "nominal"
          fit prob.setosa prob.versicolor prob.virginica
1   virginica           0      0.00000000   1.0000000000
2   virginica           0      0.00000000   1.0000000000
3  versicolor           0      0.76818715   0.2318128545
4      setosa           1      0.00000000   0.0000000000
5      setosa           1      0.00000000   0.0000000000
6  versicolor           0      1.00000000   0.0000000000
7      setosa           1      0.00000000   0.0000000000
8      setosa           1      0.00000000   0.0000000000
9   virginica           0      0.00000000   1.0000000000
10  virginica           0      0.00000000   1.0000000000
11  virginica           0      0.00000000   1.0000000000
12  virginica           0      0.00000000   1.0000000000
13     setosa           1      0.00000000   0.0000000000
14 versicolor           0      1.00000000   0.0000000000
15 versicolor           0      0.65464129   0.3453587086
16 versicolor           0      1.00000000   0.0000000000
17  virginica           0      0.08466768   0.9153323224
18     setosa           1      0.00000000   0.0000000000
19  virginica           0      0.00000000   1.0000000000
20  virginica           0      0.48306401   0.5169359867
21 versicolor           0      1.00000000   0.0000000000
22     setosa           1      0.00000000   0.0000000000
23 versicolor           0      0.80951769   0.1904823088
24     setosa           1      0.00000000   0.0000000000
25  virginica           0      0.00000000   1.0000000000
26     setosa           1      0.00000000   0.0000000000
27  virginica           0      0.00107025   0.9989297498
28  virginica           0      0.00000000   1.0000000000
29 versicolor           0      1.00000000   0.0000000000
30 versicolor           0      0.99975680   0.0002432021
31  virginica           0      0.00000000   1.0000000000
32     setosa           1      0.00000000   0.0000000000
33 versicolor           0      1.00000000   0.0000000000
34     setosa           1      0.00000000   0.0000000000
35 versicolor           0      1.00000000   0.0000000000
36  virginica           0      0.15242361   0.8475763910
37 versicolor           0      1.00000000   0.0000000000
38 versicolor           0      0.99977539   0.0002246104
39  virginica           0      0.17671325   0.8232867528
40     setosa           1      0.00000000   0.0000000000
41  virginica           0      0.40918541   0.5908145918
42  virginica           0      0.19425741   0.8057425906
43 versicolor           0      0.65623167   0.3437683255
44     setosa           1      0.00000000   0.0000000000
45     setosa           1      0.00000000   0.0000000000
46 versicolor           0      0.55421691   0.4457830880
47     setosa           1      0.00000000   0.0000000000
48  virginica           0      0.00000000   1.0000000000
49 versicolor           0      0.58263261   0.4173673895
50     setosa           1      0.00000000   0.0000000000
            fit
             setosa versicolor virginica
  setosa         16          0         0
  versicolor      0         13         1
  virginica       0          3        17
   
      b   g
  b  19   8
  g   2 122

Call:
train.kknn(formula = class ~ ., data = ionosphere.learn, kmax = 15,     distance = 1, kernel = c("triangular", "rectangular", "epanechnikov",         "optimal"))

Type of response variable: nominal
Minimal misclassification: 0.12
Best kernel: rectangular
Best k: 2
   
      b   g
  b  25   4
  g   2 120

Call:
train.kknn(formula = class ~ ., data = ionosphere.learn, kmax = 15,     distance = 2, kernel = c("triangular", "rectangular", "epanechnikov",         "optimal"))

Type of response variable: nominal
Minimal misclassification: 0.12
Best kernel: rectangular
Best k: 2
   
      b   g
  b  20   5
  g   7 119

kknn documentation built on May 2, 2019, 3:26 a.m.