kNN: k-Nearest Neighbour Classification

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

This function provides a formula interface to the existing knn() function of package class. On top of this type of convinient interface, the function also allows standardization of the given data.

Usage

1
kNN(form, train, test, stand = TRUE, stand.stats = NULL, ...)

Arguments

form

An object of the class formula describing the functional form of the classification model.

train

The data to be used as training set.

test

The data set for which we want to obtain the k-NN classification, i.e. the test set.

stand

A boolean indicating whether the training data should be previously normalized before obtaining the k-NN predictions (defaults to TRUE).

stand.stats

This argument allows the user to supply the centrality and spread statistics that will drive the standardization. If not supplied they will default to the statistics used in the function scale(). If supplied they should be a list with two components, each beig a vector with as many positions as there are columns in the data set. The first vector should contain the centrality statistics for each column, while the second vector should contain the spread statistc values.

...

Any other parameters that will be forward to the knn() function of package class.

Details

This function is essentially a convenience function that provides a formula-based interface to the already existing knn() function of package class. On top of this type of interface it also incorporates some facilities in terms of standardization of the data before the k-nearest neighbour classification algorithm is applied. This algorithm is based on the distances between observations, which are known to be very sensitive to different scales of the variables and thus the usefulness of standardization.

Value

The return value is the same as in the knn() function of package class. This is a factor of classifications of the test set cases.

Author(s)

Luis Torgo ltorgo@dcc.fc.up.pt

References

Torgo, L. (2016) Data Mining using R: learning with case studies, second edition, Chapman & Hall/CRC (ISBN-13: 978-1482234893).

http://ltorgo.github.io/DMwR2

See Also

knn, knn1, knn.cv

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
## A small example with the IRIS data set
data(iris)

## Split in train + test set
idxs <- sample(1:nrow(iris),as.integer(0.7*nrow(iris)))
trainIris <- iris[idxs,]
testIris <- iris[-idxs,]

## A 3-nearest neighbours model with no standardization
nn3 <- kNN(Species ~ .,trainIris,testIris,stand=FALSE,k=3)

## The resulting confusion matrix
table(testIris[,'Species'],nn3)

## Now a 5-nearest neighbours model with standardization
nn5 <- kNN(Species ~ .,trainIris,testIris,stand=TRUE,k=5)

## The resulting confusion matrix
table(testIris[,'Species'],nn5)

ltorgo/DMwR2 documentation built on May 21, 2019, 8:41 a.m.