kNN: k-Nearest Neighbour Classification
In ltorgo/DMwR2: Functions and Data for the Second Edition of "Data Mining with R"

Description Usage Arguments Details Value Author(s) References See Also Examples

This function provides a formula interface to the existing knn() function of package class. On top of this type of convinient interface, the function also allows standardization of the given data.

1	kNN(form, train, test, stand = TRUE, stand.stats = NULL, ...)

`form`	An object of the class `formula` describing the functional form of the classification model.
`train`	The data to be used as training set.
`test`	The data set for which we want to obtain the k-NN classification, i.e. the test set.
`stand`	A boolean indicating whether the training data should be previously normalized before obtaining the k-NN predictions (defaults to TRUE).
`stand.stats`	This argument allows the user to supply the centrality and spread statistics that will drive the standardization. If not supplied they will default to the statistics used in the function `scale()`. If supplied they should be a list with two components, each beig a vector with as many positions as there are columns in the data set. The first vector should contain the centrality statistics for each column, while the second vector should contain the spread statistc values.
`...`	Any other parameters that will be forward to the `knn()` function of package `class`.

This function is essentially a convenience function that provides a formula-based interface to the already existing knn() function of package class. On top of this type of interface it also incorporates some facilities in terms of standardization of the data before the k-nearest neighbour classification algorithm is applied. This algorithm is based on the distances between observations, which are known to be very sensitive to different scales of the variables and thus the usefulness of standardization.

The return value is the same as in the knn() function of package class. This is a factor of classifications of the test set cases.

Luis Torgo ltorgo@dcc.fc.up.pt

Torgo, L. (2016) Data Mining using R: learning with case studies, second edition, Chapman & Hall/CRC (ISBN-13: 978-1482234893).

http://ltorgo.github.io/DMwR2

knn, knn1, knn.cv

## A small example with the IRIS data set
data(iris)

## Split in train + test set
idxs <- sample(1:nrow(iris),as.integer(0.7*nrow(iris)))
trainIris <- iris[idxs,]
testIris <- iris[-idxs,]

## A 3-nearest neighbours model with no standardization
nn3 <- kNN(Species ~ .,trainIris,testIris,stand=FALSE,k=3)

## The resulting confusion matrix
table(testIris[,'Species'],nn3)

## Now a 5-nearest neighbours model with standardization
nn5 <- kNN(Species ~ .,trainIris,testIris,stand=TRUE,k=5)

## The resulting confusion matrix
table(testIris[,'Species'],nn5)