knn.cv: Computing Cross-validated Risk for the kNN Algorithm

Description

knn.cv computes the Leave-p-Out (LpO) cross-validation estimator of the risk for the kNN algorithm. Neighbors are obtained using the canonical Euclidean distance. In the classification case, predicted labels are obtained by majority vote and the risk is computed using the 0/1 hard loss function; when a vote is tied, a value of 0.5 is returned. In the regression case, predicted labels are obtained by averaging and the risk is computed using the quadratic loss function.
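
To make the definition concrete, here is a minimal brute-force sketch of the LpO risk for binary classification, written in base R. It enumerates every validation set of size p, so it is only feasible for very small samples. It is not the package's implementation and makes no claim about how knn.cv obtains its result. The name lpo_knn_naive is hypothetical, and breaking distance ties by index order is an assumption of this sketch.

```r
# Naive leave-p-out risk for a kNN classifier (binary labels, 0/1 loss).
# lpo_knn_naive is a hypothetical helper, not part of ExactSampling.
lpo_knn_naive <- function(data, label, k, p = 1) {
  data <- as.matrix(data)
  label <- as.character(label)
  n <- nrow(data)
  stopifnot(k + p <= n)
  classes <- unique(label)
  stopifnot(length(classes) == 2)

  d <- as.matrix(dist(data))   # pairwise Euclidean distances
  splits <- combn(n, p)        # each column is one validation set

  losses <- apply(splits, 2, function(val) {
    train <- setdiff(seq_len(n), val)
    mean(sapply(val, function(i) {
      # k nearest training points (distance ties broken by index order)
      nn <- train[order(d[i, train])[seq_len(k)]]
      votes <- sum(label[nn] == classes[1])
      if (2 * votes == k) return(0.5)   # tied vote: loss of 0.5
      pred <- if (2 * votes > k) classes[1] else classes[2]
      as.numeric(pred != label[i])      # 0/1 loss
    }))
  })
  mean(losses)                 # average over all splits
}

# Toy usage on four points on a line with alternating labels
x <- matrix(c(0, 1, 2, 3), ncol = 1)
y <- c("a", "b", "a", "b")
lpo_knn_naive(x, y, k = 2, p = 1)
```

With k = 2 the two end points each see one neighbor of each class, so the sketch exercises the tie rule described above.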

Usage

knn.cv(data, label, k, p = 1, method)

Arguments

data

an input data.frame or matrix, where each row corresponds to an observation.

label

a vector containing labels. If method='regression', then label must be numeric. If method='classification', then label may be a factor, numeric, or character variable with only 2 different values.

k

the number of neighbors to be considered.

p

leave-p-out parameter. Each resampling splits the sample into a training sample of size n-p and a validation sample of size p.

method

"classification" or "regression"
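
As a rough illustration of what the p argument implies, the number of distinct ways to split n observations into a validation sample of size p and a training sample of size n-p is the binomial coefficient choose(n, p). The base R sketch below counts and enumerates these splits; the values of n and p are illustrative assumptions, not package defaults.

```r
n <- 20   # illustrative sample size (assumption)
p <- 3    # illustrative leave-p-out parameter (assumption)

n_splits <- choose(n, p)   # number of distinct validation sets of size p
splits <- combn(n, p)      # each column holds the indices of one validation set

n_splits                   # 1140
dim(splits)                # p rows, choose(n, p) columns
```

The count grows very quickly with p, which is why enumerating splits is only practical for small samples.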

Value

knn.cv returns a list containing the following two components:

risk

value of the risk evaluated by LpO cross-validation.

error.ind

vector containing the individual risk for each observation.

Note

For a given value of parameter k, only values of parameter p satisfying k+p ≤ n are admissible; equivalently, the training sample of size n-p must contain at least k observations.
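
The constraint can be checked before calling knn.cv. The helpers below are a small sketch in base R; is_admissible and max_admissible_p are hypothetical names, not functions exported by the package.

```r
# Hypothetical helpers (not part of ExactSampling).

# TRUE when p is a valid leave-p-out parameter for n observations and k neighbors
is_admissible <- function(n, k, p) {
  p >= 1 && k + p <= n
}

# Largest admissible p, or 0 when no value of p is admissible
max_admissible_p <- function(n, k) {
  max(n - k, 0)
}

is_admissible(n = 100, k = 7, p = 12)   # TRUE, since 7 + 12 <= 100
max_admissible_p(n = 10, k = 7)         # 3
```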

Author(s)

The function has been implemented by Kai Li, based on Celisse and Mary-Huard (2011).

References

Celisse, A. and Mary-Huard, T. (2011) Exact Cross-Validation for kNN and applications to passive and active learning in classification. Journal de la SFdS, 152, 3.

See Also

knn.emp for empirical estimation of the risk, knn.boot for exact bootstrap estimation, and knn.search to obtain the k nearest neighbors.

Examples

library(ExactSampling)

# Load the spam dataset; column 58 holds the labels
data(Spam)
spam.label <- Spam[,58]
spam.data <- Spam[,-58]

# Inspect the predictors and the class counts
names(spam.data)
table(spam.label)

# LpO cross-validation
res.knn.cv <- knn.cv(data = spam.data, label = spam.label, k = 7, p = 12, method = "classification")
res.knn.cv$risk
head(res.knn.cv$error.ind)

ExactSampling documentation built on May 2, 2019, 6:08 p.m.