knn.cv: Computing Cross-validated Risk for the kNN Algorithm
In ExactSampling: Risk evaluation using exact resampling methods for the k Nearest Neighbor algorithm.


knn.cv computes the Leave-p-Out (LpO) cross-validation estimator of the risk for the kNN algorithm. Neighbors are found using the Euclidean distance. In the classification case, predicted labels are obtained by majority vote and the risk is computed using the 0/1 loss function; when the vote is tied, a value of 0.5 is used. In the regression case, predictions are obtained by averaging the labels of the neighbors and the risk is computed using the quadratic loss function.
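To make the estimator concrete, here is a minimal brute-force sketch of the LpO risk for binary kNN classification in base R. It is not the exact algorithm implemented in the package: it simply enumerates every validation subset, so it is only practical for very small samples. The function name `lpo_knn_naive` and the toy data are assumptions made for illustration.

```r
# Naive Leave-p-Out risk for binary kNN classification (illustrative only).
# `lpo_knn_naive` is a hypothetical helper, not part of ExactSampling.
lpo_knn_naive <- function(data, label, k, p = 1) {
  x <- as.matrix(data)
  y <- as.factor(label)
  n <- nrow(x)
  stopifnot(nlevels(y) == 2, k + p <= n)
  d <- as.matrix(dist(x))                  # pairwise Euclidean distances
  splits <- combn(n, p)                    # each column is one validation sample
  losses <- apply(splits, 2, function(val) {
    train <- setdiff(seq_len(n), val)      # remaining n - p points
    mean(vapply(val, function(i) {
      nn <- train[order(d[i, train])[seq_len(k)]]  # k nearest training points
      votes <- sum(y[nn] == levels(y)[2])          # votes for the second class
      if (2 * votes == k) return(0.5)              # tied vote: value of 0.5
      pred <- if (2 * votes > k) levels(y)[2] else levels(y)[1]
      as.numeric(pred != as.character(y[i]))       # 0/1 loss
    }, numeric(1)))
  })
  mean(losses)                             # average over all choose(n, p) splits
}

# Toy usage: two well-separated groups on a line
toy.data <- matrix(c(0, 0.1, 0.2, 10, 10.1, 10.2), ncol = 1)
toy.label <- c("a", "a", "a", "b", "b", "b")
lpo_knn_naive(toy.data, toy.label, k = 1, p = 2)
```

The cost of this sketch grows with choose(n, p), which is why an exact closed-form computation is preferable for realistic sample sizes.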

knn.cv(data, label, k, p = 1, method)

`data`	an input data.frame or matrix, where each row corresponds to an observation.
`label`	a vector containing labels. If `method='regression'` then `label` must be numeric. If `method='classification'` then `label` may be a factor, numeric or character variable with only 2 different values.
`k`	the number of neighbors to be considered.
`p`	leave-p-out parameter. Each resampling splits the sample into a training sample of size n-p and a validation sample of size p.
`method`	"classification" or "regression"
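The role of `p` can be illustrated by listing the splits explicitly. The sketch below uses base R only, with toy values of n and p chosen for illustration; it shows that each split pairs a validation sample of size p with a training sample of size n - p, and that there are choose(n, p) such splits.

```r
# Enumerate every Leave-p-Out split for a toy sample (n and p are illustrative).
n <- 5
p <- 2
val_sets <- combn(n, p)   # p x choose(n, p) matrix; each column is a validation sample
train_sets <- apply(val_sets, 2, function(v) setdiff(seq_len(n), v))  # (n - p) rows

ncol(val_sets)    # number of splits, equal to choose(n, p)
nrow(train_sets)  # training sample size, equal to n - p

# The number of splits grows quickly with n and p
choose(100, 5)
```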

knn.cv returns a list containing the following two components:

`risk`	value of the risk evaluated by LpO cross-validation
`error.ind`	vector containing the individual risk for each observation.

For a given value of parameter k, only values of parameter p satisfying k + p ≤ n are admissible.
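One way to read this constraint is that each training sample, of size n - p, must still contain at least k points to serve as neighbors. A small sketch that lists the admissible values of p for a given n and k follows; `admissible_p` is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: admissible leave-p-out sizes for sample size n and k neighbors.
admissible_p <- function(n, k) {
  stopifnot(n >= 1, k >= 1)
  seq_len(max(n - k, 0))   # p = 1, ..., n - k; empty when k >= n
}

admissible_p(n = 10, k = 7)   # p can be 1, 2 or 3, since k + p <= n
```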

The function has been implemented by Kai Li, based on Celisse and Mary-Huard (2011).

Celisse, A. and Mary-Huard, T. (2011) Exact Cross-Validation for kNN and applications to passive and active learning in classification. Journal de la SFdS, 152, 3.

knn.emp for empirical estimation of the risk, knn.boot for exact bootstrap estimation, and knn.search to obtain the k nearest neighbors.

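library(ExactSampling)

# Load the Spam dataset and split it into predictors and labels (column 58)
data(Spam)
spam.label <- Spam[,58]
spam.data <- Spam[,-58]

# Inspect the predictor names and the class counts
names(spam.data)
table(spam.label)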

# LpO cross-validation
res.knn.cv <- knn.cv(data = spam.data, label = spam.label, k = 7, p = 12, method = "classification")
res.knn.cv$risk
head(res.knn.cv$error.ind)