knn.emp: Computing Empirical Risk for the kNN Algorithm


Description

knn.emp is used to compute the empirical estimator of the risk for the kNN and weighted kNN algorithms. Neighbors are obtained using the canonical Euclidean distance. In the classification case, predicted labels are obtained by (possibly weighted) majority vote and the risk is computed using the 0/1 hard loss function; when ties occur, a value of 0.5 is returned for the risk. In the regression case, predicted labels are obtained by (possibly weighted) averaging and the risk is computed using the quadratic loss function.
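
As a rough illustration of the quantity knn.emp estimates, the following base-R sketch computes a resubstitution empirical risk for kNN classification with the 0/1 loss. The helper name emp_risk_sketch is hypothetical, and details such as whether a point counts among its own neighbors may differ from knn.emp itself.

# Minimal sketch (assumptions noted above), not the package implementation
emp_risk_sketch <- function(data, label, k) {
  d <- as.matrix(dist(data))        # pairwise Euclidean distances
  diag(d) <- Inf                    # exclude each point from its own neighbors
  err <- sapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[1:k]        # indices of the k nearest neighbors
    votes <- table(label[nn])       # (unweighted) majority vote
    if (sum(votes == max(votes)) > 1) return(0.5)   # tie -> risk 0.5, as described above
    as.numeric(names(votes)[which.max(votes)] != as.character(label[i]))
  })
  list(risk = mean(err), error.ind = err)
}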

Usage

knn.emp(data, label, k, weight = NULL, alpha = NULL, method)

Arguments

data

an input data.frame or matrix where each row corresponds to an observation.

label

a column vector containing the labels. If method='regression', label must be numeric. If method='classification', label may be a factor, numeric or character variable taking only 2 distinct values.

k

the number of neighbors to be considered.

weight

an optional matrix of positive weights. For each observation, the corresponding weights are given in the matching row, following the ordering of the sample.

alpha

an optional parameter. If given, the weighted kNN algorithm is performed, where the weight of observation j in the prediction rule for the label of point i is 0 if j does not belong to the k nearest neighbors of i, and is inversely proportional to the Euclidean distance between i and j to the power alpha otherwise (see the sketch after this argument list). Only available when weight=NULL.

method

"classification" or "regression"

Value

knn.emp returns a list containing the following two components:

risk

value of the empirical risk

error.ind

vector containing the individual empirical risk for each observation

Author(s)

The function has been implemented by Kai Li, based on Celisse and Mary-Huard (2011).

References

Celisse, A. and Mary-Huard, T. (2011) Exact Cross-Validation for kNN and applications to passive and active learning in classification. Journal de la SFdS, 152(3).

See Also

knn.cv for a cross-validated estimation of the risk, knn.boot for an exact bootstrap estimation, and knn.search to obtain the k nearest neighbors.

Examples

data(Spam)
# Empirical risk, classification case
spam.label <- Spam[,58]
spam.data <- Spam[,-58]
res.knn.emp <- knn.emp(data = spam.data, label = spam.label, k = 7, method = "classification")
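
# A weighted variant can be run in the same way through the alpha argument
# documented above (exact numerical output not shown here); the empirical risk
# is then available as the risk component of the returned list.
res.knn.emp.w <- knn.emp(data = spam.data, label = spam.label, k = 7,
                         alpha = 2, method = "classification")
res.knn.emp.w$risk   # value of the empirical risk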
