knn.boot: Computing Exact or Epsilon-Bootstrap Risk Estimator for the...


Description

knn.boot computes the exact or epsilon-bootstrap estimator of the risk of the kNN algorithm in binary classification. Neighbors are found using the canonical Euclidean distance, and predicted labels are obtained by majority vote. The risk is computed with the 0/1 (hard) loss; when the vote is tied, the loss for that observation is 0.5.
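The per-observation loss described above can be sketched in base R as follows. knn_loss is a hypothetical helper written for illustration only, not a function exported by ExactSampling. It assumes train is a numeric matrix, x0 a numeric vector with one entry per column, and labels taking at most two distinct values; ties in distance are broken by row order, which may not match what knn.boot does.

```r
# Hypothetical helper (not part of ExactSampling): 0/1 loss of a kNN
# majority vote for a single query point x0 with true label y0.
# A tied vote is counted as a loss of 0.5.
knn_loss <- function(train, labels, x0, y0, k) {
  train <- as.matrix(train)
  # Euclidean distance from x0 to every training row
  d <- sqrt(rowSums(sweep(train, 2, x0)^2))
  # labels of the k closest rows (equal distances broken by row order)
  nn <- labels[order(d)[seq_len(k)]]
  votes_true <- sum(nn == y0)
  votes_other <- k - votes_true
  if (votes_true > votes_other) 0 else if (votes_true < votes_other) 1 else 0.5
}

train <- matrix(c(0, 0,
                  0, 1,
                  5, 5,
                  5, 6), ncol = 2, byrow = TRUE)
labels <- c("a", "a", "b", "b")
knn_loss(train, labels, x0 = c(0, 0.5), y0 = "a", k = 1)  # 0: correct vote
knn_loss(train, labels, x0 = c(0, 0.5), y0 = "b", k = 1)  # 1: wrong vote
knn_loss(train, labels, x0 = c(2.5, 3), y0 = "a", k = 2)  # 0.5: one "a", one "b"
```

With an even k a tie between the two classes is possible, which is why the 0.5 convention matters; with an odd k and two classes the vote can never tie.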

Usage

knn.boot(data = numeric(), label, k = numeric(), epsilon = 0)

Arguments

data

an input data.frame or matrix where each row corresponds to an observation.

label

a vector of labels, one per observation. label may be a factor, numeric or character vector with at most two distinct values.

k

the number of neighbors to be considered.

epsilon

required precision level for the bootstrap approximation. If epsilon = 0 (the default), the exact bootstrap is computed.

Details

knn.boot computes the "exact" bootstrap estimator, meaning that all resamplings with replacement of the initial dataset are considered. The epsilon-bootstrap approximates this exact estimator by discarding resamplings with low associated probabilities. This strategy is recommended for large datasets, for which exact resampling may be time-consuming.
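To make "all resamplings with replacement" concrete, the sketch below computes the exact bootstrap expectation of a simple statistic by brute-force enumeration of all n^n equally likely ordered resamples. exact_boot is a hypothetical illustration, not the algorithm used by knn.boot (which, per the Author(s) section, is based on Celisse and Mary-Huard (2011)), and it is only feasible for very small n.

```r
# Illustration only (not knn.boot's algorithm): exact bootstrap
# expectation of stat(x*), averaging over all n^n ordered resamples
# of x drawn with replacement, each with probability 1 / n^n.
exact_boot <- function(x, stat) {
  n <- length(x)
  # every ordered choice of n indices from 1..n: n^n rows
  idx <- as.matrix(expand.grid(rep(list(seq_len(n)), n)))
  vals <- apply(idx, 1, function(i) stat(x[i]))
  mean(vals)
}

x <- c(1, 4, 7, 10)
exact_boot(x, mean)                           # 5.5, equal to mean(x)
exact_boot(x, function(s) length(unique(s)))  # 2.734375 = 4 * (1 - (3/4)^4)
```

The second call gives the expected number of distinct observations in a bootstrap sample, n * (1 - (1 - 1/n)^n); the remaining observations are left out of that resample.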

Value

knn.boot returns a list containing the following two components:

risk

the risk estimated by the exact or epsilon-bootstrap

error.ind

vector containing the individual risk for each observation

Warning

The exact bootstrap is computationally intensive and should be applied only to small datasets (n < 200).
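One way to see why the exact bootstrap is limited to small n is to count the distinct resamples with replacement of a dataset of size n. Ignoring order, these are the multisets of size n drawn from n observations, and there are choose(2n - 1, n) of them. n_resamples is a hypothetical helper; the sketch only counts resamples and makes no claim about how knn.boot organizes its computation.

```r
# Hypothetical helper: number of distinct (unordered) bootstrap resamples
# of a dataset of size n, i.e. multisets of size n from n items.
n_resamples <- function(n) choose(2 * n - 1, n)

n_resamples(3)    # 10
n_resamples(10)   # 92378
n_resamples(20)   # 68923264410
n_resamples(200)  # astronomically large (above 1e100)
```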

Author(s)

The function has been implemented by Kai Li, based on Celisse and Mary-Huard (2011).

See Also

knn.emp for empirical estimation of the risk, knn.cv for exact cross-validated estimation, and knn.search to obtain the k nearest neighbors.

Examples

library(ExactSampling)

# Using the Spam dataset: column 58 holds the labels,
# the remaining columns hold the features
data(Spam)
spam.label <- Spam[1:20, 58]
spam.data <- Spam[1:20, -58]
names(spam.data)
table(spam.label)

# Exact bootstrap
knn.boot(data = spam.data, label = spam.label, k = 7)

# epsilon-bootstrap
knn.boot(data = spam.data, label = spam.label, k = 7, epsilon = 0.01)

ExactSampling documentation built on May 2, 2019, 6:08 p.m.