cv_knn: Cross-validation for K nearest-neighbor regression

View source: R/cross-validation-KNN.R

cv_knn    R Documentation

Cross-validation for K nearest-neighbor regression


This function calculates the estimated cross-validation prediction error for K nearest-neighbor regression and returns a suitable choice for K.


cv_knn(x_mat, dise_vec, veri_stat, k_list = NULL, type = "eucli", plot = FALSE)



x_mat: a numeric design matrix, which is used in rho_knn to estimate the probabilities of the disease status.


dise_vec: an n * 3 binary matrix whose three columns correspond to the three classes of the disease status. In row i, a 1 in column j indicates that the i-th subject belongs to class j, with j = 1, 2, 3. A row of NA values indicates a non-verified subject.


veri_stat: a binary vector containing the verification status (1 verified, 0 not verified).


k_list: a list of candidate values for K. If NULL (the default), the set {1, 2, ..., n.ver} is employed, where n.ver is the number of verified subjects.


type: the type of distance; see rho_knn for more details. Default "eucli".


plot: if TRUE, a plot of the cross-validation prediction error is produced.
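To illustrate the input format described for the disease matrix and the verification vector, the following sketch builds both by hand on toy data. The names status, veri_toy and dise_toy are hypothetical and used only for illustration; with real data, pre_data (see the example at the end of this page) prepares the disease matrix.

```r
# Toy disease status for 6 subjects (hypothetical data);
# NA marks a subject whose true status was not verified.
status <- c(1, 3, NA, 2, NA, 1)

# Verification status: 1 verified, 0 not verified.
veri_toy <- as.integer(!is.na(status))

# n * 3 binary matrix: rows of NA for non-verified subjects,
# a single 1 in the column of the true class otherwise.
dise_toy <- matrix(NA_integer_, nrow = length(status), ncol = 3)
ver <- which(veri_toy == 1)
dise_toy[ver, ] <- 0L
dise_toy[cbind(ver, status[ver])] <- 1L

dise_toy
```

Each verified row contains exactly one 1, and each non-verified row consists of NA values only.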


The data are divided into two groups: the first contains the observations with veri_stat = 1, the second those with veri_stat = 0. In the first group, the discrepancy between the true disease status and the KNN estimates of the disease-status probabilities is computed for each candidate value of k, by default from 1 to the number of verified subjects; see To Duc et al. (2020). The optimal value of k is the one that yields the smallest discrepancy.
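The selection rule described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code of cv_knn: the helper loo_knn_discrepancy is hypothetical, it uses only the Euclidean distance, it excludes each subject from its own neighbourhood (leave-one-out), and it measures the discrepancy as a mean absolute difference. The package may define these details differently.

```r
# Hypothetical helper (not part of bcROCsurface): for each candidate k,
# compare the true class indicators of the verified subjects with KNN
# estimates obtained from their k nearest verified neighbours.
loo_knn_discrepancy <- function(x_mat, dise_vec, veri_stat, k_list) {
  ver <- which(veri_stat == 1)
  x_ver <- x_mat[ver, , drop = FALSE]
  d_ver <- dise_vec[ver, , drop = FALSE]
  n_ver <- length(ver)

  # Pairwise Euclidean distances; Inf on the diagonal removes each
  # subject from its own neighbourhood (assumption: leave-one-out).
  dist_mat <- as.matrix(dist(x_ver))
  diag(dist_mat) <- Inf
  ord <- t(apply(dist_mat, 1, order))  # row i: neighbours of i, nearest first

  sapply(k_list, function(k) {
    # Estimated class probabilities: class frequencies among k neighbours.
    rho_hat <- t(sapply(seq_len(n_ver), function(i) {
      colMeans(d_ver[ord[i, seq_len(k)], , drop = FALSE])
    }))
    mean(abs(d_ver - rho_hat))
  })
}

# Toy data (hypothetical): three shifted classes, about 70% verified.
set.seed(1)
n <- 60
cls <- rep(1:3, each = 20)
x_toy <- matrix(rnorm(n * 2, mean = rep(cls * 3, 2)), ncol = 2)
v_toy <- rbinom(n, 1, 0.7)
d_toy <- diag(3)[cls, ]
d_toy[v_toy == 0, ] <- NA

k_list <- 1:10
disc <- loo_knn_discrepancy(x_toy, d_toy, v_toy, k_list)
k_best <- k_list[which.min(disc)]  # candidate with the smallest discrepancy
```

Only the verified subjects enter the computation, which mirrors the role of the first group in the description above.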


A suitable choice for k is returned.


To Duc, K., Chiogna, M. and Adimari, G. (2020) Nonparametric estimation of ROC surfaces in presence of verification bias. REVSTAT-Statistical Journal, 18(5), 697–720.


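library(bcROCsurface)
# Covariates used by the KNN estimator
x_mat <- cbind(EOC$CA125, EOC$CA153, EOC$Age)
# Convert the disease status into the n * 3 binary matrix expected by cv_knn
dise_na <- pre_data(EOC$D, EOC$CA125)
dise_vec_na <- dise_na$dise_vec
# Select K by cross-validation and plot the prediction error
cv_knn(x_mat, dise_vec_na, EOC$V, type = "mahala", plot = TRUE)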

bcROCsurface documentation built on Sept. 9, 2023, 9:07 a.m.