This function performs a 10-fold cross-validation on a given data set using a *k*-Nearest Neighbors (*k*NN) model. To assess the predictive ability of the model, the data set is split with a 1:9 ratio: 10% of the samples are removed before any step of the statistical analysis, including the selection of the number of neighbors *k* and scaling. The best value of *k* is chosen by a further 10-fold cross-validation on the remaining 90%, selecting the value with the best Q2y. Permutation testing can be performed to assess the classification/regression performance of the predictors.
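The nested procedure described above can be sketched in base R as follows. This is a minimal illustration, not the KODAMA implementation: `knn_predict`, `scale_pair`, `make_folds` and `cv_accuracy` are hypothetical helpers, autoscaling is assumed, and *k* is chosen here by inner cross-validated accuracy rather than by Q2y.

```r
# Sketch of a double (nested) cross-validation with kNN; not KODAMA's code.
set.seed(1)
X <- as.matrix(iris[, -5])
y <- iris[, 5]
kmax <- 5  # largest k to try (assumption, mirrors the default of compmax)

# Randomly assign each of n samples to one of nfold folds.
make_folds <- function(n, nfold = 10) sample(rep_len(seq_len(nfold), n))

# Autoscale the training set and apply the same parameters to the test set.
scale_pair <- function(train, test) {
  mu <- colMeans(train)
  s <- apply(train, 2, sd)
  list(train = scale(train, center = mu, scale = s),
       test  = scale(test,  center = mu, scale = s))
}

# Plain kNN classifier: majority vote among the k closest training samples.
knn_predict <- function(train, test, cl, k) {
  apply(test, 1, function(x) {
    d <- colSums((t(train) - x)^2)   # squared Euclidean distances
    nn <- order(d)[seq_len(k)]
    names(which.max(table(cl[nn])))  # ties go to the first level
  })
}

# Cross-validated accuracy of kNN with k neighbours on (X, y).
cv_accuracy <- function(X, y, k, nfold = 10) {
  folds <- make_folds(nrow(X), nfold)
  pred <- character(nrow(X))
  for (f in seq_len(nfold)) {
    test <- folds == f
    sc <- scale_pair(X[!test, , drop = FALSE], X[test, , drop = FALSE])
    pred[test] <- knn_predict(sc$train, sc$test, y[!test], k)
  }
  mean(pred == as.character(y))
}

# Outer loop: hold out 10%, tune k on the remaining 90%, then predict.
outer_folds <- make_folds(nrow(X))
ypred <- character(nrow(X))
for (f in seq_len(10)) {
  test <- outer_folds == f
  Xtr <- X[!test, , drop = FALSE]
  ytr <- y[!test]
  inner_acc <- sapply(seq_len(kmax), function(k) cv_accuracy(Xtr, ytr, k))
  best_k <- which.max(inner_acc)
  sc <- scale_pair(Xtr, X[test, , drop = FALSE])
  ypred[test] <- knn_predict(sc$train, sc$test, ytr, best_k)
}

print(mean(ypred == as.character(y)))  # cross-validated accuracy
print(table(ypred, y))                 # confusion matrix
```

The held-out samples never contribute to the scaling parameters or to the choice of *k*, which is the point of removing them before any other step.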

knn.double.cv(Xdata, Ydata, constrain=1:nrow(Xdata), compmax=min(5,c(ncol(Xdata),nrow(Xdata))), perm.test=FALSE, optim=TRUE, scaling = c("centering","autoscaling"), times=100, runn=10)

`Xdata` |
a matrix. |

`Ydata` |
the responses. If Ydata is a numeric vector, a regression analysis will be performed. If Ydata is factor, a classification analysis will be performed. |

`constrain` |
a vector of |

`compmax` |
the number of neighbors *k* to be used for classification. |

`perm.test` |
a logical value. If TRUE, permutation testing is performed. |

`optim` |
a logical value. If TRUE, the number of neighbors *k* is optimized. |

`scaling` |
the scaling method to be used. Choices are "centering" or "autoscaling". |

`times` |
number of cross-validations with permuted samples. |

`runn` |
number of cross-validation loops. |
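The idea behind permutation testing (`perm.test` and `times`) can be illustrated with a short base R sketch. It assumes a simplified statistic, the leave-one-out accuracy of a 1-nearest-neighbor classifier computed by the hypothetical helper `loo_accuracy`, and a common empirical p-value formula; how KODAMA computes its own permutation statistics is not stated here.

```r
# Sketch of a label-permutation test; statistic and p-value formula are assumptions.
set.seed(1)
X <- as.matrix(iris[, -5])
y <- iris[, 5]
times <- 100  # number of permutations, mirroring the `times` argument

# Leave-one-out accuracy of a 1-nearest-neighbour classifier.
loo_accuracy <- function(X, y) {
  d <- as.matrix(dist(X))
  diag(d) <- Inf                # a sample cannot be its own neighbour
  nn <- apply(d, 1, which.min)  # index of the closest other sample
  mean(y[nn] == y)
}

observed <- loo_accuracy(X, y)

# Null distribution: the same statistic after shuffling the labels.
permuted <- replicate(times, loo_accuracy(X, sample(y)))

# Fraction of permutations that perform at least as well as the real labels.
p_value <- (sum(permuted >= observed) + 1) / (times + 1)
print(c(observed = observed, p_value = p_value))
```

A small p-value indicates that the observed performance is unlikely to arise when the link between samples and responses is broken.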

A list with the following components:

`Ypred` |
the vector containing the predicted values of the response variables obtained by cross-validation. |

`Yfit` |
the vector containing the fitted values of the response variables. |

`Q2Y` |
Q2y value. |

`R2Y` |
R2y value. |

`conf` |
The confusion matrix (only in classification mode). |

`acc` |
The cross-validated accuracy (only in classification mode). |

`txtQ2Y` |
a summary of the Q2y values. |

`txtR2Y` |
a summary of the R2y values. |
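As an illustration of how such quantities are commonly defined, the sketch below computes Q2y and R2y as one minus the ratio of the residual sum of squares to the total sum of squares, and derives a confusion matrix and accuracy from predicted labels. The toy vectors are invented for the example, `r_squared` is a hypothetical helper, and the exact formulas used by the package are an assumption.

```r
# Toy regression example (values invented for illustration only).
y     <- c(1.0, 2.0, 3.0, 4.0, 5.0)  # observed responses
ypred <- c(1.2, 1.8, 3.3, 3.9, 4.6)  # cross-validated predictions
yfit  <- c(1.1, 1.9, 3.1, 4.0, 4.9)  # fitted values

# 1 - (residual sum of squares) / (total sum of squares)
r_squared <- function(obs, est) {
  1 - sum((obs - est)^2) / sum((obs - mean(obs))^2)
}
Q2Y <- r_squared(y, ypred)  # uses cross-validated predictions
R2Y <- r_squared(y, yfit)   # uses fitted values

# Toy classification example (labels invented for illustration only).
labels <- factor(c("a", "a", "b", "b", "c", "c"))
pred   <- factor(c("a", "b", "b", "b", "c", "a"), levels = levels(labels))
conf <- table(predicted = pred, observed = labels)  # confusion matrix
acc  <- sum(diag(conf)) / sum(conf)                 # share of correct predictions

print(c(Q2Y = Q2Y, R2Y = R2Y, acc = acc))
print(conf)
```

Because Q2y is based on predictions for samples left out of model building, it is normally lower than R2y for the same model.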

Stefano Cacciatore

Cacciatore S, Luchinat C, Tenori L

Knowledge discovery by accuracy maximization.

*Proc Natl Acad Sci U S A* 2014;111(14):5117-22. doi: 10.1073/pnas.1220873111.

Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA

KODAMA: an updated R package for knowledge discovery and data mining.

*Bioinformatics* 2017;33(4):621-623. doi: 10.1093/bioinformatics/btw705.

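    library(KODAMA)
    # Classification on the iris data set
    data(iris)
    data=iris[,-5]
    labels=iris[,5]
    pp=knn.double.cv(data,labels)
    print(pp$Q2Y)
    table(pp$Ypred,labels)
    # Classification on the MetRef data set
    data(MetRef)
    u=MetRef$data
    # Drop all-zero columns; logical indexing is safe even if there are none
    u=u[,colSums(u)!=0]
    u=normalization(u)$newXtrain
    u=scaling(u)$newXtrain
    pp=knn.double.cv(u,as.factor(MetRef$donor))
    print(pp$Q2Y)
    table(pp$Ypred,MetRef$donor)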

