knn.double.cv: Cross-Validation with k-Nearest Neighbors algorithm.

knn.double.cv    R Documentation

Cross-Validation with k-Nearest Neighbors algorithm.

Description

This function performs a 10-fold cross-validation on a given data set using a k-Nearest Neighbors (kNN) model. To assess the prediction ability of the model, the data set is split with a ratio of 1:9, that is, 10% of the samples are removed prior to any step of the statistical analysis, including the selection of the number of neighbors k and scaling. The best value of k is selected by a further 10-fold cross-validation on the remaining 90% of the samples, choosing the k with the best Q2y value. Permutation testing can be used to assess the classification/regression performance of the predictors.
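The double (nested) cross-validation described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the KODAMA implementation: the helpers knn_predict, make_folds and cv_accuracy are hypothetical names, the inner loop selects k by accuracy as a stand-in for the Q2y criterion, and only centering is applied.

```r
# Minimal base-R sketch of a double (nested) cross-validation for kNN.
# All helper names below are hypothetical and not part of KODAMA.

# Classify each row of `test` by majority vote among its k nearest
# training rows (Euclidean distance).
knn_predict <- function(train, labels, test, k) {
  train <- as.matrix(train)
  test <- as.matrix(test)
  pred <- character(nrow(test))
  for (i in seq_len(nrow(test))) {
    d <- sqrt(colSums((t(train) - test[i, ])^2))
    nearest <- order(d)[seq_len(k)]
    votes <- table(as.character(labels[nearest]))
    pred[i] <- names(votes)[which.max(votes)]
  }
  factor(pred, levels = levels(labels))
}

# Randomly assign n samples to nfolds folds of (almost) equal size.
make_folds <- function(n, nfolds = 10) {
  sample(rep(seq_len(nfolds), length.out = n))
}

# Inner cross-validation: accuracy of kNN for one value of k.
cv_accuracy <- function(X, y, k, nfolds = 10) {
  folds <- make_folds(nrow(X), nfolds)
  pred <- factor(rep(NA, nrow(X)), levels = levels(y))
  for (f in seq_len(nfolds)) {
    test <- folds == f
    pred[test] <- knn_predict(X[!test, , drop = FALSE], y[!test],
                              X[test, , drop = FALSE], k)
  }
  mean(pred == y)
}

set.seed(1)
X <- as.matrix(iris[, -5])
y <- iris[, 5]
kmax <- 5
outer_folds <- make_folds(nrow(X), 10)
ypred <- factor(rep(NA, nrow(X)), levels = levels(y))
best_k <- integer(10)

for (f in 1:10) {
  test <- outer_folds == f
  Xtr <- X[!test, , drop = FALSE]
  ytr <- y[!test]
  # Centering parameters are estimated on the training part only.
  mu <- colMeans(Xtr)
  Xtr <- sweep(Xtr, 2, mu)
  Xte <- sweep(X[test, , drop = FALSE], 2, mu)
  # The inner 10-fold CV on the 90% training part picks k.
  inner_acc <- sapply(seq_len(kmax), function(k) cv_accuracy(Xtr, ytr, k))
  best_k[f] <- which.max(inner_acc)
  # The held-out 10% is predicted only once, with the selected k.
  ypred[test] <- knn_predict(Xtr, ytr, Xte, best_k[f])
}

acc <- mean(ypred == y)
```

The point of the design is that the held-out 10% never influences the choice of k or the scaling parameters, so acc is not inflated by the tuning step.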

Usage

knn.double.cv(Xdata,
              Ydata,
              constrain=1:nrow(Xdata),
              compmax=min(5,c(ncol(Xdata),nrow(Xdata))),
              perm.test=FALSE,
              optim=TRUE,
              scaling = c("centering","autoscaling"),
              times=100,
              runn=10)

Arguments

Xdata

a matrix.

Ydata

the responses. If Ydata is a numeric vector, a regression analysis will be performed. If Ydata is a factor, a classification analysis will be performed.

constrain

a vector of nrow(Xdata) elements. Samples with the same constrain value are kept together during cross-validation: they are all assigned either to the training set or to the test set.

compmax

the number of neighbors k to be used for classification.

perm.test

logical. If TRUE, a permutation test is performed.

optim

logical. If TRUE, the number of neighbors k is optimized.

scaling

the scaling method to be used. Choices are "centering" or "autoscaling" (default: "centering"). A partial string sufficient to uniquely identify the choice is permitted.

times

number of cross-validations with permuted samples.

runn

number of cross-validation loops.
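Two of the arguments above, constrain and scaling, can be illustrated with a short base-R sketch. It assumes that folds are assigned per constrain group rather than per sample, and that the scaling choice is resolved with match.arg(); both are assumptions about the behaviour described here, and grouped_folds and scale_data are hypothetical helpers, not KODAMA functions.

```r
# Hypothetical helper: assign folds to groups, not to single samples,
# so that all samples sharing a constrain value land in the same fold.
grouped_folds <- function(constrain, nfolds = 10) {
  groups <- unique(constrain)
  group_fold <- sample(rep(seq_len(nfolds), length.out = length(groups)))
  group_fold[match(constrain, groups)]
}

set.seed(1)
# 20 subjects with 3 repeated samples each.
subject <- rep(1:20, each = 3)
folds <- grouped_folds(subject, nfolds = 10)
# Number of distinct folds in which each subject appears.
folds_per_subject <- tapply(folds, subject, function(f) length(unique(f)))

# Hypothetical helper: resolve the scaling choice, allowing partial strings.
scale_data <- function(X, scaling = c("centering", "autoscaling")) {
  scaling <- match.arg(scaling)
  # Centering subtracts column means; autoscaling also divides by column SDs.
  scale(X, center = TRUE, scale = (scaling == "autoscaling"))
}

X <- as.matrix(iris[, -5])
Xc <- scale_data(X)           # default choice: "centering"
Xa <- scale_data(X, "auto")   # partial string for "autoscaling"
```

With repeated measurements of the same subject, grouping the folds in this way prevents samples of one subject from appearing in both the training set and the test set.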

Value

A list with the following components:

Ypred

the vector containing the predicted values of the response variables obtained by cross-validation.

Yfit

the vector containing the fitted values of the response variables.

Q2Y

Q2y value.

R2Y

R2y value.

conf

The confusion matrix (only in classification mode).

acc

The cross-validated accuracy (only in classification mode).

txtQ2Y

a summary of the Q2y values.

txtR2Y

a summary of the R2y values.
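As a rough guide to reading these components, the sketch below computes the quantities in base R from made-up values. It uses the conventional definition 1 - RSS/TSS for both Q2y (on cross-validated predictions) and R2y (on fitted values), and expresses accuracy as a fraction; whether KODAMA computes them in exactly this way is an assumption, and explained is a hypothetical helper name.

```r
# Conventional definition: 1 - residual sum of squares / total sum of squares.
# With cross-validated predictions this gives Q2; with fitted values, R2.
explained <- function(y, yhat) {
  1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
}

y    <- c(1, 2, 3, 4, 5)
ycv  <- c(1.5, 2.5, 2.5, 3.5, 4.5)   # made-up cross-validated predictions
yfit <- c(1.1, 2.1, 2.9, 4.1, 4.9)   # made-up fitted values
Q2 <- explained(y, ycv)    # 0.875
R2 <- explained(y, yfit)   # 0.995

# Classification: confusion matrix and accuracy from predicted and true labels.
truth <- factor(c("a", "a", "b", "b", "b", "c"))
pred  <- factor(c("a", "b", "b", "b", "c", "c"), levels = levels(truth))
conf <- table(predicted = pred, observed = truth)
acc  <- sum(diag(conf)) / sum(conf)   # 4 of 6 samples are correct
```

Q2 is lower than R2 here because the cross-validated predictions are further from the observed values than the fitted ones, which is the usual pattern.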

Author(s)

Stefano Cacciatore

References

Cacciatore S, Luchinat C, Tenori L
Knowledge discovery by accuracy maximization.
Proc Natl Acad Sci U S A 2014;111(14):5117-22. doi: 10.1073/pnas.1220873111.

Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA
KODAMA: an updated R package for knowledge discovery and data mining.
Bioinformatics 2017;33(4):621-623. doi: 10.1093/bioinformatics/btw705.

Examples


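 library(KODAMA)

 # Classification of the iris species from the four measurements
 data(iris)
 data=iris[,-5]
 labels=iris[,5]
 pp=knn.double.cv(data,labels)
 print(pp$Q2Y)
 table(pp$Ypred,labels)
 

 # Classification of the donor in the MetRef data set
 data(MetRef)
 u=MetRef$data
 # Drop all-zero columns (a logical index is safe even when none are zero)
 u=u[,colSums(u)!=0]
 u=normalization(u)$newXtrain
 u=scaling(u)$newXtrain
 pp=knn.double.cv(u,as.factor(MetRef$donor))
 print(pp$Q2Y)
 table(pp$Ypred,MetRef$donor)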



KODAMA documentation built on Jan. 12, 2023, 5:08 p.m.