bigknn.cv: Cross-validation for the k-NN algorithm for really large scale...
In Rfast2: A Collection of Efficient and Extremely Fast R Functions II

Cross-validation for the k-NN algorithm for really large scale data

Description

Cross-validation for the k-NN algorithm for really large scale data.

Usage

bigknn.cv(y, x, k = 5:10, type = "C", folds = NULL, nfolds = 10,
stratified = TRUE, seed = FALSE, pred.ret = FALSE)

Arguments

`y`	The response variable: a vector that is either continuous or categorical (a factor is acceptable).
`x`	A matrix with the available data, the predictor variables.
`k`	A vector with the candidate numbers of nearest neighbours to be considered.
`type`	Set this to "R" (regression) if the response y is numerical, or to "C" (classification) if y is categorical.
`folds`	A list with the indices of the observations in each fold.
`nfolds`	The number of folds to be used. This is used only if folds is NULL.
`stratified`	Should the folds be selected using stratified random sampling? This preserves the proportion of observations from each group in every fold. Use it only for classification. For regression (type = "R"), set it to FALSE: the default is TRUE, and leaving it on may cause errors or return wrong results.
`seed`	If TRUE, the same folds are created every time.
`pred.ret`	Set this to TRUE to have the predicted values returned.
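The effect of the stratified argument can be pictured with the base R sketch below. It is an illustrative assumption, not the fold-generation code used by Rfast2: the function name make_folds_sketch is hypothetical, and its seed argument takes a number rather than TRUE/FALSE. Within each group the observations are shuffled and dealt round-robin into the folds, so every fold keeps roughly the group proportions of y. The result is a list of index vectors, matching the description of the folds argument.

```r
# Hypothetical sketch (not Rfast2 internals): build a list of fold indices,
# optionally stratified by the groups of y.
make_folds_sketch <- function(y, nfolds = 10, stratified = TRUE, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # same seed -> same folds
  n <- length(y)
  fold_id <- integer(n)
  if (stratified) {
    yc <- as.character(y)
    for (g in unique(yc)) {
      idx <- which(yc == g)
      idx <- idx[sample.int(length(idx))]  # shuffle within the group
      # deal the shuffled members of this group round-robin into the folds
      fold_id[idx] <- rep_len(seq_len(nfolds), length(idx))
    }
  } else {
    # plain random assignment that ignores y (the regression case)
    fold_id[sample.int(n)] <- rep_len(seq_len(nfolds), n)
  }
  split(seq_len(n), fold_id)  # one vector of row indices per fold
}

folds <- make_folds_sketch(iris[, 5], nfolds = 5, seed = 1)
sapply(folds, function(i) table(iris[i, 5]))  # 10 of each species per fold
```

When a group size is not a multiple of nfolds, that group contributes at most one extra observation to some folds, so the proportions are preserved only approximately.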

Details

The concept behind k-NN is simple. Suppose we have a matrix with predictor variables and a vector with the response variable (numerical or categorical). When a new vector with observations (predictor variables) is available, its corresponding response value, numerical or categorical, is to be predicted. Instead of using a model, parametric or not, one can use this ad hoc algorithm.

The distances between the new predictor values and those of every existing observation are calculated, and the k observations with the smallest distances (the k nearest neighbours) are retained. In the case of regression, the prediction is the average, median, or harmonic mean of the response values of these k neighbours. In the case of classification, i.e. a categorical response, a voting rule is applied: the new observation is allocated to the most frequent group (response value) among its k nearest neighbours.
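The prediction rule just described can be sketched in base R for a single value of k. This is an illustration under stated assumptions, not Rfast2's implementation: knn_predict_sketch is a hypothetical name, distances are Euclidean, regression uses the average (the median or harmonic mean would replace mean()), and ties in the vote go to the first group in level order.

```r
# Hypothetical sketch (not Rfast2 code): k-NN prediction for the rows of xnew.
knn_predict_sketch <- function(xtrain, y, xnew, k, type = "C") {
  apply(xnew, 1, function(z) {
    d <- sqrt(colSums((t(xtrain) - z)^2))  # Euclidean distance to each training row
    nn <- order(d)[seq_len(k)]             # indices of the k nearest neighbours
    if (type == "R") {
      mean(y[nn])                          # regression: average response
    } else {
      votes <- table(y[nn])                # classification: count each group
      names(votes)[which.max(votes)]       # most frequent group wins
    }
  })
}

x <- as.matrix(iris[, 1:4])
train <- 6:150
knn_predict_sketch(x[train, ], iris[train, 5], x[1:5, , drop = FALSE], k = 5)
```

Computing all n distances for every new observation is the cost that dominates k-NN on large data, which is why a dedicated large-scale routine is worthwhile.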

This function performs the cross-validation procedure to select the optimal k, the optimal number of nearest neighbours. Optimality is judged by an accuracy metric: for classification it is the percentage of correct classification, and for regression it is the mean squared error.
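The cross-validation loop can be sketched in base R for the classification case. The function knn_cv_sketch is hypothetical and is not how bigknn.cv is implemented: it assigns folds by simple (non-stratified) random sampling, uses Euclidean distance with a majority vote, and returns the percentage of correct classification pooled over all out-of-fold predictions.

```r
# Hypothetical sketch (not Rfast2 code): K-fold cross-validation of k-NN
# classification, returning the percentage of correct classification per k.
knn_cv_sketch <- function(y, x, k = 5:10, nfolds = 10, seed = 1) {
  set.seed(seed)
  y <- as.factor(y)
  n <- nrow(x)
  fold_id <- sample(rep_len(seq_len(nfolds), n))  # random, non-stratified folds
  correct <- numeric(length(k))
  for (f in seq_len(nfolds)) {
    test <- which(fold_id == f)
    train <- which(fold_id != f)
    xte <- x[test, , drop = FALSE]
    xtr <- x[train, , drop = FALSE]
    # squared Euclidean distances (same ordering as Euclidean):
    # rows are test points, columns are training points
    d2 <- outer(rowSums(xte^2), rowSums(xtr^2), "+") - 2 * xte %*% t(xtr)
    for (i in seq_along(test)) {
      nn <- train[order(d2[i, ])]  # training rows sorted by distance, sorted once
      for (j in seq_along(k)) {
        votes <- table(y[nn[seq_len(k[j])]])
        pred <- names(votes)[which.max(votes)]  # majority vote among k[j] neighbours
        correct[j] <- correct[j] + (pred == as.character(y[test[i]]))
      }
    }
  }
  setNames(100 * correct / n, paste0("k=", k))
}

knn_cv_sketch(iris[, 5], as.matrix(iris[, 1:4]), k = c(3, 4), nfolds = 5)
```

Sorting the distances once per test observation lets every candidate k be scored from the same ordering, so evaluating a vector of k values costs little more than evaluating one.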

This function allows for the Euclidean distance only.
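For reference, the distance and the two accuracy criteria can be written out as below, where x_i is the predictor vector of observation i, p the number of predictors, N_k(z) the set of the k nearest neighbours of a new vector z, and \hat{y}_i^{(k)} the out-of-fold prediction for observation i using k neighbours. This assumes the criteria are pooled over all n out-of-fold predictions; averaging per-fold values gives the same result when the folds have equal size. The regression prediction is shown for the average only.

```latex
\begin{aligned}
d(\mathbf{z}, \mathbf{x}_i) &= \sqrt{\textstyle\sum_{j=1}^{p} (z_j - x_{ij})^2} \\
\hat{y}(\mathbf{z}) &= \frac{1}{k} \sum_{i \in N_k(\mathbf{z})} y_i \\
\mathrm{crit}_{C}(k) &= \frac{100}{n} \sum_{i=1}^{n} \mathbf{1}\{\hat{y}_i^{(k)} = y_i\} \\
\mathrm{crit}_{R}(k) &= \frac{1}{n} \sum_{i=1}^{n} \big(y_i - \hat{y}_i^{(k)}\big)^2
\end{aligned}
```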

Value

A list including:

preds

If pred.ret is TRUE, the predicted values for each fold are returned as elements of a list.

crit

A vector with one element per value of k, containing the accuracy metric for that k. For classification it is the percentage of correct classification; for regression it is the mean squared prediction error. To compute other accuracy metrics, we suggest running the function with pred.ret = TRUE and then writing a simple function that computes them from the returned predictions. See colmses.
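As an example of such a function, the base R sketch below computes several error metrics column by column, in the spirit of colmses. It assumes the out-of-fold predictions have already been gathered into a matrix with one row per observation and one column per value of k; the layout of the preds list returned by bigknn.cv may differ, so some reshaping could be needed first. The name pred_metrics_sketch is hypothetical.

```r
# Hypothetical helper: error metrics for each column of a prediction matrix.
# y    : observed responses (length n)
# pred : n x K matrix of predictions, one column per candidate k
pred_metrics_sketch <- function(y, pred) {
  pred <- as.matrix(pred)
  stopifnot(nrow(pred) == length(y))
  err <- pred - y  # y is recycled down each column
  rbind(mse  = colMeans(err^2),       # mean squared error (the regression crit)
        rmse = sqrt(colMeans(err^2)), # root mean squared error
        mae  = colMeans(abs(err)))    # mean absolute error
}

y <- c(1, 2, 3, 4)
pred <- cbind(k3 = c(1, 2, 3, 5), k4 = c(2, 2, 2, 4))
pred_metrics_sketch(y, pred)
```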

Author(s)

Michail Tsagris.

R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.

References

Friedman J., Hastie T. and Tibshirani R. (2017). The Elements of Statistical Learning. New York: Springer.

Cover T.M. and Hart P.E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1): 21-27.

Examples

library(Rfast2)
x <- as.matrix(iris[, 1:4])
mod <- bigknn.cv(y = iris[, 5], x = x, k = c(3, 4), nfolds = 5)
mod$crit  # percentage of correct classification for k = 3 and k = 4

Rfast2 documentation built on March 7, 2026, 9:06 a.m.
