# comp.knn: The k-NN algorithm for compositional data (in Compositional: Compositional Data Analysis)

## Description

The k-NN algorithm for compositional data with and without using the power transformation.

## Usage

```r
comp.knn(xnew, x, ina, a = 1, k = 5, type = "S", apostasi = "ESOV", mesos = TRUE)
alfa.knn(xnew, x, ina, a = 1, k = 5, type = "S", mesos = TRUE, apostasi = "euclidean")
```

## Arguments

- `xnew`: A matrix with the new compositional data whose group is to be predicted. Zeros are allowed, but in that case you must be careful to choose a strictly positive value of α, or to avoid setting apostasi = "Ait".
- `x`: A matrix with the available compositional data. Zeros are allowed, but in that case you must be careful to choose a strictly positive value of α, or to avoid setting apostasi = "Ait".
- `ina`: A group indicator variable for the available data.
- `a`: The value of α. As zero values in the compositional data are allowed, you must be careful to choose a strictly positive value of α. You may also set a = NULL, in which case xnew and x are assumed to be already α-transformed data.
- `k`: The number of nearest neighbours to consider.
- `type`: Either "S" for the standard k-NN or "NS" for the non-standard one (see Details).
- `apostasi`: The type of distance to use. For comp.knn this can be one of "ESOV", "taxicab", "Ait", "Hellinger", "angular" or "CS"; see the references for their definitions. For alfa.knn this can be either "euclidean" or "manhattan".
- `mesos`: Used only in the non-standard algorithm. If TRUE, the arithmetic mean of the distances is calculated; otherwise the harmonic mean is used (see Details).

## Details

The k-NN algorithm is applied to compositional data, with several distance metrics and options to choose from.

The standard algorithm finds the k nearest observations to a new observation and allocates it to the class that appears most often among those neighbours.

The non-standard algorithm is slower but perhaps more accurate. For every group it finds the k nearest neighbours of the new observation within that group and computes the arithmetic or the harmonic mean of their distances. The new point is allocated to the class with the smallest mean distance.
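The two allocation rules described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `knn_sketch` is a hypothetical helper, it uses plain Euclidean distance on the raw compositions (none of the `apostasi` options or the α-transformation), and ties in the majority vote go to the first factor level.

```r
# Hypothetical helper for illustration; NOT part of the Compositional package.
# Assumes plain Euclidean distance between rows of xnew and rows of x.
knn_sketch <- function(xnew, x, ina, k = 5, type = "S", mesos = TRUE) {
  ina <- as.factor(ina)
  groups <- levels(ina)
  xnew <- matrix(xnew, ncol = ncol(x))  # also accepts a single observation
  pred <- character(nrow(xnew))
  for (i in seq_len(nrow(xnew))) {
    # Distance from the i-th new point to every available observation
    d <- sqrt(colSums((t(x) - xnew[i, ])^2))
    if (type == "S") {
      # Standard rule: majority vote among the k nearest neighbours overall
      nearest <- order(d)[seq_len(k)]
      votes <- table(ina[nearest])
      pred[i] <- names(votes)[which.max(votes)]
    } else {
      # Non-standard rule: within each group, average the k smallest distances
      score <- sapply(groups, function(g) {
        dg <- sort(d[ina == g])
        dg <- dg[seq_len(min(k, length(dg)))]
        if (mesos) mean(dg) else 1 / mean(1 / dg)  # arithmetic or harmonic mean
      })
      # Allocate to the group with the smallest mean distance
      pred[i] <- groups[which.min(score)]
    }
  }
  factor(pred, levels = groups)
}

# Usage on closed iris measurements
x <- as.matrix(iris[, 1:4])
x <- x / rowSums(x)
ina <- iris[, 5]
pred_s  <- knn_sketch(x, x, ina, k = 5, type = "S")
pred_ns <- knn_sketch(x, x, ina, k = 5, type = "NS", mesos = TRUE)
table(ina, pred_s)
table(ina, pred_ns)
```

The non-standard rule sorts distances once per group for every new point, which is consistent with the remark that it is the slower of the two.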

## Value

A vector with the estimated groups.

## Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris and Giorgos Athineou

## References

Tsagris, Michail (2014). The k-NN algorithm for compositional data: a revised approach with and without zero values present. Journal of Data Science, 12(3): 519-534.

Friedman Jerome, Trevor Hastie and Robert Tibshirani (2009). The Elements of Statistical Learning, 2nd edition. Springer, Berlin.

Tsagris Michail, Simon Preston and Andrew T.A. Wood (2016). Improved classification for compositional data using the α-transformation. Journal of Classification, 33(2): 243-261.

Connie Stewart (2017). An approach to measure distance between compositional diet estimates containing essential zeros. Journal of Applied Statistics, 44(7): 1137-1152.

## See Also

`compknn.tune`, `rda`, `alfa`
## Examples

```r
library(Compositional)

# Close the iris measurements so that each row sums to 1
x <- as.matrix( iris[, 1:4] )
x <- x / rowSums(x)
ina <- iris[, 5]

mod <- comp.knn(x, x, ina, a = 1, k = 5)
table(ina, mod)
mod2 <- alfa.knn(x, x, ina, a = 1, k = 5)
table(ina, mod2)
```