# knnr: KNN Regression and Discrimination

In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

 knnr R Documentation

## KNN Regression and Discrimination

### Description

Functions `knnr` and `knnda` build KNN (optionally locally weighted) regression and discrimination models, respectively, for a univariate response `y`.

Both functions rely on the functions `getknn` and `locw`. See the code for details.

For each new observation to predict, KNN regression (R) and discrimination (DA) models select its `k` nearest neighbors and compute the prediction over this neighborhood: the average of the response `y` for regression, or the most frequent class in `y` for discrimination. The KNN selection step is referred to as `weighting "1"` in `locw`. In standard KNN regression models, each of the `k` neighbors has the statistical weight `1/k`. In locally weighted KNN regression models, the statistical weights of the neighbors depend on the dissimilarities (calculated beforehand) between the observation to predict and its `k` neighbors. This step is referred to as `weighting "2"` in `locw`.
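The two weighting steps can be illustrated with a minimal base-R sketch for a single observation. It does not call `getknn`, `locw` or `wdist`: the simulated data, the variable names, and in particular the exponential weight function are assumptions made for illustration, not the formulas used by the package.

```r
## Minimal base-R sketch of KNN prediction for one new observation.
## Data, names and the weight function are hypothetical (not rnirs code).

set.seed(1)
Xr <- matrix(rnorm(50 * 4), nrow = 50)     # reference observations
yr <- Xr[, 1] + rnorm(50, sd = 0.1)        # quantitative response
cl <- ifelse(Xr[, 1] > 0, "a", "b")        # class membership
xu <- rnorm(4)                             # one observation to predict
k <- 5

## Euclidean dissimilarities between xu and each reference observation
d <- sqrt(rowSums(sweep(Xr, 2, xu)^2))

## Weighting "1": select the k nearest neighbors
nn <- order(d)[seq_len(k)]

## Standard KNN regression: each neighbor has statistical weight 1/k
w_unif <- rep(1 / k, k)
pred_unif <- sum(w_unif * yr[nn])          # same as mean(yr[nn])

## Weighting "2" (ASSUMED form): weights decrease with the dissimilarity,
## and a lower h makes the decrease sharper. This is not wdist's formula.
h <- 2
w <- exp(-d[nn] / (h * mean(d[nn])))
w <- w / sum(w)                            # weights sum to 1
pred_w <- sum(w * yr[nn])                  # locally weighted average

## Discrimination: most frequent class (uniform weights), or the class
## with the largest total weight (locally weighted)
pred_cl <- names(which.max(table(cl[nn])))
pred_cl_w <- names(which.max(tapply(w, cl[nn], sum)))
```

With uniform weights the regression prediction reduces to the plain neighborhood mean. With the weighted version, closer neighbors contribute more to the prediction.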

In `knnr` and `knnda`, the dissimilarities can be calculated either from the original (i.e. not compressed) data or from previously computed global PLS scores.
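As a rough illustration of computing dissimilarities in a compressed score space, the sketch below projects reference and new observations onto a few components and then computes Mahalanobis distances between their scores. PCA (`prcomp`) is used here as a stand-in for the global PLS, which is a deliberate simplification. The simulated data and the names `Tr`, `Tu` and `D` are assumptions, and the package may estimate the covariance differently.

```r
## Sketch: Mahalanobis dissimilarities in a score space.
## PCA is a stand-in for the global PLS used by the package (assumption).

set.seed(1)
Xr <- matrix(rnorm(40 * 20), nrow = 40)    # reference observations
Xu <- matrix(rnorm(3 * 20), nrow = 3)      # new observations
ncomp <- 5                                 # number of components kept

pca <- prcomp(Xr, center = TRUE, scale. = FALSE)
Tr <- pca$x[, seq_len(ncomp)]                        # reference scores
Tu <- predict(pca, newdata = Xu)[, seq_len(ncomp)]   # projected new scores

## Covariance of the reference scores (assumed choice)
S <- cov(Tr)

## stats::mahalanobis() returns squared distances, hence the sqrt().
## D[i, j] = distance between new observation i and reference observation j.
D <- t(apply(Tu, 1, function(tu) sqrt(mahalanobis(Tr, center = tu, cov = S))))
dim(D)
```

Working on a small number of scores keeps the covariance matrix invertible, which is generally not the case when the number of original variables exceeds the number of reference observations.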

### Usage

``````
knnr(
Xr, Yr,
Xu, Yu = NULL,
ncompdis = NULL, diss = c("euclidean", "mahalanobis", "correlation"),
h = Inf, k,
stor = TRUE,
print = TRUE,
...
)

knnda(
Xr, Yr,
Xu, Yu = NULL,
ncompdis = NULL, diss = c("euclidean", "mahalanobis", "correlation"),
h = Inf, k,
stor = TRUE,
print = TRUE,
...
)

``````

### Arguments

| Argument | Description |
|---|---|
| `Xr` | An `n x p` matrix or data frame of reference (= training) observations. |
| `Yr` | A vector of length `n`, or an `n x 1` matrix, of reference (= training) responses (quantitative variable or class membership). |
| `Xu` | An `m x p` matrix or data frame of new (= test) observations to predict. |
| `Yu` | A vector of length `m`, or an `m x 1` matrix, of the true responses (quantitative variable or class membership). Defaults to `NULL`. |
| `diss` | The type of dissimilarity used for defining the neighbors. Possible values are "euclidean" (default; Euclidean distance), "mahalanobis" (Mahalanobis distance), or "correlation". Correlation dissimilarities are calculated by `sqrt(.5 * (1 - rho))`. |
| `ncompdis` | A vector (possibly of length 1) defining the number(s) of components of the preliminary global PLS calculated on `(Xr, Yr)` and `Xu` for calculating the dissimilarities used for defining the neighbors. If `NULL` (default; no preliminary data compression), the dissimilarities are calculated from the original data `Xr` and `Xu`. Each component of `ncompdis` is considered successively in the calculations. |
| `h` | A vector (possibly of length 1) defining the scaling shape factor(s) of the weight function applied to the neighbors in the locally weighted KNN model. The lower `h`, the sharper the function. See `wdist`. Each component of `h` is considered successively in the calculations. |
| `k` | A vector (possibly of length 1) defining the number(s) of nearest neighbors to select in the reference data set for each observation to predict. Each component of `k` is considered successively in the calculations. |
| `stor` | Logical (defaults to `TRUE`). See `locw`. |
| `print` | Logical (defaults to `TRUE`). If `TRUE`, fitting information is printed. |
| `...` | Optional arguments to pass to function `wdist`. |
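The three dissimilarity types named for `diss` can be sketched in base R as follows. This is an illustration on simulated data, not the package's internal code. The variable names are hypothetical, and estimating the Mahalanobis covariance from `Xr` is an assumption.

```r
## Sketch of the three dissimilarities between one new observation xu
## and each row of Xr (simulated data, hypothetical names).

set.seed(1)
Xr <- matrix(rnorm(30 * 5), nrow = 30)     # reference observations
xu <- rnorm(5)                             # one observation to predict

## Euclidean distance
d_euc <- sqrt(rowSums(sweep(Xr, 2, xu)^2))

## Mahalanobis distance; stats::mahalanobis() returns the squared distance.
## The covariance is estimated from Xr here (assumption).
S <- cov(Xr)
d_mah <- sqrt(mahalanobis(Xr, center = xu, cov = S))

## Correlation dissimilarity: sqrt(.5 * (1 - rho)), which lies in [0, 1]
rho <- apply(Xr, 1, function(x) cor(x, xu))
d_cor <- sqrt(0.5 * (1 - rho))

## The k nearest neighbors under each dissimilarity
k <- 5
nn_euc <- order(d_euc)[seq_len(k)]
nn_mah <- order(d_mah)[seq_len(k)]
nn_cor <- order(d_cor)[seq_len(k)]
```

The three choices generally select different neighborhoods for the same observation, which is why `diss` is worth comparing on validation data.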

### Value

A list of outputs (see examples), such as:

| Output | Description |
|---|---|
| `y` | Responses for the test data. |
| `fit` | Predictions for the test data. |
| `r` | Residuals for the test data. |
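The relation between these three outputs, and the error summaries typically derived from them, can be sketched on plain vectors. The values below are made up, the sign convention `r = y - fit` is an assumption, and the actual objects returned by `knnr` and `knnda` may be structured differently (see `names(fm)` in the examples).

```r
## Sketch on hypothetical vectors (not actual knnr/knnda output).

## Regression: residuals and root mean squared error of prediction
y <- c(10.2, 11.5, 9.8, 12.1)              # true responses
fit <- c(10.0, 11.9, 9.5, 12.4)            # predictions
r <- y - fit                               # residuals (sign convention assumed)
rmsep <- sqrt(mean(r^2))

## Discrimination: proportion of misclassified test observations
y_cl <- c("a", "b", "a", "b")              # true classes
fit_cl <- c("a", "b", "b", "b")            # predicted classes
err <- mean(y_cl != fit_cl)
```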

### References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

### Examples

``````
data(datcass)
data(datforages)

######################## knnr

Xr <- datcass$Xr
yr <- datcass$yr

Xu <- datcass$Xu
yu <- datcass$yu

Xr <- detrend(Xr)
Xu <- detrend(Xu)

## A KNN-WR model where:
## The dissimilarities between the observations are defined
## by the Mahalanobis distances calculated from a global PLS score space
## of ncompdis = 10 components.
## - Weighting "1" = knn selection of k = {5, 10, 15, 20} neighbors
## - Weighting "2" = within each neighborhood, weights are calculated by "wdist"

ncompdis <- 10
h <- c(1, 2)
k <- seq(5, 20, by = 5)
fm <- knnr(
Xr, yr,
Xu, yu,
ncompdis = ncompdis, diss = "mahalanobis",
h = h, k = k,
print = TRUE
)
names(fm)

z <- mse(fm, ~ ncompdis + h + k)
z
z[z$rmsep == min(z$rmsep), ]

group <- paste("ncompdis=", z$ncompdis, ", h=", z$h, sep = "")
plotxy(z[, c("k", "rmsep")], asp = 0, group = group, pch = 16)

## Same model, but where:
## The dissimilarities between the observations are defined
## by Euclidean distances calculated from the original (i.e. not compressed) X data

ncompdis <- NULL
h <- c(1, 2)
k <- seq(5, 20, by = 5)
fm <- knnr(
Xr, yr,
Xu, yu,
ncompdis = ncompdis, diss = "euclidean",
h = h, k = k,
print = TRUE
)

z <- mse(fm, ~ ncompdis + h + k)
z
z[z$rmsep == min(z$rmsep), ]

group <- paste("ncompdis=", z$ncompdis, ", h=", z$h, sep = "")
plotxy(z[, c("k", "rmsep")], asp = 0, group = group, pch = 16)

######################## knnda

Xr <- datforages$Xr
yr <- datforages$yr

Xu <- datforages$Xu
yu <- datforages$yu

Xr <- savgol(snv(Xr), n = 21, p = 2, m = 2)
Xu <- savgol(snv(Xu), n = 21, p = 2, m = 2)

table(yr)
table(yu)

## A knnDA model where:
## The dissimilarities between the observations are defined
## by the Mahalanobis distances calculated from a global PLS score space
## of ncompdis = 10 components.
## - Weighting "1" = knn selection of k = {5, 10, 15} neighbors
## - Weighting "2" = within each neighborhood, weights are calculated by "wdist"

ncompdis <- 10
h <- c(1, 2)
k <- seq(5, 15, by = 5)
fm <- knnda(
Xr, yr,
Xu, yu,
ncompdis = ncompdis, diss = "mahalanobis",
h = h, k = k,
print = TRUE
)
names(fm)