knnr: KNN Regression and Discrimination

View source: R/knnr.R


KNN Regression and Discrimination

Description

Functions knnr and knnda build KNN (possibly locally weighted) regression and discrimination models, respectively, for a univariate response y.

The functions use functions getknn and locw. See the code for details.

For each new observation to predict, the principle of KNN models (regression and DA) is to select the k nearest neighbors and to compute the prediction as the average of the response y (for regression) or the most frequent class in y (for discrimination) over this neighborhood. The KNN selection step is referred to as weighting "1" in locw. In standard KNN models, the statistical weight of each of the k neighbors is 1/k. In locally weighted KNN models, the statistical weights of the neighbors depend on the dissimilarities (calculated beforehand) between the observation to predict and its k neighbors. This step is referred to as weighting "2" in locw.
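
As a rough illustration of this principle, the following minimal base-R sketch computes the prediction for one new observation (it is not the code used by knnr and knnda; the name knn_predict_one, the vector d and the exponential weight function are only illustrative):

knn_predict_one <- function(yr, d, k, weighted = FALSE, classif = FALSE) {
  ## yr: reference responses; d: dissimilarities between the new
  ## observation and all reference observations
  nn <- order(d)[seq_len(k)]           # weighting "1": select the k neighbors
  if (weighted) {
    w <- exp(-d[nn] / median(d[nn]))   # weighting "2": illustrative decreasing
    w <- w / sum(w)                    # function of the dissimilarities
  } else {
    w <- rep(1 / k, k)                 # standard KNN: equal weights 1/k
  }
  if (classif) {
    ## discrimination: class with the largest summed weight
    ## (majority vote when the weights are equal)
    names(which.max(tapply(w, yr[nn], sum)))
  } else {
    sum(w * yr[nn])                    # regression: (weighted) mean of y
  }
}

In the package itself, the within-neighborhood weights are computed by wdist (see argument h below).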

In knnr and knnda, the dissimilarities can be calculated from the original (i.e. not compressed) data or from global PLS scores computed beforehand.
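
For example, with diss = "euclidean" and ncompdis = NULL (no compression), the dissimilarity matrix between Xu and Xr could be sketched in base R as follows (again an illustration, not the package code; diss_euclidean is an illustrative name):

diss_euclidean <- function(Xu, Xr) {
  ## Xu: m x p matrix of new observations; Xr: n x p reference matrix.
  ## Returns the m x n matrix of Euclidean distances between the rows
  ## of Xu and the rows of Xr.
  Xu <- as.matrix(Xu)
  Xr <- as.matrix(Xr)
  t(apply(Xu, 1, function(x) sqrt(colSums((t(Xr) - x)^2))))
}

When ncompdis is not NULL, the same computation is applied to the global PLS score matrices of Xr and Xu instead of the original matrices.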

Usage


knnr(
  Xr, Yr,
  Xu, Yu = NULL,
  ncompdis = NULL, diss = c("euclidean", "mahalanobis", "correlation"),
  h = Inf, k,
  stor = TRUE,
  print = TRUE,
  ...
  ) 

knnda(
  Xr, Yr,
  Xu, Yu = NULL,
  ncompdis = NULL, diss = c("euclidean", "mahalanobis", "correlation"),
  h = Inf, k,
  stor = TRUE,
  print = TRUE,
  ...
  )

Arguments

Xr

An n x p matrix or data frame of reference (= training) observations.

Yr

A vector of length n, or an n x 1 matrix, of reference (= training) responses (quantitative variable or class membership).

Xu

An m x p matrix or data frame of new (= test) observations to predict.

Yu

A vector of length m, or an m x 1 matrix, of the true responses (quantitative variable or class membership). Defaults to NULL.

diss

The type of dissimilarity used for defining the neighbors. Possible values are "euclidean" (default; Euclidean distance), "mahalanobis" (Mahalanobis distance), or "correlation". Correlation dissimilarities are calculated by sqrt(.5 * (1 - rho)); a base-R sketch of this formula is given after the argument list.

ncompdis

A vector (possibly of length 1) defining the number(s) of components of the preliminary global PLS calculated on (Xr, Yr) and Xu, used for computing the dissimilarities that define the neighbors. If NULL (default; no preliminary data compression), the dissimilarities are calculated from the original data Xr and Xu. Each component of ncompdis is considered successively in the calculations.

h

A vector (possibly of length 1) defining the scaling shape factor(s) of the weight function applied to the neighbors (weighting "2"). The lower h, the sharper the function. See wdist. Each component of h is considered successively in the calculations.

k

A vector (possibly of length 1) defining the number(s) of nearest neighbors to select in the reference data set for each observation to predict. Each component of k is considered successively in the calculations.

stor

Logical (default = TRUE). See locw.

print

Logical (default = TRUE). If TRUE, fitting information is printed.

...

Optional arguments to pass to function wdist.
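
The correlation dissimilarity mentioned for argument diss corresponds to the following base-R computation (a sketch of the sqrt(.5 * (1 - rho)) formula, not the package code; diss_cor is an illustrative name):

diss_cor <- function(x, Xr) {
  ## x: a new observation (vector of length p); Xr: n x p reference matrix.
  rho <- apply(as.matrix(Xr), 1, function(xr) cor(x, xr))
  sqrt(0.5 * (1 - rho))
}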

Value

A list of outputs (see examples), such as:

y

Responses for the test data.

fit

Predictions for the test data.

r

Residuals for the test data.

References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Examples


data(datcass)
data(datforages)

######################## knnr

Xr <- datcass$Xr
yr <- datcass$yr

Xu <- datcass$Xu
yu <- datcass$yu

Xr <- detrend(Xr)
Xu <- detrend(Xu)

headm(Xr)
headm(Xu)

## A KNN-WR model where:
## The dissimilarities between the observations are defined
## by the Mahalanobis distances calculated from a global PLS score space
## of ncompdis = 10 components.
## - Weighting "1" = knn selection of k = {5, 10, 15, 20} neighbors
## - Weighting "2" = within each neighborhood, weights are calculated by "wdist" 

ncompdis <- 10
h <- c(1, 2)
k <- seq(5, 20, by = 5)
fm <- knnr(
  Xr, yr,
  Xu, yu,
  ncompdis = ncompdis, diss = "mahalanobis",
  h = h, k = k,
  print = TRUE
  )
names(fm)
head(fm$y)
head(fm$fit)
head(fm$r)

z <- mse(fm, ~ ncompdis + h + k)
z
z[z$rmsep == min(z$rmsep), ]

group <- paste("ncompdis=", z$ncompdis, ", h=", z$h, sep = "")
plotxy(z[, c("k", "rmsep")], asp = 0, group = group, pch = 16)
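
## Illustrative follow-up: refit with a single parameter combination
## selected from z; the values below (ncompdis = 10, h = 2, k = 10)
## are hypothetical and should be replaced by the best values found above.
fm.best <- knnr(
  Xr, yr,
  Xu, yu,
  ncompdis = 10, diss = "mahalanobis",
  h = 2, k = 10,
  print = TRUE
  )
mse(fm.best, ~ ncompdis + h + k)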

## Same but where:
## The dissimilarities between the observations are defined
## by Euclidean distances calculated from the original (i.e. not compressed) X data

ncompdis <- NULL
h <- c(1, 2)
k <- seq(5, 20, by = 5)
fm <- knnr(
  Xr, yr,
  Xu, yu,
  ncompdis = ncompdis, diss = "euclidean",
  h = h, k = k,
  print = TRUE
  )

z <- mse(fm, ~ ncompdis + h + k)
z
z[z$rmsep == min(z$rmsep), ]

group <- paste("ncompdis=", z$ncompdis, ", h=", z$h, sep = "")
plotxy(z[, c("k", "rmsep")], asp = 0, group = group, pch = 16)

######################## knnda

Xr <- datforages$Xr
yr <- datforages$yr

Xu <- datforages$Xu
yu <- datforages$yu

Xr <- savgol(snv(Xr), n = 21, p = 2, m = 2)
Xu <- savgol(snv(Xu), n = 21, p = 2, m = 2)

headm(Xr)
headm(Xu)

table(yr)
table(yu)

## A knnDA model where:
## The dissimilarities between the observations are defined
## by the Mahalanobis distances calculated from a global PLS score space
## of ncompdis = 10 components.
## - Weighting "1" = knn selection of k = {5, 10, 15} neighbors
## - Weighting "2" = within each neighborhood, weights are calculated by "wdist" 

ncompdis <- 10
h <- c(1, 2)
k <- seq(5, 15, by = 5)
fm <- knnda(
  Xr, yr,
  Xu, yu,
  ncompdis = ncompdis, diss = "mahalanobis",
  h = h, k = k,
  print = TRUE
  )
names(fm)
headm(fm$y)
headm(fm$fit)
headm(fm$r)

z <- err(fm, ~ ncompdis + h + k)
z
z[z$errp == min(z$errp), ]

group <- paste("ncompdis=", z$ncompdis, ", h=", z$h, sep = "")
plotxy(z[, c("k", "errp")], asp = 0, group = group, pch = 16)

