knnr | R Documentation |
Functions knnr and knnda build KNN (optionally locally weighted) regression and discrimination models, respectively, for a univariate response y.
Both functions rely on the functions getknn and locw. See the code for details.
For each new observation to predict, the principle of KNN models (regression and discrimination) is to select the k nearest neighbors and to compute the prediction as the average of the response y (for regression) or as the most frequent class in y (for discrimination) over this neighborhood. The KNN selection step is referred to as weighting "1" in locw. In standard KNN regression models, the statistical weight of each of the k neighbors is 1/k. In locally weighted KNN regression models, the statistical weights of the neighbors depend on the dissimilarities (calculated beforehand) between the observation to predict and its k neighbors. This step is referred to as weighting "2" in locw.
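The two weighting steps can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper knn_sketch is hypothetical, the exponential weight function is an assumption chosen only for illustration (it is not claimed to match the weights computed in locw), and the toy data are invented.

```r
# Hypothetical helper (not part of the package): predicts one new observation.
knn_sketch <- function(Xr, yr, xnew, k, h = Inf) {
  # Euclidean distances between the new observation and each reference row
  d <- sqrt(rowSums(sweep(Xr, 2, xnew)^2))
  # Weighting "1": keep the k nearest neighbors
  nn <- order(d)[seq_len(k)]
  dk <- d[nn]
  # Weighting "2": assumed exponential decay of the weights with distance;
  # with h = Inf every neighbor gets the same weight 1/k
  s <- mean(dk)
  w <- if (is.finite(h) && s > 0) exp(-dk / (h * s)) else rep(1, k)
  w <- w / sum(w)
  if (is.numeric(yr)) {
    sum(w * yr[nn])                      # regression: weighted average
  } else {
    votes <- tapply(w, as.character(yr[nn]), sum)
    names(votes)[which.max(votes)]       # discrimination: weighted vote
  }
}

# Invented toy data: five reference observations, two variables
Xr <- matrix(c(0, 0,  1, 0,  0, 1,  5, 5,  6, 5), ncol = 2, byrow = TRUE)
yr <- c(1, 2, 3, 10, 12)
cl <- c("a", "a", "b", "b", "b")
xnew <- c(0.1, 0.1)

pred_unweighted <- knn_sketch(Xr, yr, xnew, k = 3)         # plain average
pred_weighted   <- knn_sketch(Xr, yr, xnew, k = 3, h = 1)  # closest neighbor counts more
pred_class      <- knn_sketch(Xr, cl, xnew, k = 3)         # most frequent class
```

In this sketch, a smaller h concentrates the weight on the closest neighbors, which pulls the weighted prediction toward the response of the nearest reference observation.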
In knnr and knnda, the dissimilarities can be calculated either from the original (i.e. not compressed) data or from previously computed global PLS scores.
knnr(
Xr, Yr,
Xu, Yu = NULL,
ncompdis = NULL, diss = c("euclidean", "mahalanobis", "correlation"),
h = Inf, k,
stor = TRUE,
print = TRUE,
...
)
knnda(
Xr, Yr,
Xu, Yu = NULL,
ncompdis = NULL, diss = c("euclidean", "mahalanobis", "correlation"),
h = Inf, k,
stor = TRUE,
print = TRUE,
...
)
Xr |
A |
Yr |
A vector of length |
Xu |
A |
Yu |
A vector of length |
diss |
The type of dissimilarity used for defining the neighbors. Possible values are "euclidean" (default; Euclidean distance), "mahalanobis" (Mahalanobis distance), or "correlation". Correlation dissimilarities are calculated by sqrt(.5 * (1 - rho)). |
ncompdis |
A vector (possibly of length 1) defining the number(s) of components of the preliminary global PLS calculated on |
h |
A vector (possibly of length 1) defining the scaling shape factor(s) of the function of the weights applied to the neighbors in the weighted PLSR. Lower is |
k |
A vector (possibly of length 1) defining the number(s) of nearest neighbors to select in the reference data set for each observation to predict. Each component of |
stor |
Logical (default to |
print |
Logical (default = |
... |
Optional arguments to pass to function |
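As an illustration of the correlation dissimilarity sqrt(.5 * (1 - rho)) mentioned for diss, the following base R sketch evaluates it on invented vectors. The helper corr_diss is hypothetical and is not a function of the package; it only shows that the dissimilarity ranges from 0 (rho = 1) to 1 (rho = -1).

```r
# Hypothetical helper: correlation dissimilarity between two vectors.
# max(0, .) guards against tiny negative values caused by rounding.
corr_diss <- function(a, b) sqrt(max(0, 0.5 * (1 - cor(a, b))))

x1 <- c(1, 2, 3, 4, 5)
x2 <- c(2, 4, 6, 8, 10)   # perfectly correlated with x1 (rho = 1)
x3 <- c(5, 4, 3, 2, 1)    # perfectly anti-correlated with x1 (rho = -1)

d12 <- corr_diss(x1, x2)  # close to 0: identical shape
d13 <- corr_diss(x1, x3)  # close to 1: opposite shape
```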
A list of outputs (see examples), such as:
y |
Responses for the test data. |
fit |
Predictions for the test data. |
r |
Residuals for the test data. |
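The examples summarize these outputs with mse (column rmsep) and err (column errp). The base R sketch below shows, on invented vectors, the kind of summary that such columns are assumed to represent. The residual sign convention (y - fit) and the exact definitions used by the package are assumptions here, not taken from the package code.

```r
# Regression: invented test responses and predictions
y   <- c(10.0, 12.0, 9.0, 11.0)
fit <- c(10.5, 11.0, 9.0, 12.0)
r <- y - fit                      # residuals (sign convention assumed)
rmsep <- sqrt(mean(r^2))          # root mean squared error of prediction

# Discrimination: invented test classes and predicted classes
yc   <- c("a", "b", "a", "b")
fitc <- c("a", "b", "b", "b")
errp <- mean(yc != fitc)          # proportion of misclassified observations
```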
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
data(datcass)
data(datforages)
######################## knnr
Xr <- datcass$Xr
yr <- datcass$yr
Xu <- datcass$Xu
yu <- datcass$yu
Xr <- detrend(Xr)
Xu <- detrend(Xu)
headm(Xr)
headm(Xu)
## A KNN-WR model where:
## The dissimilarities between the observations are defined
## by the Mahalanobis distances calculated from a global PLS score space
## of ncompdis = 10 components.
## - Weighting "1" = knn selection of k = {5, 10, 15, 20} neighbors
## - Weighting "2" = within each neighborhood, weights are calculated by "wdist"
ncompdis <- 10
h <- c(1, 2)
k <- seq(5, 20, by = 5)
fm <- knnr(
Xr, yr,
Xu, yu,
ncompdis = ncompdis, diss = "mahalanobis",
h = h, k = k,
print = TRUE
)
names(fm)
head(fm$y)
head(fm$fit)
head(fm$r)
z <- mse(fm, ~ ncompdis + h + k)
z
z[z$rmsep == min(z$rmsep), ]
group <- paste("ncompdis=", z$ncompdis, ", h=", z$h, sep = "")
plotxy(z[, c("k", "rmsep")], asp = 0, group = group, pch = 16)
## Same, but where:
## The dissimilarities between the observations are defined
## by Euclidean distances calculated from the original (i.e. not compressed) X data
ncompdis <- NULL
h <- c(1, 2)
k <- seq(5, 20, by = 5)
fm <- knnr(
Xr, yr,
Xu, yu,
ncompdis = ncompdis, diss = "euclidean",
h = h, k = k,
print = TRUE
)
z <- mse(fm, ~ ncompdis + h + k)
z
z[z$rmsep == min(z$rmsep), ]
group <- paste("ncompdis=", z$ncompdis, ", h=", z$h, sep = "")
plotxy(z[, c("k", "rmsep")], asp = 0, group = group, pch = 16)
######################## knnda
Xr <- datforages$Xr
yr <- datforages$yr
Xu <- datforages$Xu
yu <- datforages$yu
Xr <- savgol(snv(Xr), n = 21, p = 2, m = 2)
Xu <- savgol(snv(Xu), n = 21, p = 2, m = 2)
headm(Xr)
headm(Xu)
table(yr)
table(yu)
## A knnDA model where:
## The dissimilarities between the observations are defined
## by the Mahalanobis distances calculated from a global PLS score space
## of ncompdis = 10 components.
## - Weighting "1" = knn selection of k = {5, 10, 15} neighbors
## - Weighting "2" = within each neighborhood, weights are calculated by "wdist"
ncompdis <- 10
h <- c(1, 2)
k <- seq(5, 15, by = 5)
fm <- knnda(
Xr, yr,
Xu, yu,
ncompdis = ncompdis, diss = "mahalanobis",
h = h, k = k,
print = TRUE
)
names(fm)
headm(fm$y)
headm(fm$fit)
headm(fm$r)
z <- err(fm, ~ ncompdis + h + k)
z
z[z$errp == min(z$errp), ]
group <- paste("ncompdis=", z$ncompdis, ", h=", z$h, sep = "")
plotxy(z[, c("k", "errp")], asp = 0, group = group, pch = 16)