DNN | R Documentation |
DNN
uses pre-cooked distance matrix to replace missing values in class labels.
DNN(dst, cl, k, d, details=FALSE, self=FALSE) Dnn(trn, tst, classes, FUN=function(.x) dist(.x), ...)
dst |
Distance matrix (object of class 'dist'). |
cl |
Factor of class labels, should contain NAs to designate testing sub-group. |
k |
How many neighbors to select, odd numbers preferable. If specified, do not use "d". |
d |
Distance to consider for neighborhood, in fractions of maximal distance. If specified, do not use "k". |
details |
If TRUE, function will return voting matrix. Default is FALSE. |
self |
Allow self-training? Default is FALSE. |
trn |
Data to train from, classes variable out. |
tst |
Data with unknown classes. |
classes |
Classes variable for training data. |
FUN |
Function to calculate distances, by default, just dist() (i.e., Euclidean distances). |
... |
Additional arguments from Dnn() to DNN(), note that either 'k' or 'd' must be specified. |
If classic kNN is a lazy classifier, DNN is super-lazy because it does not even calculate the distance matrix itself. Instead, you supply it with distance matrix (object of class 'dist') pre-computed with _any_ possible tool. This lifts many restrictions. For example, arbitrary distance could be used (like Gower distance which allows any type of variable). This is also much faster than typical kNN.
In addition to neighbor-based kNN classification, DNN implements _neighborhood_ classification when all neighbors within selected distance used for voting.
As usual in kNN, ties are broken at random. DNN also controls situations when no neighbors are within the given distance (and returns NA), and also when all neighbors are relevant (also returns NA).
By default, DNN() returns missing part of class labels, completely or partially filled with new (predicted) class labels. If 'cl' has no NAs and self=FALSE (default), DNN() returns it back with warning. It allows for combined and stepwise extensions (see examples). If 'details=TRUE', DNN() will return matrix where each column represents the table used for voting. If self=TRUE, DNN() could be used to calculate the class proximity surrogate.
Dnn() is based on DNN() but has more class::knn()-like interface (see examples).
Character vector with predicted class labels; or matrix if 'details=TRUE'.
Alexey Shipunov
class::knn
iris.d <- dist(iris[, -5]) cl1 <- iris$Species sam <- c(rep(0, 4), 1) > 0 cl1[!sam] <- NA table(cl1, useNA="ifany") ## based on neighbor number iris.pred <- DNN(dst=iris.d, cl=cl1, k=5) Misclass(iris$Species[is.na(cl1)], iris.pred) ## based on neighborhood size iris.pred <- DNN(dst=iris.d, cl=cl1, d=0.05) table(iris.pred, useNA="ifany") Misclass(iris$Species[is.na(cl1)], iris.pred) ## protection against "all points relevant" DNN(dst=iris.d, cl=cl1, d=1)[1:5] ## and all are ties: DNN(dst=iris.d, cl=cl1, d=1, details=TRUE)[, 1:5] ## any distance works iris.d2 <- Gower.dist(iris[, -5]) iris.pred <- DNN(dst=iris.d2, cl=cl1, k=5) Misclass(iris$Species[is.na(cl1)], iris.pred) ## combined cl2 <- cl1 iris.pred <- DNN(dst=iris.d, cl=cl2, d=0.05) cl2[is.na(cl2)] <- iris.pred table(cl2, useNA="ifany") iris.pred2 <- DNN(dst=iris.d, cl=cl2, k=5) cl2[is.na(cl2)] <- iris.pred2 table(cl2, useNA="ifany") Misclass(iris$Species, cl2) ## self-training and class proximity surrogate cl3 <- iris$Species t(DNN(dst=iris.d, cl=cl3, k=5, details=TRUE, self=TRUE))/5 ## Dnn() with more class::knn()-like interface iris.trn <- iris[sam, ] iris.tst <- iris[!sam, ] Dnn(iris.trn[, -5], iris.tst[, -5], iris.trn[, 5], k=7) ## stepwise DNN, note the warning when no NAs left cl4 <- cl1 for (d in (5:14)/100) { iris.pred <- DNN(dst=iris.d, cl=cl4, d=d) cl4[is.na(cl4)] <- iris.pred } table(cl4, useNA="ifany") Misclass(iris$Species, cl4) ## rushing to d=14% gives much worse results iris.pred <- DNN(dst=iris.d, cl=cl1, d=0.14) table(iris.pred, useNA="ifany") Misclass(iris$Species[is.na(cl1)], iris.pred)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.