
Calculates the sum of distances from each observation to its k nearest neighbors as an outlier score, using a kd-tree.

```
KNN_SUM(dataset, k=5)
```

| Argument | Description |
|---|---|
| `dataset` | The dataset whose observations each receive a summed k-nearest-neighbors distance. |
| `k` | The number of nearest neighbors to use (default 5). `k` must be smaller than the number of observations in `dataset`. |

KNN_SUM computes, for each observation, the sum of the distances to its k nearest neighbors. The neighbor search uses a kd-tree through the kNN() function from the 'dbscan' package. KNN_SUM is useful for outlier detection in clustering and other multidimensional domains.
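To make the score concrete, here is a minimal base-R sketch of the same quantity. It is not the package's implementation: `knn_sum_sketch` is a hypothetical helper that uses a brute-force distance matrix from `dist()` rather than a kd-tree, assumes Euclidean distance, and may treat ties or duplicate rows differently from KNN_SUM.

```r
# Hypothetical helper (not part of the package): brute-force summed kNN distance.
# Assumes Euclidean distance; KNN_SUM itself relies on a kd-tree search instead.
knn_sum_sketch <- function(dataset, k = 5) {
  m <- as.matrix(dataset)
  n <- nrow(m)
  # k must be at least 1 and smaller than the number of observations
  stopifnot(k >= 1, k < n)
  # Full pairwise distance matrix; needs O(n^2) memory, so small data only
  d <- as.matrix(dist(m))
  # For each row, drop the self-distance and sum the k smallest remaining distances
  vapply(
    seq_len(n),
    function(i) sum(sort(d[i, -i])[seq_len(k)]),
    numeric(1)
  )
}

# One score per observation; larger scores suggest more outlying observations
scores <- knn_sum_sketch(iris[, 1:4], k = 5)
head(order(scores, decreasing = TRUE))
```

The brute-force version is only practical for small datasets, which is presumably why a tree-based search is the better fit for larger ones.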

A vector of summed distances, one per observation. The greater the summed distance, the more outlying the observation.

Jacob H. Madsen

```
# Create dataset and set an optional k
X <- iris[,1:4]
K <- 5
# Find outliers
outlier_score <- KNN_SUM(dataset=X, k=K)
# Sort and find index for most outlying observations
names(outlier_score) <- 1:nrow(X)
sort(outlier_score, decreasing = TRUE)
# Inspect the distribution of outlier scores
hist(outlier_score)
```
