# kNNdist: Calculate and Plot k-Nearest Neighbor Distances

In dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms

## Description

Fast calculation of the k-nearest neighbor distances for a dataset represented as a matrix of points. The kNN distance of a point is the distance from that point to its k-th nearest neighbor. The kNN distance plot displays the kNN distances of all points, sorted from smallest to largest. The plot can be used to help find suitable parameter values for `dbscan()`.
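To make the definition concrete, the following is a minimal brute-force sketch in base R. It is not how the package computes the distances; the function name `knn_dist_brute` and the toy matrix `pts` are hypothetical and serve only as illustration.

```r
# Brute-force k-NN distance using only base R (illustration only).
# knn_dist_brute() and pts are hypothetical names, not part of dbscan.
knn_dist_brute <- function(x, k) {
  d <- as.matrix(dist(x))   # full pairwise Euclidean distance matrix
  diag(d) <- Inf            # a point does not count as its own neighbor
  # for every point, take the k-th smallest distance to another point
  apply(d, 1, function(row) sort(row)[k])
}

pts <- matrix(c(0, 0,
                1, 0,
                0, 2,
                5, 5), ncol = 2, byrow = TRUE)

d2 <- knn_dist_brute(pts, k = 2)   # 2-NN distance of each point
sorted_d2 <- sort(d2)              # the values a kNN distance plot would show
```

Here the isolated point `(5, 5)` has by far the largest 2-NN distance, which is the kind of jump one looks for in the sorted plot.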

## Usage

```r
kNNdist(x, k, all = FALSE, ...)

kNNdistplot(x, k, ...)
```

## Arguments

| Argument | Description |
|----------|-------------|
| `x` | the data set as a matrix of points (Euclidean distance is used) or a precalculated dist object. |
| `k` | number of nearest neighbors used for the distance calculation. |
| `all` | should a matrix with the distances to all k nearest neighbors be returned? |
| `...` | further arguments (e.g., kd-tree related parameters) are passed on to `kNN()`. |

## Value

`kNNdist()` returns a numeric vector containing, for each point, the distance to its k-th nearest neighbor. If `all = TRUE`, a matrix with k columns containing the distances to the 1st, 2nd, ..., k-th nearest neighbors is returned instead.
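The relationship between the two return shapes can be sketched in base R as follows. This is an illustrative brute-force version under the assumption of Euclidean distance, not the package's implementation; `knn_dist_matrix` and `pts` are hypothetical names, and the column labels are a choice made for this sketch.

```r
# Brute-force analogue of the matrix form (illustration only).
# knn_dist_matrix() and pts are hypothetical names, not part of dbscan.
knn_dist_matrix <- function(x, k) {
  d <- as.matrix(dist(x))   # full pairwise Euclidean distance matrix
  diag(d) <- Inf            # exclude each point from its own neighbor list
  # for every point, keep its k smallest distances in increasing order
  rows <- lapply(seq_len(nrow(d)), function(i) sort(d[i, ])[seq_len(k)])
  m <- matrix(unlist(rows), ncol = k, byrow = TRUE)
  colnames(m) <- seq_len(k) # column j holds the distance to the j-th neighbor
  m
}

pts <- matrix(c(0, 0,
                1, 0,
                0, 2,
                5, 5), ncol = 2, byrow = TRUE)

m <- knn_dist_matrix(pts, k = 2)  # one row per point, one column per neighbor
kth <- m[, 2]                     # last column: the plain k-NN distance vector
```

In this sketch the last column of the matrix is exactly the vector form, and the values in each row are non-decreasing from left to right.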

## Author(s)

Michael Hahsler

## See Also

Other Outlier Detection Functions: `glosh()`, `lof()`, `pointdensity()`

Other NN functions: `NN`, `comps()`, `frNN()`, `kNN()`, `sNN()`
## Examples

```r
library(dbscan)

data(iris)
iris <- as.matrix(iris[, 1:4])

## Find the 4-NN distance for each observation (see ?kNN
## for different search strategies)
kNNdist(iris, k = 4)

## Get a matrix with distances to the 1st, 2nd, ..., 4th NN.
kNNdist(iris, k = 4, all = TRUE)

## Produce a k-NN distance plot to determine a suitable eps for
## DBSCAN with MinPts = 5. Use k = 4 (= MinPts - 1).
## The knee is visible around a distance of .7
kNNdistplot(iris, k = 4)

cl <- dbscan(iris, eps = .7, minPts = 5)
pairs(iris, col = cl$cluster + 1L)
## Note: black points are noise points
```