# kNNdist: Calculate and Plot k-Nearest Neighbor Distances In dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms


## Calculate and Plot k-Nearest Neighbor Distances

### Description

Fast calculation of the k-nearest neighbor distances for a dataset represented as a matrix of points. The kNN distance of a point is defined as the distance from that point to its k-th nearest neighbor. The kNN distance plot displays the kNN distances of all points, sorted from smallest to largest. The plot can be used to help find suitable parameter values for `dbscan()`.
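To make the definition concrete, here is a minimal brute-force sketch in base R. The helper `knn_dist_naive()` is a hypothetical name used only for illustration; it is not part of dbscan and is not how `kNNdist()` is implemented (the package uses faster search strategies via `kNN()`).

```r
# Brute-force kNN distances in base R (illustration only).
# knn_dist_naive() is a hypothetical helper, not a dbscan function.
knn_dist_naive <- function(x, k) {
  d <- as.matrix(dist(x))   # full pairwise Euclidean distance matrix
  diag(d) <- Inf            # a point does not count as its own neighbor
  # For each point, sort its distances and keep the k-th smallest.
  apply(d, 1, function(row) sort(row)[k])
}

x <- as.matrix(iris[, 1:4])
d4 <- knn_dist_naive(x, k = 4)

# The kNN distance plot is these values sorted in increasing order.
plot(sort(d4), type = "l",
     xlab = "Points sorted by distance", ylab = "4-NN distance")
```

This version builds the full n-by-n distance matrix, so it is only practical for small data sets.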

### Usage

```
kNNdist(x, k, all = FALSE, ...)

kNNdistplot(x, k, minPts, ...)
```

### Arguments

| Argument | Description |
|---|---|
| `x` | the data set as a matrix of points (Euclidean distance is used) or a precalculated dist object. |
| `k` | number of nearest neighbors used for the distance calculation. |
| `all` | should a matrix with the distances to all k nearest neighbors be returned? |
| `...` | further arguments (e.g., kd-tree related parameters) are passed on to `kNN()`. |
| `minPts` | the `minPts` value intended for `dbscan()`. When a k-NN plot is used to determine a suitable `eps` value for `dbscan()`, `minPts` can be specified instead of `k` and will set `k = minPts - 1`. |
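The following sketch illustrates two of the arguments described above: passing a precalculated dist object as `x`, and supplying `minPts` instead of `k` to `kNNdistplot()`. It assumes the dbscan package is installed; the variable names are chosen for illustration only.

```r
library(dbscan)

x <- as.matrix(iris[, 1:4])

# `x` may be a matrix of points or a precalculated dist object.
d_matrix <- kNNdist(x, k = 4)
d_dist   <- kNNdist(dist(x), k = 4)

# Both inputs describe the same Euclidean distances, so the two
# results are expected to agree up to floating-point tolerance.
all.equal(unname(d_matrix), unname(d_dist))

# Supplying minPts instead of k plots the (minPts - 1)-NN distances,
# here the 4-NN distances.
kNNdistplot(x, minPts = 5)
```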

### Value

`kNNdist()` returns a numeric vector with the distance from each point to its k-th nearest neighbor. If `all = TRUE`, a matrix with k columns containing the distances to the 1st, 2nd, ..., k-th nearest neighbors is returned instead.
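As a quick check of the two return shapes, the sketch below compares the vector and matrix forms on the iris data. It assumes the dbscan package is installed; no output is shown because the values depend on the data.

```r
library(dbscan)

x <- as.matrix(iris[, 1:4])

d_vec <- kNNdist(x, k = 4)               # one distance per point
d_mat <- kNNdist(x, k = 4, all = TRUE)   # one row per point, k columns

length(d_vec) == nrow(x)   # vector has one entry per observation
dim(d_mat)                 # rows = observations, columns = k

# Column k of the matrix holds the distance to the k-th nearest
# neighbor, i.e. the same values as the vector form.
all.equal(unname(d_vec), unname(d_mat[, 4]))
```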

### Author(s)

Michael Hahsler

### See Also

Other Outlier Detection Functions: `glosh()`, `lof()`, `pointdensity()`

Other NN functions: `NN`, `comps()`, `frNN()`, `kNN()`, `sNN()`

### Examples

```
library(dbscan)

data(iris)
iris <- as.matrix(iris[, 1:4])

## Find the 4-NN distance for each observation (see ?kNN
## for different search strategies)
kNNdist(iris, k = 4)

## Get a matrix with distances to the 1st, 2nd, ..., 4th NN.
kNNdist(iris, k = 4, all = TRUE)

## Produce a k-NN distance plot to determine a suitable eps for
## DBSCAN with minPts = 5. Use k = 4 (= minPts - 1).
## The knee is visible around a distance of 0.7.
kNNdistplot(iris, k = 4)

cl <- dbscan(iris, eps = .7, minPts = 5)
pairs(iris, col = cl$cluster + 1L)
## Note: black points are noise points
```

dbscan documentation built on Oct. 29, 2022, 1:13 a.m.