# pointdensity: Calculate Local Density at Each Data Point

In package dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms

## Description

Calculate the local density at each data point, either as the number of points in its eps-neighborhood (as used in `dbscan()`) or as a kernel density estimate (KDE) with a uniform kernel. The function uses a kd-tree for fast fixed-radius nearest neighbor search.

## Usage

```r
pointdensity(
  x,
  eps,
  type = "frequency",
  search = "kdtree",
  bucketSize = 10,
  splitRule = "suggest",
  approx = 0
)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | a data matrix. |
| `eps` | radius of the eps-neighborhood (i.e., the bandwidth of the uniform kernel). |
| `type` | `"frequency"` or `"density"`: whether to return the raw count of points inside the eps-neighborhood or the KDE. |
| `search, bucketSize, splitRule, approx` | algorithmic parameters for `frNN()`. |

## Details

`dbscan()` estimates the density around a point as the number of points in the eps-neighborhood of the point (including the query point itself). With `type = "density"`, the function instead returns a kernel density estimate (KDE) using a uniform kernel, which is just this point count in the eps-neighborhood divided by (2 eps n), where n is the number of points in `x`.
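The two quantities described above can be illustrated with a brute-force sketch in base R. This is not the package's implementation (which uses a kd-tree via `frNN()`); the helper name `naive_pointdensity` is hypothetical, and treating points at distance exactly `eps` as neighbors (`<=`) is an assumption of this sketch.

```r
# Brute-force illustration of the count and the uniform-kernel KDE (base R only).
# Assumption: a point is a neighbor when its Euclidean distance is <= eps.
naive_pointdensity <- function(x, eps, type = c("frequency", "density")) {
  type <- match.arg(type)
  x <- as.matrix(x)
  # Full pairwise distance matrix; the diagonal is 0, so each point counts itself
  d <- as.matrix(dist(x))
  counts <- unname(rowSums(d <= eps))
  if (type == "frequency") {
    return(counts)
  }
  # Uniform-kernel KDE as stated in the text: count / (2 * eps * n)
  counts / (2 * eps * nrow(x))
}

# Four points on a line: three close together and one far away
x <- matrix(c(0, 0.1, 0.2, 5), ncol = 1)
naive_pointdensity(x, eps = 0.25, type = "frequency")  # 3 3 3 1
naive_pointdensity(x, eps = 0.25, type = "density")    # 1.5 1.5 1.5 0.5
```

The isolated point at 5 has only itself in its neighborhood, so it receives the lowest count and density. The sketch needs O(n^2) memory for the distance matrix, which is why a spatial index such as a kd-tree is preferable for larger data sets.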

Points with low local density often indicate noise (see, e.g., Wishart (1969) and Hartigan (1975)).

## Value

A vector with one entry per data point (row) in `x`, containing the count or density value for that point.

## Author(s)

Michael Hahsler

## References

Wishart, D. (1969), Mode Analysis: A Generalization of Nearest Neighbor which Reduces Chaining Effects, in Numerical Taxonomy, Ed., A.J. Cole, Academic Press, 282-311.

Hartigan, J. A. (1975), Clustering Algorithms, John Wiley & Sons, Inc., New York, NY, USA.

## See Also

`frNN()`, `stats::density()`.

Other Outlier Detection Functions: `glosh()`, `kNNdist()`, `lof()`
## Examples

```r
library(dbscan)

set.seed(665544)
n <- 100
# 10 random centers (recycled over the n points) plus Gaussian noise
x <- cbind(
  x = runif(10, 0, 5) + rnorm(n, sd = 0.4),
  y = runif(10, 0, 5) + rnorm(n, sd = 0.4)
)
plot(x)

### calculate density
d <- pointdensity(x, eps = .5, type = "density")

### density distribution
summary(d)
hist(d, breaks = 10)

### plot with point size proportional to density
plot(x, pch = 19, main = "Density (eps = .5)", cex = d * 5)

### Wishart (1969) single link clustering after removing low-density noise
# 1. remove noise with low density
f <- pointdensity(x, eps = .5, type = "frequency")
x_nonoise <- x[f >= 5, ]

# 2. use single-linkage on the non-noise points
hc <- hclust(dist(x_nonoise), method = "single")
plot(x, pch = 19, cex = .5)
points(x_nonoise, pch = 19, col = cutree(hc, k = 4) + 1L)
```