# densityClust: Calculate clustering attributes based on the densityClust...

In thomasp85/densityClust: Clustering by Fast Search and Find of Density Peaks

## Description

This function takes a distance matrix and, optionally, a distance cutoff, and calculates the values necessary for clustering based on the algorithm proposed by Alex Rodriguez and Alessandro Laio (see references). The actual assignment to clusters is done in a later step, based on user-defined threshold values. If a distance matrix is passed into `distance`, the original algorithm described in the paper is used. If a matrix or data.frame is passed instead, it is interpreted as point coordinates and rho is estimated from the k-nearest neighbors of each point (rho is estimated as `exp(-mean(x))`, where `x` is the vector of distances to the nearest neighbors). This can be useful when the data is so large that calculating the full distance matrix is prohibitive.
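The k-nearest-neighbor estimate of rho described above can be sketched in base R. This is an illustration of the formula `exp(-mean(x))` only, not the package's implementation: `knn_rho` is a hypothetical helper, the choice `k = 10` is an assumption, and the brute-force `dist()` call is used purely for readability (it builds the full distance matrix that the kNN approach is meant to avoid).

```r
# Hypothetical helper, not part of densityClust: estimate rho for each point
# as exp(-mean(distance to its k nearest neighbors)).
knn_rho <- function(coords, k = 10) {
  # Brute-force pairwise distances; fine for a small illustration only.
  m <- as.matrix(dist(coords))
  apply(m, 1, function(row) {
    # Sort distances and drop the first entry (distance to self, which is 0).
    nearest <- sort(row)[2:(k + 1)]
    exp(-mean(nearest))
  })
}

rho <- knn_rho(iris[, 1:4], k = 10)
head(rho)
```

Because the mean distance is never negative, the values produced this way lie in the interval (0, 1], with denser regions giving values closer to 1.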

## Usage

```
densityClust(distance, dc, gaussian = FALSE, verbose = FALSE, ...)
```

## Arguments

`distance`

A distance matrix, or a matrix (or data.frame) holding the coordinates of the data. If a matrix or data.frame is used, the distances and local density are estimated using a fast k-nearest neighbor approach.

`dc`

A distance cutoff for calculating the local density. If missing, it is estimated with `estimateDc(distance)`.

`gaussian`

Logical. Should a Gaussian kernel be used to estimate the density? Defaults to `FALSE`.

`verbose`

Logical. Should the running details be reported?

`...`

Additional parameters passed on to `get.knn`.

## Details

The function calculates rho and delta for the observations in the provided distance matrix. If a distance cutoff is not provided this is first estimated using `estimateDc()` with default values.
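As a rough illustration of what rho and delta mean, the sketch below computes both from a distance matrix in base R. It is a naive O(n^2) version written for clarity, not the package's code. `naive_rho_delta` is a hypothetical name, `dc = 0.5` is an arbitrary choice, and the Gaussian form `exp(-(d / dc)^2)` is an assumption about the kernel; the package's exact formula and its handling of ties in rho may differ.

```r
# Hypothetical helper, not part of densityClust.
naive_rho_delta <- function(d, dc, gaussian = FALSE) {
  m <- as.matrix(d)
  n <- nrow(m)

  # rho: local density around each observation.
  rho <- if (gaussian) {
    # Assumed Gaussian kernel; subtract 1 because self contributes exp(0) = 1.
    rowSums(exp(-(m / dc)^2)) - 1
  } else {
    # Hard cutoff: count the other observations closer than dc
    # (subtract 1 because the zero self-distance is also below dc).
    rowSums(m < dc) - 1
  }

  # delta: minimum distance to any observation of higher density.
  # An observation with no denser neighbor gets its maximum distance instead.
  delta <- numeric(n)
  for (i in seq_len(n)) {
    higher <- which(rho > rho[i])
    delta[i] <- if (length(higher) > 0) min(m[i, higher]) else max(m[i, ])
  }

  list(rho = rho, delta = delta)
}

d <- dist(iris[, 1:4])
res <- naive_rho_delta(d, dc = 0.5)
str(res)
```

Observations that combine a high rho with a high delta are the candidates for cluster centers, which is why both values are computed before any thresholds are chosen.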

The information kept in the densityCluster object is:

`rho`

A vector of local density values

`delta`

A vector of minimum distances to observations of higher density

`distance`

The initial distance matrix

`dc`

The distance cutoff used to calculate rho

`threshold`

A named vector specifying the threshold values for rho and delta used for cluster detection

`peaks`

A vector of indexes specifying the cluster center for each cluster

`clusters`

A vector of cluster affiliations for each observation. The clusters are referenced as indexes in the peaks vector

`halo`

A logical vector specifying for each observation if it is considered part of the halo

`knn_graph`

The kNN graph that was constructed. Only applicable when coordinates are used as input. Currently it is always set to `NA`.

`nearest_higher_density_neighbor`

The index of the nearest observation with higher density. Only applicable when coordinates are used as input.

`nn.index`

The indices of each observation's k-nearest neighbors. Only applicable when coordinates are used as input.

`nn.dist`

The distances to each observation's k-nearest neighbors. Only applicable when coordinates are used as input.

Before `findClusters()` is run, the `threshold`, `peaks`, `clusters` and `halo` elements are `NA`.
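To make the relationship between `threshold` and `peaks` concrete, here is a minimal sketch on made-up values. The vectors are toy data, and the strict `>` comparison is an assumption about how thresholds are applied; it is not taken from the `findClusters()` source.

```r
# Toy rho and delta values for five observations (made up for illustration).
rho   <- c(5, 1, 4, 0.5, 3)
delta <- c(3, 0.2, 2.5, 0.1, 0.3)

# A named threshold vector, in the same shape as the `threshold` element.
threshold <- c(rho = 2, delta = 2)

# Candidate cluster centers: observations exceeding both thresholds.
peaks <- which(rho > threshold["rho"] & delta > threshold["delta"])
peaks
```

Here observations 1 and 3 qualify, since only they exceed both thresholds; observation 5 is dense enough but lies too close to a denser observation.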

## Value

A densityCluster object. See details for a description.

## References

Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492-1496. doi:10.1126/science.1242072

## See Also

`estimateDc()`, `findClusters()`

## Examples

```
irisDist <- dist(iris[,1:4])
irisClust <- densityClust(irisDist, gaussian=TRUE)
plot(irisClust) # Inspect clustering attributes to define thresholds

irisClust <- findClusters(irisClust, rho=2, delta=2)
plotMDS(irisClust)
split(iris[,5], irisClust$clusters)
```

thomasp85/densityClust documentation built on May 31, 2019, 11:12 a.m.