densityClust: Calculate clustering attributes based on the densityClust...

Description

This function takes a distance matrix and, optionally, a distance cutoff and calculates the values necessary for clustering based on the algorithm proposed by Alex Rodriguez and Alessandro Laio (see References). The actual assignment to clusters is done in a later step, based on user-defined threshold values. If a distance matrix is passed as distance, the original algorithm described in the paper is used. If a matrix or data.frame is passed instead, it is interpreted as point coordinates and rho is estimated from the k nearest neighbors of each point (rho is estimated as exp(-mean(x)), where x is the vector of distances to the nearest neighbors). This is useful when the data set is so large that computing the full distance matrix is prohibitive.
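The nearest-neighbor estimate of rho described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the value of k is an assumption chosen for illustration, and the neighbors are found here from a full distance matrix purely to keep the example short, whereas the function is documented to use a fast k-nearest neighbor search.

```r
# Illustrative sketch only; not the code used inside densityClust().
coords <- as.matrix(iris[, 1:4])
k <- 5  # assumed number of neighbours, chosen for illustration

# Full pairwise distances (small data only); exclude each point from its own neighbours
d <- as.matrix(dist(coords))
diag(d) <- Inf

# n x k matrix holding the distances to the k nearest neighbours of each point
knn_dist <- t(apply(d, 1, function(row) sort(row)[seq_len(k)]))

# rho = exp(-mean(x)), where x holds the distances to the nearest neighbours
rho_knn <- exp(-rowMeans(knn_dist))
```

Points in dense regions have small neighbor distances and therefore a rho close to 1, while isolated points have a rho close to 0.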

Usage

densityClust(distance, dc, gaussian = FALSE, verbose = FALSE, ...)

Arguments

distance

A distance matrix or a matrix (or data.frame) for the coordinates of the data. If a matrix or data.frame is used the distances and local density will be estimated using a fast k-nearest neighbor approach.

dc

A distance cutoff for calculating the local density. If missing, it will be estimated with estimateDc(distance).

gaussian

Logical. Should a Gaussian kernel be used to estimate the density? Defaults to FALSE.

verbose

Logical. Should the running details be reported?

...

Additional parameters passed on to get.knn

Details

The function calculates rho and delta for the observations in the provided distance matrix. If a distance cutoff is not provided this is first estimated using estimateDc() with default values.
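As a rough sketch of what rho and delta mean for a distance matrix, the following base-R code follows the definitions in the referenced paper for the cutoff (non-Gaussian) kernel. It is an illustration under stated assumptions, not the package's code: the cutoff dc is picked arbitrarily here rather than estimated, and the handling of ties in rho may differ from what densityClust() does.

```r
# Illustrative sketch only; not the code used inside densityClust().
d <- as.matrix(dist(iris[, 1:4]))
dc <- 0.3  # assumed cutoff, chosen arbitrarily for illustration

# rho: number of other observations closer than dc
# (subtract 1 to remove the zero self-distance on the diagonal)
rho <- rowSums(d < dc) - 1

# delta: minimum distance to any observation with strictly higher rho;
# observations with the highest rho get their maximum distance instead
delta <- vapply(seq_along(rho), function(i) {
  higher <- rho > rho[i]
  if (any(higher)) min(d[i, higher]) else max(d[i, ])
}, numeric(1))
```

Observations with both a large rho and a large delta are the candidates for cluster centers, which is why the thresholds for the later clustering step are expressed in terms of these two quantities.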

The information kept in the densityCluster object is:

rho

A vector of local density values

delta

A vector of minimum distances to observations of higher density

distance

The initial distance matrix

dc

The distance cutoff used to calculate rho

threshold

A named vector specifying the threshold values for rho and delta used for cluster detection

peaks

A vector of indexes specifying the cluster center for each cluster

clusters

A vector of cluster affiliations for each observation. The clusters are referenced as indexes in the peaks vector

halo

A logical vector specifying for each observation if it is considered part of the halo

knn_graph

kNN graph constructed. It is only applicable to the case where coordinates are used as input. Currently it is set as NA.

nearest_higher_density_neighbor

Index of the nearest observation with higher density. It is only applicable to the case where coordinates are used as input.

nn.index

Indices of each observation's k-nearest neighbors. It is only applicable to the case where coordinates are used as input.

nn.dist

Distances to each observation's k-nearest neighbors. It is only applicable to the case where coordinates are used as input.

Before findClusters() is run, the threshold, peaks, clusters and halo entries are NA.

Value

A densityCluster object. See details for a description.

References

Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492-1496. doi:10.1126/science.1242072

See Also

estimateDc(), findClusters()

Examples

irisDist <- dist(iris[,1:4])
irisClust <- densityClust(irisDist, gaussian=TRUE)
plot(irisClust) # Inspect clustering attributes to define thresholds

irisClust <- findClusters(irisClust, rho=2, delta=2)
plotMDS(irisClust)
split(iris[,5], irisClust$clusters)

thomasp85/densityClust documentation built on May 31, 2019, 11:12 a.m.