This function takes a distance matrix and optionally a distance cutoff and
calculates the values necessary for clustering based on the algorithm
proposed by Alex Rodriguez and Alessandro Laio (see references). The actual
assignment to clusters is done in a later step, based on user-defined
threshold values. If a distance matrix is passed in, the
original algorithm described in the paper is used. If a matrix or data.frame
is passed instead, it is interpreted as point coordinates and rho will be
estimated based on the k-nearest neighbors of each point (rho is estimated
from the distances to the nearest
neighbors). This can be useful when the data is so large that calculating the
full distance matrix would be prohibitive.
A distance matrix or a matrix (or data.frame) for the coordinates of the data. If a matrix or data.frame is used, the distances and local density will be estimated using a fast k-nearest neighbor approach.
A distance cutoff for calculating the local density. If missing, it
will be estimated with estimateDc().
Logical. Should a Gaussian kernel be used to estimate the density? (Defaults to FALSE.)
Logical. Should the running details be reported?
Additional parameters passed on to get.knn.
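To make the roles of the distance cutoff and the Gaussian option concrete, here is a minimal base R sketch of two common ways to compute a local density from a distance matrix. The helper name local_density and the exact kernel form exp(-(d / dc)^2) are assumptions made for illustration; they are not taken from this page and may differ from what the package does internally.

```r
# Hypothetical helper for illustration only; not part of the package API.
local_density <- function(d, dc, gaussian = FALSE) {
  m <- as.matrix(d)
  if (gaussian) {
    # Smooth kernel (assumed form): each point contributes exp(-(d_ij / dc)^2).
    rowSums(exp(-(m / dc)^2)) - 1  # subtract the self term exp(0) = 1
  } else {
    # Hard cutoff: count the other points that lie closer than dc.
    rowSums(m < dc) - 1            # subtract the self term (distance 0 < dc)
  }
}

# Three points close together and one far away.
pts <- matrix(c(0, 0,
                0, 1,
                1, 0,
                5, 5), ncol = 2, byrow = TRUE)
d <- dist(pts)

rho_cut   <- local_density(d, dc = 1.5)
rho_gauss <- local_density(d, dc = 1.5, gaussian = TRUE)
```

With the hard cutoff, densities are integer counts and ties are frequent; the smooth kernel yields continuous values, which is one reason to prefer it on small data sets.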
The function calculates rho and delta for the observations in the provided
distance matrix. If a distance cutoff is not provided, it is first estimated
using estimateDc() with default values.
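As a rough illustration of what delta means, the following base R sketch computes, for each observation, the minimum distance to any observation of higher density. The helper name min_dist_to_denser is hypothetical, and giving the densest observation its maximum distance to any other point is an assumption that follows the convention in the referenced paper; it is not necessarily how the package handles that case or ties.

```r
# Hypothetical helper for illustration only; not part of the package API.
min_dist_to_denser <- function(d, rho) {
  m <- as.matrix(d)
  vapply(seq_along(rho), function(i) {
    denser <- which(rho > rho[i])
    if (length(denser) == 0) {
      # No denser observation exists: use the largest distance (assumed convention).
      max(m[i, ])
    } else {
      min(m[i, denser])
    }
  }, numeric(1))
}

# Five points on a line: a tight group and a distant pair.
x <- c(0, 1, 2, 10, 11)
d <- dist(x)

# Local density as the number of other points closer than a cutoff of 1.5.
rho <- unname(rowSums(as.matrix(d) < 1.5) - 1)
delta <- min_dist_to_denser(d, rho)
```

Observations that combine a high rho with a high delta are the natural candidates for cluster centers.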
The information kept in the densityCluster object is:
A vector of local density values
A vector of minimum distances to observations of higher density
The initial distance matrix
The distance cutoff used to calculate rho
A named vector specifying the threshold values for rho and delta used for cluster detection
A vector of indexes specifying the cluster center for each cluster
A vector of cluster affiliations for each observation. The clusters are referenced as indexes in the peaks vector
A logical vector specifying for each observation if it is considered part of the halo
The constructed kNN graph. Only applicable when coordinates are used as input. Currently set to NA.
Index of the nearest sample with higher density. Only applicable when coordinates are used as input.
Indices of each observation's k-nearest neighbors. Only applicable when coordinates are used as input.
Distances to each observation's k-nearest neighbors. Only applicable when coordinates are used as input.
Before running findClusters, the threshold, peaks, clusters and halo data are not yet available.
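The later assignment step can be pictured with the sketch below, which picks peaks from user-supplied thresholds and then labels every other observation with the cluster of its nearest denser neighbor. This is a simplified reading of the procedure in the referenced paper, written in base R; assign_clusters and its argument names are hypothetical, it is not the package's findClusters(), and halo detection is left out.

```r
# Hypothetical helper for illustration only; not the package's findClusters().
assign_clusters <- function(d, rho, delta, rho_min, delta_min) {
  m <- as.matrix(d)
  # Peaks are observations exceeding both thresholds.
  peaks <- which(rho > rho_min & delta > delta_min)
  stopifnot(length(peaks) > 0)
  clusters <- rep(NA_integer_, length(rho))
  clusters[peaks] <- seq_along(peaks)  # clusters are indexes into peaks
  # Visit observations from densest to sparsest, so that every strictly
  # denser observation has already been labelled.
  for (i in order(rho, decreasing = TRUE)) {
    if (!is.na(clusters[i])) next
    denser <- which(rho > rho[i])
    # Fall back to the peaks if no strictly denser observation exists.
    candidates <- if (length(denser) > 0) denser else peaks
    nearest <- candidates[which.min(m[i, candidates])]
    clusters[i] <- clusters[nearest]
  }
  list(peaks = peaks, clusters = clusters)
}

# Two well-separated groups on a line.
x <- c(0, 1, 2, 10, 11, 12)
d <- dist(x)
# rho and delta for these points with a hard cutoff of 1.5.
rho   <- c(1, 2, 1, 1, 2, 1)
delta <- c(1, 11, 1, 1, 11, 1)

res <- assign_clusters(d, rho, delta, rho_min = 1, delta_min = 5)
```

In practice the thresholds are usually chosen by inspecting a plot of delta against rho and selecting the observations that stand out on both axes.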
A densityCluster object. See details for a description.
Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492-1496. doi:10.1126/science.1242072