ClustObj: Calculate clustering attributes based on the DensityPeak...


View source: R/weightedClustSuite.R

Description

This function takes a distance matrix and, optionally, a distance cutoff, and calculates the values necessary for clustering based on the algorithm proposed by Alex Rodriguez and Alessandro Laio (see References). The actual assignment to clusters is done in a later step, based on user-defined threshold values. If a distance matrix is passed as distance, the original algorithm described in the paper is used. If a matrix or data.frame is passed instead, it is interpreted as point coordinates and rho is estimated from the k nearest neighbors of each point (rho = exp(-mean(x)), where x is the vector of distances to the nearest neighbors). This can be useful when the data are so large that calculating the full distance matrix would be prohibitive.
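As a rough illustration of the coordinate-based estimate described above, the sketch below computes rho = exp(-mean(x)) from each point's k nearest neighbors. The helper `knn_rho()` and its `k = 5` default are hypothetical; for simplicity it builds the full distance matrix with base R's dist(), which is exactly what the package avoids for large data by using get.knn.

```r
# Hypothetical helper (not part of the package): estimate rho from the
# distances to each point's k nearest neighbors, rho = exp(-mean(x)).
knn_rho <- function(coords, k = 5) {
  coords <- as.matrix(coords)
  d <- as.matrix(dist(coords))  # full matrix only for this small illustration
  diag(d) <- Inf                # a point is not its own neighbor
  # For each row, average the k smallest distances
  mean_knn <- apply(d, 1, function(row) mean(sort(row)[seq_len(k)]))
  exp(-mean_knn)
}

rho <- knn_rho(iris[, 1:4], k = 5)
head(rho)
```

Denser regions give shorter neighbor distances and therefore values of rho closer to 1.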

Usage

ClustObj(orig, weights, distance, dc, gaussian = FALSE, verbose = FALSE, ...)

Arguments

distance

A distance matrix, or a matrix (or data.frame) of point coordinates. If a matrix or data.frame is used, the distances and local density are estimated using a fast k-nearest-neighbor approach.

dc

A distance cutoff for calculating the local density. If missing, it is estimated with DensityPeakEstimateDc(distance).

gaussian

Logical. Should a Gaussian kernel be used to estimate the density? Defaults to FALSE.

verbose

Logical. Should progress details be reported while running? Defaults to FALSE.

...

Additional parameters passed on to get.knn.
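The dc and gaussian arguments control how rho is computed from a distance matrix. The sketch below assumes the cutoff kernel from Rodriguez & Laio (2014), where rho counts the points closer than dc, and a Gaussian kernel of the form exp(-(d/dc)^2); the exact kernel used by ClustObj() is an assumption here. The helper `estimate_dc()` is a hypothetical stand-in for DensityPeakEstimateDc() that follows the paper's rule of thumb of choosing dc so that the average number of neighbors is about 1 to 2% of the observations.

```r
# Hypothetical helpers, not the package's DensityPeakEstimateDc() or ClustObj().
# Pick dc as a low quantile of all pairwise distances, so that roughly
# 'neighbor_rate' of the pairs fall within dc (paper's 1-2% rule of thumb).
estimate_dc <- function(distance, neighbor_rate = 0.02) {
  d <- as.vector(distance)  # pairwise distances stored in a 'dist' object
  unname(quantile(d, neighbor_rate))
}

# Local density rho with either a cutoff or a Gaussian kernel (assumed forms)
local_density <- function(distance, dc, gaussian = FALSE) {
  d <- as.matrix(distance)
  if (gaussian) {
    k <- exp(-(d / dc)^2)   # smooth weight that decays with distance
  } else {
    k <- (d < dc) * 1       # 1 for each neighbor closer than dc, else 0
  }
  diag(k) <- 0              # exclude each point itself
  rowSums(k)
}

irisDist <- dist(iris[, 1:4])
dc <- estimate_dc(irisDist)
rho <- local_density(irisDist, dc, gaussian = TRUE)
```

The Gaussian kernel yields continuous densities, which makes ties in rho much less likely than with the integer counts of the cutoff kernel.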

Details

The function calculates rho and delta for the observations in the provided distance matrix. If a distance cutoff is not provided, it is first estimated using DensityPeakEstimateDc() with default values.
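In the paper, delta is the minimum distance from an observation to any observation of higher density; for the observation with the highest density, the maximum distance is used instead. A minimal sketch of that definition follows, using the hypothetical helper `compute_delta()`, which also records the nearest higher-density neighbor. Ties in rho are not broken here, so several tied top-density points would all receive their maximum distance.

```r
# Hypothetical helper illustrating the delta definition from the paper.
compute_delta <- function(distance, rho) {
  d <- as.matrix(distance)
  n <- length(rho)
  delta <- numeric(n)
  nearest_higher <- rep(NA_integer_, n)
  for (i in seq_len(n)) {
    higher <- which(rho > rho[i])  # observations of strictly higher density
    if (length(higher) == 0) {
      delta[i] <- max(d[i, ])      # densest point: use its largest distance
    } else {
      j <- higher[which.min(d[i, higher])]
      delta[i] <- d[i, j]          # distance to nearest higher-density point
      nearest_higher[i] <- j
    }
  }
  list(delta = delta, nearest_higher = nearest_higher)
}

res <- compute_delta(dist(c(0, 1, 3)), rho = c(1, 2, 0.5))
res$delta
res$nearest_higher
```

The densest point gets an artificially large delta, which is what makes it stand out as a cluster center in a rho-versus-delta plot.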

The information kept in the Clustering***c object is:

rho

A vector of local density values

delta

A vector of minimum distances to observations of higher density

distance

The initial distance matrix

dc

The distance cutoff used to calculate rho

threshold

A named vector specifying the threshold values for rho and delta used for cluster detection

peaks

A vector of indexes specifying the cluster center for each cluster

clusters

A vector of cluster affiliations for each observation. The clusters are referenced as indexes in the peaks vector

halo

A logical vector specifying for each observation if it is considered part of the halo

knn_graph

The constructed kNN graph. Only applicable when coordinates are used as input; currently it is set to NA.

nearest_higher_density_neighbor

Index of the nearest observation with higher density. Only applicable when coordinates are used as input.

nn.index

Indices of each observation's k nearest neighbors. Only applicable when coordinates are used as input.

nn.dist

Distances to each observation's k nearest neighbors. Only applicable when coordinates are used as input.

Before findDensityPeakClusters() is run, threshold, peaks, clusters and halo are NA.
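Once rho and delta are available, the paper selects cluster centers as observations with both high rho and high delta, then assigns every other observation to the cluster of its nearest higher-density neighbor. The sketch below illustrates that step under those assumptions; it is not findDensityPeakClusters() itself, the names `assign_clusters()`, `rho_t` and `delta_t` are hypothetical, and halo detection is omitted.

```r
# Hypothetical sketch of the assignment step; not findDensityPeakClusters().
assign_clusters <- function(rho, delta, nearest_higher, rho_t, delta_t) {
  peaks <- which(rho > rho_t & delta > delta_t)  # cluster centers
  clusters <- rep(NA_integer_, length(rho))
  clusters[peaks] <- seq_along(peaks)            # clusters index into 'peaks'
  # Visit points from highest to lowest density so that each point's
  # nearest higher-density neighbor has already been labelled.
  for (i in order(rho, decreasing = TRUE)) {
    if (is.na(clusters[i]) && !is.na(nearest_higher[i])) {
      clusters[i] <- clusters[nearest_higher[i]]
    }
  }
  list(peaks = peaks, clusters = clusters)
}

# Two groups on a line: x = 0, 0.5, 1 and x = 10, 10.5
res <- assign_clusters(
  rho = c(1, 3, 1, 2.5, 1),
  delta = c(0.5, 10, 0.5, 9.5, 0.5),
  nearest_higher = c(2L, NA, 2L, 2L, 4L),
  rho_t = 2, delta_t = 5
)
res$clusters
```

Because labels only flow from higher to lower density, an observation whose chain of higher-density neighbors never reaches a selected peak stays NA in this sketch.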

Value

A Clustering*** object. See details for a description.

References

Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492-1496. doi:10.1126/science.1242072

See Also

DensityPeakEstimateDc(), findDensityPeakClusters()

Examples

irisDist <- dist(iris[,1:4])
irisClust <- ClustObj(irisDist, gaussian=TRUE)
plot(irisClust) # Inspect clustering attributes to define thresholds

irisClust <- findDensityPeakClusters(irisClust, rho=2, delta=2)
plotMDS(irisClust)
split(iris[,5], irisClust$clusters)

DhanujG/weightedClustSuite documentation built on March 3, 2021, 12:29 a.m.