approxSilhouette: Approximate silhouette width

Description

Given a clustering, quickly compute an approximate silhouette width for each observation.

Usage

approxSilhouette(x, clusters)

Arguments

x

A numeric matrix-like object containing observations in rows and variables in columns.

clusters

Vector of length equal to nrow(x), specifying the cluster assigned to each observation.

Details

The silhouette width is a general-purpose metric for evaluating the separation between clusters. However, it requires calculating the average distance between pairs of observations within or between clusters, which is expensive for large datasets. This function instead approximates the average distance with the root-mean-squared distance, which can be computed very efficiently for large datasets. The approximated averages are then used to compute the silhouette width using the usual definition.
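
To illustrate why the root-mean-squared distance is cheap to compute, note that the mean squared distance from an observation to all members of a cluster equals the squared distance to that cluster's centroid plus the mean squared distance of the members from the centroid. The following is a minimal base-R sketch of that idea and is not the package's implementation: rmsSilhouetteSketch is a hypothetical name, own-cluster distances are assumed to include the observation itself, and clusters with zero spread are not handled specially.

```r
# Hypothetical helper for illustration only; not part of bluster.
rmsSilhouetteSketch <- function(x, clusters) {
    x <- as.matrix(x)
    f <- factor(clusters)
    idx <- as.integer(f)
    k <- nlevels(f)
    stopifnot(length(idx) == nrow(x), k >= 2)

    # Per-cluster size, centroid and mean squared distance to the centroid.
    n <- as.vector(table(f))
    centers <- rowsum(x, idx) / n
    resid2 <- rowSums((x - centers[idx, , drop=FALSE])^2)
    spread <- as.vector(rowsum(resid2, idx)) / n

    # RMS distance from each observation to each cluster, using
    # mean ||x - y||^2 = ||x - centroid||^2 + spread.
    rms <- sapply(seq_len(k), function(j) {
        sqrt(rowSums(sweep(x, 2, centers[j, ])^2) + spread[j])
    })

    # Usual silhouette definition applied to the approximated averages.
    self <- cbind(seq_len(nrow(x)), idx)
    a <- rms[self]
    rms[self] <- Inf
    other <- unname(apply(rms, 1, which.min))
    b <- rms[cbind(seq_len(nrow(x)), other)]

    data.frame(cluster=f, other=levels(f)[other], width=(b - a) / pmax(a, b))
}
```

This only needs one pass over the data per cluster, rather than a distance for every pair of observations.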

Value

A DataFrame with one row per observation in x and the columns:

  • cluster, the assigned cluster for each observation in x.

  • other, the closest cluster other than the one to which the current observation is assigned.

  • width, a numeric field containing the approximate silhouette width of the current observation.

Row names are defined as the row names of x.

Author(s)

Aaron Lun

See Also

silhouette from the cluster package, for the exact calculation.

neighborPurity, for another method of evaluating cluster separation.

Examples

library(bluster)

# 1000 observations (rows) by 10 variables (columns).
m <- matrix(rnorm(10000), ncol=10)
clusters <- clusterRows(m, BLUSPARAM=KmeansParam(5))
out <- approxSilhouette(m, clusters)
boxplot(split(out$width, clusters))

# Mocking up a stronger example:
centers <- matrix(rnorm(30), nrow=3)
clusters <- sample(1:3, 1000, replace=TRUE)

y <- centers[clusters,]
y <- y + rnorm(length(y), sd=0.1)

out2 <- approxSilhouette(y, clusters)
boxplot(split(out2$width, clusters))


LTLA/bluster documentation built on Sept. 8, 2024, 4:37 a.m.