# approxSilhouette: Approximate silhouette width (in bluster: Clustering Algorithms for Bioconductor)

## Description

Given a clustering, compute a fast approximate silhouette width for each cell.

## Usage

    approxSilhouette(x, clusters)

## Arguments

• x, a numeric matrix-like object containing observations in rows and variables in columns.

• clusters, a vector of length equal to nrow(x), specifying the cluster assigned to each observation.

## Details

The silhouette width is a general-purpose method for evaluating the separation between clusters but requires calculating the average distance between pairs of observations within or between clusters. This function instead approximates the average distances for faster computation in large datasets.

For a given observation, let $\tilde{D}$ be the approximate average distance to all cells in cluster X. This is defined as the square root of the sum of:

• The squared distance from the current observation to the centroid of cluster X. This is most accurate when the observation is distant to X relative to the latter's variation.

• The summed variance of all variables across observations in cluster X. This is most accurate when the observation lies close to the centroid of X.

This is also equivalent to the root-mean-square distance from the current observation to all cells in X.
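The equivalence can be checked with a short derivation. This is a sketch that assumes the variances are population variances (dividing by $n$ rather than $n - 1$); the symbols $y$, $x_i$, $c$ and $n$ are introduced here only for illustration. Let $y$ be the current observation, $x_1, \dots, x_n$ the cells in cluster X, and $c$ their centroid. Then

$$
\frac{1}{n} \sum_{i=1}^{n} \lVert y - x_i \rVert^2
  = \lVert y - c \rVert^2 + \frac{1}{n} \sum_{i=1}^{n} \lVert x_i - c \rVert^2 ,
$$

because the cross term $\frac{2}{n} (y - c)^\top \sum_{i} (c - x_i)$ vanishes when $c$ is the mean of the $x_i$. The first term on the right is the squared distance to the centroid, and the second is the per-variable population variance summed over all variables. Taking the square root of both sides gives the root-mean-square distance on the left. If sample variances are used instead, the second term is scaled by $n / (n - 1)$ and the identity holds only approximately.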

The approximate silhouette width for each cell can then be calculated with the relevant two values of $\tilde{D}$, computed by setting X to the cluster of the current cell or the closest other cluster.
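The calculation described above can be sketched in base R. This is an illustration rather than the package's implementation: `approx_sil_sketch` is a hypothetical helper, it assumes the usual silhouette formula (other minus own distance, divided by the larger of the two), it uses population variances, and it returns a plain `data.frame` instead of a DataFrame.

```r
# Hypothetical helper, not part of bluster: a base-R sketch of the approximation.
approx_sil_sketch <- function(x, clusters) {
    x <- as.matrix(x)
    clusters <- as.character(clusters)
    ids <- sort(unique(clusters))

    # Squared approximate distance from every observation to every cluster:
    # squared distance to the centroid plus the summed per-variable variance.
    d2 <- vapply(ids, function(id) {
        sub <- x[clusters == id, , drop = FALSE]
        centroid <- colMeans(sub)
        # Assumption: population variance (divide by n), summed over variables.
        spread <- sum(colMeans(sweep(sub, 2, centroid)^2))
        rowSums(sweep(x, 2, centroid)^2) + spread
    }, numeric(nrow(x)))
    d <- sqrt(d2)

    # Approximate distance from each observation to its own cluster.
    rows <- seq_len(nrow(x))
    own_idx <- cbind(rows, match(clusters, ids))
    self <- d[own_idx]

    # Mask the own cluster, then pick the closest remaining cluster.
    d_other <- d
    d_other[own_idx] <- Inf
    closest <- unname(apply(d_other, 1, which.min))
    other <- d_other[cbind(rows, closest)]

    # Assumed silhouette formula applied to the approximate distances.
    data.frame(width = (other - self) / pmax(other, self),
               other = ids[closest],
               row.names = rownames(x))
}
```

With well-separated clusters, the widths from this sketch should sit close to 1. Values near 0 or below indicate observations that are about as close to another cluster as to their own.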

## Value

A DataFrame with one row per cell in x and the columns:

• width, a numeric field containing the approximate silhouette width of the current cell.

• other, the closest cluster other than the one to which the current cell is assigned.

Row names are defined as the row names of x.
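One way to use the two columns together is sketched below. It runs on a small mock data frame with the documented column names, so the values are made up purely for illustration and `out`, `clusters`, `closest`, `reassign` and `mean_width` are hypothetical names. A negative width means the cell is, on average, closer to the `other` cluster than to its own.

```r
# Mock output with the two documented columns; values are illustrative only.
out <- data.frame(width = c(0.8, 0.6, -0.2, 0.7, -0.1, 0.5),
                  other = c("B", "B", "B", "A", "A", "A"))
clusters <- c("A", "A", "A", "B", "B", "B")

# For cells with a positive width keep the assigned cluster;
# otherwise report the closest other cluster.
closest <- ifelse(out$width > 0, clusters, out$other)

# Cross-tabulate assigned cluster against closest cluster.
reassign <- table(assigned = clusters, closest = closest)

# Average width per cluster as a rough summary of separation.
mean_width <- vapply(split(out$width, clusters), mean, numeric(1))
```

Off-diagonal counts in `reassign` show which clusters have cells that sit closer to a neighbouring cluster, which can hint at over-clustering or poorly separated groups.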

## Author(s)

Aaron Lun

## See Also

silhouette from the cluster package, for the exact calculation.

neighborPurity, for another method of evaluating cluster separation.

## Examples

    library(bluster)

    # Random data with no real structure, clustered with k-means (k = 5).
    m <- matrix(rnorm(10000), ncol=10)
    clusters <- clusterRows(m, BLUSPARAM=KmeansParam(5))
    out <- approxSilhouette(m, clusters)
    boxplot(split(out$width, clusters))

    # Mocking up a stronger example:
    centers <- matrix(rnorm(30), nrow=3)
    clusters <- sample(1:3, 1000, replace=TRUE)

    # Place each observation at its cluster center, then add a little noise.
    y <- centers[clusters,]
    y <- y + rnorm(length(y), sd=0.1)

    out2 <- approxSilhouette(y, clusters)
    boxplot(split(out2$width, clusters))