optsil: Clustering by Optimizing Silhouette Widths



Description

Silhouette width is a measure of the mean similarity of each object to the other objects in its cluster, compared to its mean similarity to the most similar other cluster (see silhouette). optsil is an iterative reallocation algorithm that maximizes the mean silhouette width of a clustering for a given number of clusters.
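To make the quantity being optimized concrete, the following is a minimal base R sketch of the mean silhouette width, using the standard dissimilarity form s(i) = (b - a) / max(a, b), where a is the mean dissimilarity of item i to its own cluster and b is its mean dissimilarity to the nearest other cluster. The helper name mean_sil_width is hypothetical and is not part of optpart; the sketch assumes at least two clusters and assigns singletons a width of 0.

```r
# Hypothetical helper (not part of optpart): mean silhouette width
# from a 'dist' object and a vector of integer cluster labels.
mean_sil_width <- function(d, cl) {
  m <- as.matrix(d)
  ids <- sort(unique(cl))
  s <- numeric(nrow(m))
  for (i in seq_len(nrow(m))) {
    own <- cl == cl[i]
    if (sum(own) < 2) next                  # singleton: width left at 0
    a <- sum(m[i, own]) / (sum(own) - 1)    # m[i, i] is 0, so divide by n - 1
    b <- min(vapply(ids[ids != cl[i]],
                    function(k) mean(m[i, cl == k]),
                    numeric(1)))            # nearest other cluster
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

pts <- c(0, 1, 10, 11)
d <- dist(pts)
good <- mean_sil_width(d, c(1, 1, 2, 2))    # compact, well-separated clusters
bad  <- mean_sil_width(d, c(1, 2, 1, 2))    # clusters mixed across the gap
```

The well-separated partition scores close to 1, while the mixed partition scores below 0; that difference is what a reallocation algorithm tries to exploit.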

Usage

optsil(x,dist,maxitr)

Arguments

x

an integer, a vector of integers, an object of class ‘clustering’, ‘partana’, ‘partition’, or ‘stride’

dist

an object of class ‘dist’ from dist, dsvdis, or vegdist

maxitr

the maximum number of iterations to perform

Details

optsil produces a partition, or clustering, of items into clusters by iterative reallocation of items to clusters so as to maximize the mean silhouette width of the classification. At each iteration optsil ranks all possible reallocations of an item from one cluster to another. The reallocation that yields the largest increase in the mean silhouette width is performed. Because silhouette widths are not independent of clusters that are not modified, only a single reallocation can be performed in a single iteration. When no further reallocation results in an improvement, or the maximum number of iterations is reached, the algorithm stops.
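A single iteration of the procedure described above can be sketched in base R as follows. This is an illustration only, not the package's implementation: mean_sil_width and best_single_move are hypothetical names, and the rule that a move may not empty a cluster is an assumption made here to keep the number of clusters fixed.

```r
# Hypothetical helper: mean silhouette width from a 'dist' object
# and integer cluster labels (singletons get width 0).
mean_sil_width <- function(d, cl) {
  m <- as.matrix(d)
  ids <- sort(unique(cl))
  s <- numeric(nrow(m))
  for (i in seq_len(nrow(m))) {
    own <- cl == cl[i]
    if (sum(own) < 2) next
    a <- sum(m[i, own]) / (sum(own) - 1)
    b <- min(vapply(ids[ids != cl[i]],
                    function(k) mean(m[i, cl == k]),
                    numeric(1)))
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Hypothetical sketch of one iteration: try every move of one item to
# another cluster and keep the move with the largest mean silhouette width.
best_single_move <- function(d, cl) {
  ids <- sort(unique(cl))
  best <- list(width = mean_sil_width(d, cl), item = NA_integer_, to = NA)
  for (i in seq_along(cl)) {
    if (sum(cl == cl[i]) < 2) next          # assumption: never empty a cluster
    for (k in ids[ids != cl[i]]) {
      trial <- cl
      trial[i] <- k
      w <- mean_sil_width(d, trial)
      if (w > best$width) best <- list(width = w, item = i, to = k)
    }
  }
  best                                      # item stays NA if nothing improves
}

pts <- c(0, 1, 2, 10, 11, 12)
d <- dist(pts)
start <- c(1, 1, 2, 2, 2, 2)                # third item is misassigned
step <- best_single_move(d, start)
```

Here the best move reassigns the third item to cluster 1. Repeating the step until item is NA, or until an iteration limit is hit, mirrors the stopping rule described above. Every candidate move requires recomputing the silhouette widths, which suggests why the search is expensive.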

Optsil is an unweighted algorithm, i.e. each of the objects is included in the calculation exactly once.

optsil can be extremely slow to converge, and is best used to ‘polish’ an existing partition, such as one obtained by slicing an hclust or produced by optpart, pam, diana, or another initial clustering. It is possible to run optsil from a random start, but convergence is then extremely slow, and this should be done only with caution.
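As an illustration of the ‘polish’ workflow, the sketch below slices an hclust tree with cutree and passes the resulting integer vector to optsil. The simulated data, the average-linkage method, and the limit of 50 iterations are arbitrary choices for the example, and the optsil call is guarded so that the code still runs where optpart is not installed.

```r
# Simulated data: two loose groups of 20 points each (illustrative only).
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 4), ncol = 2))
d <- dist(x)

# Initial partition from slicing a hierarchical clustering (base R).
hc <- hclust(d, method = "average")
init <- cutree(hc, k = 2)

# Polish the initial partition with optsil, if optpart is available.
if (requireNamespace("optpart", quietly = TRUE)) {
  polished <- optpart::optsil(init, d, 50)
  print(table(initial = init, polished = polished$clustering))
}
```

The cross-tabulation shows how many items, if any, were reallocated relative to the initial slice.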

Value

a list with elements:

clustering

a vector of integers giving the cluster assignment for each object

sils

a vector of the silhouette widths achieved at each iteration

numitr

the number of iterations performed

Author(s)

David W. Roberts droberts@montana.edu

See Also

optpart

Examples

library(optpart)  # also loads cluster and labdsv (see the output below)
data(shoshveg)
dis.bc <- dsvdis(shoshveg,'bray/curtis')
opt.5 <- optpart(5,dis.bc)
sil.5 <- optsil(opt.5,dis.bc,100) # may take a few minutes
summary(silhouette(sil.5,dis.bc))
## Not run: plot(silhouette(sil.5,dis.bc))

Example output

Loading required package: cluster
Loading required package: labdsv
Loading required package: mgcv
Loading required package: nlme
This is mgcv 1.8-28. For overview type 'help("mgcv-package")'.
Loading required package: MASS

Attaching package: 'labdsv'

The following object is masked from 'package:stats':

    density

Loading required package: plotrix

Attaching package: 'optpart'

The following object is masked from 'package:labdsv':

    clustify

Silhouette of 150 units in 5 clusters from silhouette.default(x = as.numeric(clustify(x)), dist = dist) :
 Cluster sizes and average silhouette widths:
        23         23         42         22         40 
0.26271801 0.01520838 0.13469833 0.15229420 0.21970541 
Individual silhouette widths:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.09471  0.05802  0.14460  0.16126  0.25999  0.42511 

optpart documentation built on March 26, 2020, 6:18 p.m.
