stability: Stability Measures
In clValid: Validation of Clustering Results


Calculates the stability measures: the average proportion of non-overlap (APN), the average distance (AD), the average distance between means (ADM), and the figure of merit (FOM).

stability(mat, Dist=NULL, del, cluster, clusterDel, method="euclidean")

`mat`	The data matrix of the clustered observations
`Dist`	The distance matrix (as a matrix or dist object) of the clustered observations. If NULL then `method` is used with `mat` to determine the distance matrix.
`del`	An integer indicating which column was removed
`cluster`	An integer vector indicating the cluster partitioning based on all the data
`clusterDel`	An integer vector indicating the cluster partitioning based on the data with column `del` removed.
`method`	The metric used to determine the distance matrix. Not used if `Dist` is provided.

The stability measures evaluate the stability of a clustering result by comparing it with the clusters obtained after removing each column, one at a time. These measures include the average proportion of non-overlap (APN), the average distance (AD), the average distance between means (ADM), and the figure of merit (FOM). The APN, AD, and ADM are all based on the cross-classification table of the original clustering with the clustering based on the removal of one column. The APN measures the average proportion of observations not placed in the same cluster under both cases. The AD measures the average distance between observations placed in the same cluster under both cases. The ADM measures the average distance between cluster centers for observations placed in the same cluster under both cases. The FOM measures the average intra-cluster variance of the deleted column, where the clustering is based on the remaining (undeleted) columns. In all cases the average is taken over all the deleted columns, and all measures should be minimized. For details see the package vignette.

NOTE: The stability function calculates these measures only for the single removed column specified by `del`. To get the overall scores, the user must average the measures corresponding to each removed column.
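To make the cross-classification idea concrete, the following is a minimal sketch of how the APN for one removed column could be computed in base R. The helper `apn_one_column` is hypothetical and is not part of clValid; it follows the verbal definition above (for each observation, one minus the share of its original cluster that stays with it after the column is removed, averaged over observations), and the package's own implementation may differ in detail.

```r
## Hypothetical helper (not a clValid function): APN for a single removed column,
## computed from the cross-classification table of the two partitions.
apn_one_column <- function(cluster, clusterDel) {
  ## rows: clusters from all the data; columns: clusters with one column removed
  overlap <- table(cluster, clusterDel)
  ## An observation in cell (a, b) shares overlap[a, b] members with its
  ## original cluster, whose size is rowSums(overlap)[a]. Summing
  ## overlap^2 / row size over cells adds up that share for every observation.
  1 - sum(overlap^2 / rowSums(overlap)) / sum(overlap)
}

## Toy partitions (made-up labels, for illustration only)
cluster    <- c(1, 1, 1, 2, 2, 2)
clusterDel <- c(1, 1, 2, 2, 2, 2)
apn_toy <- apn_one_column(cluster, clusterDel)
apn_same <- apn_one_column(cluster, cluster)
```

In this toy case one of the six observations leaves its original cluster, so `apn_toy` is small but nonzero, while comparing a partition with itself gives an APN of zero.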

Returns a numeric vector with the APN, AD, ADM, and FOM measures corresponding to the particular column that was removed.

The main function for cluster validation is clValid, and users should call this function directly if possible.
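As a rough illustration of that recommendation, the sketch below requests the stability measures through `clValid` for the same subset of the `mouse` data used in the examples. It assumes the clValid package is installed; the range of cluster numbers (`2:4`) and the single clustering method are arbitrary choices made for this example.

```r
library(clValid)

data(mouse)
express <- mouse[1:25, c("M1","M2","M3","NC1","NC2","NC3")]
rownames(express) <- mouse$ID[1:25]

## validation="stability" asks for APN, AD, ADM, and FOM, already averaged
## over the removed columns
stabAll <- clValid(express, nClust = 2:4, clMethods = "hierarchical",
                   validation = "stability", metric = "euclidean",
                   method = "average")
summary(stabAll)
```

Here the column-by-column loop and the averaging are handled internally, so the scores for several cluster numbers can be compared directly.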

To get the overall values, the stability measures corresponding to each removed column should be averaged (see the examples below).

Guy Brock, Vasyl Pihur, Susmita Datta, Somnath Datta

Datta, S. and Datta, S. (2003). Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics 19(4): 459-466.

For a description of the function 'clValid' see clValid.

For a description of the class 'clValid' and all available methods see clValidObj or clValid-class.

For additional help on the other validation measures see connectivity, dunn, BSI, and BHI.

data(mouse)
express <- mouse[1:25,c("M1","M2","M3","NC1","NC2","NC3")]
rownames(express) <- mouse$ID[1:25]
## hierarchical clustering
Dist <- dist(express,method="euclidean")
clusterObj <- hclust(Dist, method="average")
nc <- 4 ## number of clusters      
cluster <- cutree(clusterObj,nc)

stab <- matrix(0,nrow=ncol(express),ncol=4)
colnames(stab) <- c("APN","AD","ADM","FOM")

## Loop over the columns (samples), removing one at a time
for (del in 1:ncol(express)) {
  matDel <- express[,-del]               
  DistDel <- dist(matDel,method="euclidean")
  clusterObjDel <- hclust(DistDel, method="average")
  clusterDel <- cutree(clusterObjDel,nc)
  stab[del,] <- stability(express, Dist, del, cluster, clusterDel)
}
colMeans(stab)