stability: Stability Measures In clValid: Validation of Clustering Results

Description

Calculates the stability measures the average proportion of non-overlap (APN), the average distance (AD), the average distance between means (ADM), and the figure of merit (FOM).

Usage

 `1` ```stability(mat, Dist=NULL, del, cluster, clusterDel, method="euclidean") ```

Arguments

 `mat` The data matrix of the clustered observations `Dist` The distance matrix (as a matrix or dist object) of the clustered observations. If NULL then `method` is used with `mat` to determine the distance matrix. `del` An integer indicating which column was removed `cluster` An integer vector indicating the cluster partitioning based on all the data `clusterDel` An integer vector indicating the cluster partitioning based on the data with column `del` removed. `method` The metric used to determine the distance matrix. Not used if `distance` is provided.

Details

The stability measures evaluate the stability of a clustering result by comparing it with the clusters obtained by removing one column at a time. These measures include the average proportion of non-overlap (APN), the average distance (AD), the average distance between means (ADM), and the figure of merit (FOM). The APN, AD, and ADM are all based on the cross-classification table of the original clustering with the clustering based on the removal of one column. The APN measures the average proportion of observations not placed in the same cluster under both cases, while the AD measures the average distance between observations placed in the same cluster under both cases and the ADM measures the average distance between cluster centers for observations placed in the same cluster under both cases. The FOM measures the average intra-cluster variance of the deleted column, where the clustering is based on the remaining (undeleted) columns. In all cases the average is taken over all the deleted columns, and all measures should be minimized. For details see the package vignette.

NOTE: The `stability` function only calculates these measures for the particular column specified by `del` removed. To get the overall scores, the user must average the measures corresponding to each removed column.

Value

Returns a numeric vector with the APN, AD, ADM, and FOM measures corresponding to the particular column that was removed.

Note

The main function for cluster validation is `clValid`, and users should call this function directly if possible.

To get the overall values, the stability measures corresponding to each removed column should be averaged (see the examples below).

Author(s)

Guy Brock, Vasyl Pihur, Susmita Datta, Somnath Datta

References

Datta, S. and Datta, S. (2003). Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics 19(4): 459-466.

For a description of the function 'clValid' see `clValid`.

For a description of the class 'clValid' and all available methods see `clValidObj` or `clValid-class`.

For additional help on the other validation measures see `connectivity`, `dunn`, `BSI`, and `BHI`.

Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21``` ```data(mouse) express <- mouse[1:25,c("M1","M2","M3","NC1","NC2","NC3")] rownames(express) <- mouse\$ID[1:25] ## hierarchical clustering Dist <- dist(express,method="euclidean") clusterObj <- hclust(Dist, method="average") nc <- 4 ## number of clusters cluster <- cutree(clusterObj,nc) stab <- matrix(0,nrow=ncol(express),ncol=4) colnames(stab) <- c("APN","AD","ADM","FOM") ## Need loop over all removed samples for (del in 1:ncol(express)) { matDel <- express[,-del] DistDel <- dist(matDel,method="euclidean") clusterObjDel <- hclust(DistDel, method="average") clusterDel <- cutree(clusterObjDel,nc) stab[del,] <- stability(express, Dist, del, cluster, clusterDel) } colMeans(stab) ```

clValid documentation built on May 29, 2017, 9:36 a.m.