View source: R/clValid-functions.R
Calculates the stability measures: the average proportion of non-overlap (APN), the average distance (AD), the average distance between means (ADM), and the figure of merit (FOM).
mat: The data matrix of the clustered observations.
Dist: The distance matrix (as a matrix or dist object) of the clustered observations. If NULL, it is computed from mat using the metric given by method.
del: An integer indicating which column was removed.
cluster: An integer vector indicating the cluster partitioning based on all the data.
clusterDel: An integer vector indicating the cluster partitioning based on the data with column del removed.
method: The metric used to determine the distance matrix. Not used if Dist is provided.
The stability measures evaluate the stability of a clustering result by comparing it with the clustering obtained after removing each column in turn. These measures are the average proportion of non-overlap (APN), the average distance (AD), the average distance between means (ADM), and the figure of merit (FOM). The APN, AD, and ADM are all based on the cross-classification table of the original clustering with the clustering obtained after removing one column. The APN measures the average proportion of observations not placed in the same cluster in both cases. The AD measures the average distance between observations placed in the same cluster in both cases, and the ADM measures the average distance between cluster centers for observations placed in the same cluster in both cases. The FOM measures the average intra-cluster variance of the deleted column, where the clustering is based on the remaining (undeleted) columns. In all cases the average is taken over all the deleted columns, and all measures should be minimized. For details see the package vignette.
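To make the definitions above concrete, the per-column measures can be sketched in base R. This is a minimal sketch written from the verbal definitions, not the clValid source, and its numbers may differ from stability(). It assumes Euclidean distance and no missing values, computes the ADM cluster centres on all columns, and uses squared deviations for the FOM with an sqrt(N / (N - K)) adjustment for the number of clusters. The function name stability_sketch is hypothetical.

```r
## Hypothetical sketch of the per-column stability measures, written from
## the verbal definitions (not the clValid source). Assumes Euclidean
## distance, no missing values, and integer cluster labels.
stability_sketch <- function(mat, del, cluster, clusterDel) {
  mat <- as.matrix(mat)
  n <- nrow(mat)
  D <- as.matrix(dist(mat, method = "euclidean"))  # distances on all columns
  apn <- ad <- adm <- numeric(n)
  for (i in seq_len(n)) {
    full <- which(cluster == cluster[i])        # i's cluster using all columns
    part <- which(clusterDel == clusterDel[i])  # i's cluster with column del removed
    ## APN: fraction of i's original cluster not kept together with i
    apn[i] <- 1 - length(intersect(full, part)) / length(full)
    ## AD: mean distance between members of the two clusters
    ad[i] <- mean(D[full, part])
    ## ADM: distance between the two cluster centres (assumed: all columns)
    ctr_full <- colMeans(mat[full, , drop = FALSE])
    ctr_part <- colMeans(mat[part, , drop = FALSE])
    adm[i] <- sqrt(sum((ctr_full - ctr_part)^2))
  }
  ## FOM: spread of the deleted column within the clusters of clusterDel,
  ## with an assumed sqrt(n / (n - k)) adjustment for the number of clusters
  x <- mat[, del]
  k <- length(unique(clusterDel))
  fom <- sqrt(sum((x - ave(x, clusterDel))^2) / n) * sqrt(n / (n - k))
  c(APN = mean(apn), AD = mean(ad), ADM = mean(adm), FOM = fom)
}
```

Each call covers a single removed column; the average over columns is left to the caller, matching how stability() itself behaves.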
NOTE: The stability function only calculates these measures for the single case in which the column specified by del is removed. To get the overall scores, the user must average the measures corresponding to each removed column.
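Because stability() handles one removed column per call, the averaging can be wrapped in a small reusable helper. The sketch below is hypothetical: average_stability and its cluster_fun argument are not part of clValid. It reuses the same positional call to stability() as the example on this page and assumes Euclidean distances computed on the full data.

```r
library(clValid)

## Hypothetical helper: average stability() over every removed column.
## cluster_fun (an assumption of this sketch) maps a matrix to an integer
## vector of cluster labels, one per row.
average_stability <- function(mat, cluster_fun) {
  mat <- as.matrix(mat)
  Dist <- dist(mat, method = "euclidean")   # distances on all columns
  cluster <- cluster_fun(mat)               # partition using all columns
  per_col <- t(sapply(seq_len(ncol(mat)), function(del) {
    clusterDel <- cluster_fun(mat[, -del, drop = FALSE])  # recluster without del
    stability(mat, Dist, del, cluster, clusterDel)
  }))
  colnames(per_col) <- c("APN", "AD", "ADM", "FOM")
  colMeans(per_col)                         # overall scores
}

## Usage: average-linkage hierarchical clustering cut at 4 clusters
hc4 <- function(m) cutree(hclust(dist(m), method = "average"), 4)
data(mouse)
express <- mouse[1:25, c("M1", "M2", "M3", "NC1", "NC2", "NC3")]
average_stability(express, hc4)
```

Passing the clustering step as a function makes it easy to swap in another method while keeping the column-removal loop in one place.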
Returns a numeric vector with the APN, AD, ADM, and FOM measures corresponding to the particular column that was removed.
The main function for cluster validation is clValid, and users should call this function directly if possible.
To get the overall values, the stability measures corresponding to each removed column should be averaged (see the examples below).
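As recommended above, clValid() can run the column-removal loop and the averaging itself. A minimal sketch, assuming the same 25 mouse genes and average-linkage hierarchical clustering as the example below; the argument values are chosen here for illustration and the resulting scores are not reproduced.

```r
library(clValid)

## Same data as the example: 25 genes, six samples (columns)
data(mouse)
express <- mouse[1:25, c("M1", "M2", "M3", "NC1", "NC2", "NC3")]
rownames(express) <- mouse$ID[1:25]

## Request only the stability measures for 4 clusters
res <- clValid(express, nClust = 4, clMethods = "hierarchical",
               validation = "stability", metric = "euclidean",
               method = "average")
summary(res)        # overall stability scores
optimalScores(res)  # best score per measure, with method and cluster count
```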
Guy Brock, Vasyl Pihur, Susmita Datta, Somnath Datta
Datta, S. and Datta, S. (2003). Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics 19(4): 459-466.
For a description of the function 'clValid' see clValid.
For a description of the class 'clValid' and all available methods see clValidObj or clValid-class.
For additional help on the other validation measures see connectivity, dunn, BSI, and BHI.
library(clValid)
data(mouse)
express <- mouse[1:25,c("M1","M2","M3","NC1","NC2","NC3")]
rownames(express) <- mouse$ID[1:25]
## hierarchical clustering on all columns
Dist <- dist(express,method="euclidean")
clusterObj <- hclust(Dist, method="average")
nc <- 4 ## number of clusters
cluster <- cutree(clusterObj,nc)
## one row of stability measures per removed column
stab <- matrix(0,nrow=ncol(express),ncol=4)
colnames(stab) <- c("APN","AD","ADM","FOM")
## loop over all columns (samples), removing one at a time
for (del in 1:ncol(express)) {
  matDel <- express[,-del]
  ## recluster the data with column del removed
  DistDel <- dist(matDel,method="euclidean")
  clusterObjDel <- hclust(DistDel, method="average")
  clusterDel <- cutree(clusterObjDel,nc)
  stab[del,] <- stability(express, Dist, del, cluster, clusterDel)
}
## average over the removed columns to get the overall scores
colMeans(stab)