Description
We provide an implementation of the Figure of Merit (FOM) stability method
proposed by Yeung et al. (2001) for estimating the predictive power of a
clustering algorithm. The FOM statistic aggregates the average distance of
each observation to its cluster centroid (typically, the cluster mean) after
removing each observation in sequence, similar to the jackknife and
leave-one-out cross-validation methods. The aggregate FOM score is a function
of the average of the aggregated distances corresponding to each sample being
removed. When FOM scores are compared for clusterings obtained from two
distinct clustering algorithms, the smaller score indicates the preferred
algorithm, that is, the one with more predictive power. However,
Yeung et al. (2001) note that the FOM statistic can be used only for relative
comparisons of clustering algorithms on the same data set for a specified
value of K.
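
As a rough illustration of the leave-one-out idea described above, the following sketch removes each observation in turn, clusters the remaining rows, and records the root-mean-square distance of the retained observations to their cluster means. It is one possible reading of the description, not the package's implementation: the names loo_fom_sketch and km_labels are hypothetical, and the choice of distance, the averaging step, and the adjustment factor sqrt((n - K) / n) are assumptions based on our reading of Yeung et al. (2001).

```r
# Hypothetical wrapper that returns only the cluster labels from kmeans().
km_labels <- function(x, K, ...) {
  kmeans(x = x, centers = K, nstart = 5, ...)$cluster
}

# Hypothetical sketch of a leave-one-out FOM; not the package's own code.
loo_fom_sketch <- function(x, K, cluster_fn, adjusted = TRUE, ...) {
  x <- as.matrix(x)
  n <- nrow(x)
  scores <- vapply(seq_len(n), function(i) {
    x_i <- x[-i, , drop = FALSE]            # drop observation i
    labels <- cluster_fn(x_i, K, ...)       # cluster the remaining rows
    # Cluster means: column sums per cluster divided by cluster sizes.
    centroids <- rowsum(x_i, labels) / as.vector(table(labels))
    # Squared Euclidean distance of each retained row to its own centroid.
    diffs <- x_i - centroids[as.character(labels), , drop = FALSE]
    sqrt(mean(rowSums(diffs^2)))
  }, numeric(1))
  aggregate <- mean(scores)
  # Assumed adjustment factor; treat as illustrative only.
  if (adjusted) aggregate <- aggregate / sqrt((n - K) / n)
  list(scores = scores, aggregate = aggregate)
}

set.seed(1)
x_demo <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
                matrix(rnorm(20, mean = 5), ncol = 2))
res <- loo_fom_sketch(x_demo, K = 2, cluster_fn = km_labels)
```

Under these assumptions, res$scores holds one score per removed observation and res$aggregate a single summary that could be compared across algorithms for the same data and the same K.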

Usage

figure_of_merit(x, K, cluster_method, adjusted = TRUE, ...)

Arguments

x
    data matrix with
K
    the number of clusters to find with the clustering algorithm specified in
    cluster_method
cluster_method
    a character string or a function specifying the clustering algorithm that
    will be used. The method specified is matched with the
adjusted
    If TRUE (the default), the adjusted FOM is calculated.
...
    additional arguments passed to the function specified in cluster_method

Details

We require a clustering algorithm function to be specified in the argument
cluster_method. The function given should accept at least the first two of
the following arguments:
x: matrix of observations to cluster
K: the number of clusters to find
...: additional arguments that can be passed on
Also, the function given should return only clustering labels for each
observation in the matrix x. The additional arguments specified in ... are
useful if a wrapper function is used: see the example below for an
illustration.
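
To make the contract above concrete, here is a minimal sketch of a wrapper for a different algorithm, agglomerative hierarchical clustering from base R. The name hclust_wrapper and its linkage argument are hypothetical and chosen for illustration; the wrapper takes x and K, forwards any extra arguments, and returns only the vector of cluster labels.

```r
# Hypothetical wrapper: hierarchical clustering cut into K groups.
# Returns only the integer cluster labels, one per row of x.
hclust_wrapper <- function(x, K, linkage = "average", ...) {
  tree <- hclust(dist(x), method = linkage, ...)
  cutree(tree, k = K)
}

set.seed(2)
x_demo <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
                matrix(rnorm(20, mean = 5), ncol = 2))
labels <- hclust_wrapper(x_demo, K = 3)
```

A function of this shape could then be supplied as cluster_method, with linkage passed through the ... argument.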

Value

An object of class figure_of_merit, which contains a named list with elements
vector of length n containing the individual FOM scores for each observation removed in sequence
the aggregate FOM score. This value is adjusted if specified

References

Yeung K., Haynor D., and Ruzzo W. (2001), Validating Clustering for Gene Expression Data, _Bioinformatics_, 17, 4, 309-318. http://bioinformatics.oxfordjournals.org/content/17/4/309.abstract

Examples

## Not run:
# First, we create a wrapper function for the K-means clustering algorithm
# that returns only the clustering labels for each observation (row) in x.
kmeans_wrapper <- function(x, K, num_starts = 10, ...) {
  kmeans(x = x, centers = K, nstart = num_starts, ...)$cluster
}

# For this example, we generate five multivariate normal populations with the
# sim_data function.
set.seed(42)
x <- sim_data("normal", delta = 1.5)$x

# cluster_method can be given either as a character string or as a function.
fom_out <- figure_of_merit(x = x, K = 4, cluster_method = "kmeans_wrapper")
fom_out2 <- figure_of_merit(x = x, K = 5, cluster_method = kmeans_wrapper)
## End(Not run)