figure_of_merit: Figure of Merit Method for Estimating the Predictive Power of...
In ramhiser/clusteval: Evaluation of Clustering Algorithms


We provide an implementation of the Figure of Merit (FOM) stability method proposed by Yeung et al. (2001) for estimating the predictive power of a clustering algorithm. The FOM statistic removes each observation in sequence, similar to the jackknife and leave-one-out cross-validation, and each time aggregates the average distance of each observation to its cluster centroid (typically, the cluster mean). The aggregate FOM score is a function of the average of these per-removal scores. When FOM scores from two distinct clustering algorithms are compared, the algorithm with the smaller score is preferred and is taken to have more predictive power. However, Yeung et al. (2001) note that the FOM statistic can be used only for relative comparisons of clustering algorithms on the same data set and for a specified value of K.
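To make the leave-one-out idea concrete, here is a minimal sketch in the style of the formulation in Yeung et al. (2001), where the rows are clustered with one column held out and the held-out column is then scored against its within-cluster means. It is not the package's implementation. The names `fom_sketch`, `cluster_fun`, and `km` are hypothetical, and the adjustment factor `sqrt((n - K) / n)` and the summing of per-column scores are assumptions taken from our reading of the paper.

```r
# Hypothetical helper, not part of clusteval: leave-one-column-out FOM sketch.
# Rows of `dat` are the items being clustered; each column is held out in turn.
fom_sketch <- function(dat, K, cluster_fun, adjusted = TRUE) {
  dat <- as.matrix(dat)
  n <- nrow(dat)
  scores <- vapply(seq_len(ncol(dat)), function(e) {
    # Cluster using every column except the held-out column e.
    labels <- cluster_fun(dat[, -e, drop = FALSE], K)
    # Within-cluster means of the held-out column, one value per row.
    centers <- ave(dat[, e], labels)
    # Root mean squared deviation of the held-out column from those means.
    sqrt(sum((dat[, e] - centers)^2) / n)
  }, numeric(1))
  # Assumption: per-column scores are summed, then optionally adjusted.
  aggregate <- sum(scores)
  if (adjusted) aggregate <- aggregate / sqrt((n - K) / n)
  list(scores = scores, aggregate = aggregate)
}

# Toy data: two well-separated groups of 20 rows in 3 columns.
set.seed(1)
dat <- rbind(matrix(rnorm(60), ncol = 3),
             matrix(rnorm(60, mean = 4), ncol = 3))
km <- function(x, K) kmeans(x, centers = K, nstart = 5)$cluster

res_adj <- fom_sketch(dat, K = 2, cluster_fun = km, adjusted = TRUE)
res_raw <- fom_sketch(dat, K = 2, cluster_fun = km, adjusted = FALSE)
```

Because the adjustment divides by a factor below one, the adjusted aggregate is always larger than the unadjusted one for the same clusterings.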

figure_of_merit(x, K, cluster_method, adjusted = TRUE, ...)

`x`	data matrix with `n` observations (rows) and `p` features (columns)
`K`	the number of clusters to find with the clustering algorithm specified in `cluster_method`
`cluster_method`	a character string or a function specifying the clustering algorithm that will be used. The method specified is matched with the `match.fun` function. The function given should return only clustering labels for each observation in the matrix `x`.
`adjusted`	logical. If `TRUE` (the default), the adjusted FOM is calculated.
`...`	additional arguments passed to the function specified in `cluster_method`

We require a clustering algorithm function to be specified in the argument `cluster_method`. The function given should accept at least the arguments `x` and `K`, and may accept others through `...`:

x: matrix of observations to cluster
K: the number of clusters to find
...: additional arguments that can be passed on

Also, the function given should return only clustering labels for each observation in the matrix x. The additional arguments specified in ... are useful if a wrapper function is used: see the example below for an illustration.
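As a further illustration of this contract, the following sketch wraps base R's hierarchical clustering so that it takes `x` and `K` and returns only the labels. The name `hclust_wrapper` and its `linkage` argument are hypothetical and not part of clusteval. Extra arguments in `...` are forwarded to `dist`, which is a design choice of this sketch.

```r
# Hypothetical wrapper, not part of clusteval: hierarchical clustering that
# returns one integer cluster label per row of x.
hclust_wrapper <- function(x, K, linkage = "average", ...) {
  # Pairwise distances between rows; `...` is passed on to dist().
  d <- dist(x, ...)
  tree <- hclust(d, method = linkage)
  # Cut the dendrogram into K clusters and drop any row names.
  unname(cutree(tree, k = K))
}

# Quick check of the contract on toy data.
set.seed(1)
toy <- matrix(rnorm(100), ncol = 4)
labels <- hclust_wrapper(toy, K = 3)
```

A wrapper of this form could then be supplied as `cluster_method`, either by name or as a function object.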

object of class figure_of_merit, which contains a named list with elements

scores: vector of length n containing the individual FOM scores for each observation removed in sequence
aggregate: the aggregate FOM score. This value is adjusted if `adjusted = TRUE`

Yeung K., Haynor D., and Ruzzo W. (2001), Validating Clustering for Gene Expression Data, _Bioinformatics_, 17, 4, 309-318. http://bioinformatics.oxfordjournals.org/content/17/4/309.abstract

## Not run: 
# First, we create a wrapper function for the K-means clustering algorithm
# that returns only the clustering labels for each observation (row) in x.
kmeans_wrapper <- function(x, K, num_starts = 10, ...) {
  kmeans(x = x, centers = K, nstart = num_starts, ...)$cluster
}

# For this example, we generate five multivariate normal populations with the
# sim_data function.
set.seed(42)
x <- sim_data("normal", delta = 1.5)$x

fom_out <- figure_of_merit(x = x, K = 4, cluster_method = "kmeans_wrapper")
fom_out2 <- figure_of_merit(x = x, K = 5, cluster_method = kmeans_wrapper)

## End(Not run)