figure_of_merit: Figure of Merit Method for Estimating the Predictive Power of...

Description

This function implements the Figure of Merit (FOM) stability method proposed by Yeung et al. (2001) for estimating the predictive power of a clustering algorithm. Each observation is removed in sequence, as in the jackknife and leave-one-out cross-validation methods, and for each removal the FOM statistic aggregates the average distance of each observation to its cluster centroid (typically the cluster mean). The aggregate FOM score is a function of the average of these per-removal distances. When FOM scores are compared for clusterings obtained from two distinct clustering algorithms, the smaller score indicates the preferred algorithm, that is, the one with more predictive power. However, Yeung et al. (2001) note that the FOM statistic can be used only for relative comparisons of clustering algorithms on the same data set for a specified value of K.
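The leave-one-out procedure described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: `fom_sketch` and `km_labels` are hypothetical names, Euclidean distance to the cluster mean is assumed, the aggregate is taken as the plain mean of the per-removal scores, and no adjustment is applied.

```r
# Hypothetical helper, NOT part of clusteval: a leave-one-out sketch of the
# FOM idea. Assumes Euclidean distance and an unadjusted mean aggregate.
fom_sketch <- function(x, K, cluster_method, ...) {
  x <- as.matrix(x)
  cluster_method <- match.fun(cluster_method)

  scores <- vapply(seq_len(nrow(x)), function(i) {
    # Remove observation i and cluster the remaining rows
    x_loo <- x[-i, , drop = FALSE]
    labels <- cluster_method(x_loo, K, ...)

    # Distance of each remaining observation to its own cluster mean
    dists <- numeric(nrow(x_loo))
    for (k in unique(labels)) {
      idx <- which(labels == k)
      members <- x_loo[idx, , drop = FALSE]
      centroid <- colMeans(members)
      dists[idx] <- sqrt(rowSums(sweep(members, 2, centroid)^2))
    }
    mean(dists)
  }, numeric(1))

  list(scores = scores, aggregate = mean(scores))
}

# Small usage example on simulated data with two well-separated groups
set.seed(1)
x_demo <- rbind(matrix(rnorm(40), ncol = 2),
                matrix(rnorm(40, mean = 5), ncol = 2))
km_labels <- function(x, K, ...) {
  kmeans(x, centers = K, nstart = 5, ...)$cluster
}
res <- fom_sketch(x_demo, K = 2, cluster_method = km_labels)
```

As with the real statistic, the value in `res$aggregate` is meaningful only relative to the score of another algorithm on the same data and the same K.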

Usage

figure_of_merit(x, K, cluster_method, adjusted = TRUE, ...)

Arguments

x

data matrix with n observations (rows) and p features (columns)

K

the number of clusters to find with the clustering algorithm specified in cluster_method

cluster_method

a character string or a function specifying the clustering algorithm that will be used. The method specified is matched with the match.fun function. The function given should return only clustering labels for each observation in the matrix x.

adjusted

logical. If TRUE (the default), the adjusted FOM is calculated.

...

additional arguments passed to the function specified in cluster_method

Details

We require a clustering algorithm function to be specified in the argument cluster_method. The function given should accept at least the two arguments x and K, and may accept further arguments through ...:

x

matrix of observations to cluster

K

the number of clusters to find

...

additional arguments that can be passed on

Also, the function given should return only clustering labels for each observation in the matrix x. The additional arguments specified in ... are useful if a wrapper function is used: see the example below for an illustration.
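As a further illustration of the wrapper requirements, the following sketch wraps hierarchical clustering so that it accepts x, K, and ..., and returns only the cluster labels. The name `hclust_wrapper` and its `linkage` argument are hypothetical and not part of the package; here the extra arguments in ... are forwarded to `dist`, which is one possible design choice.

```r
# Hypothetical wrapper, not part of clusteval: hierarchical clustering that
# returns only one cluster label per row of x.
hclust_wrapper <- function(x, K, linkage = "complete", ...) {
  # Extra arguments in ... are forwarded to dist(), e.g. method = "manhattan"
  tree <- hclust(dist(x, ...), method = linkage)
  # Cut the dendrogram into K groups; the result is an integer label vector
  cutree(tree, k = K)
}

# Quick check on simulated data with two well-separated groups
set.seed(1)
x_check <- rbind(matrix(rnorm(40), ncol = 2),
                 matrix(rnorm(40, mean = 5), ncol = 2))
labels_check <- hclust_wrapper(x_check, K = 3, linkage = "average",
                               method = "manhattan")
```

A wrapper of this shape could then be supplied as cluster_method, either as the function itself or as its name in a character string.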

Value

object of class figure_of_merit, which contains a named list with elements

scores:

vector of length n containing the individual FOM scores for each observation removed in sequence

aggregate:

the aggregate FOM score. This value is adjusted if adjusted = TRUE

References

Yeung K., Haynor D., and Ruzzo W. (2001), Validating Clustering for Gene Expression Data, _Bioinformatics_, 17, 4, 309-318. http://bioinformatics.oxfordjournals.org/content/17/4/309.abstract

Examples

## Not run: 
# First, we create a wrapper function for the K-means clustering algorithm
# that returns only the clustering labels for each observation (row) in x.
kmeans_wrapper <- function(x, K, num_starts = 10, ...) {
  kmeans(x = x, centers = K, nstart = num_starts, ...)$cluster
}

# For this example, we generate five multivariate normal populations with the
# sim_data function.
set.seed(42)
x <- sim_data("normal", delta = 1.5)$x

fom_out <- figure_of_merit(x = x, K = 4, cluster_method = "kmeans_wrapper")
fom_out2 <- figure_of_merit(x = x, K = 5, cluster_method = kmeans_wrapper)

## End(Not run)

ramhiser/clusteval documentation built on May 26, 2019, 10:07 p.m.