Random.kmeans.validity: k-means clustering and validity indices computation using...

Random.kmeans.validity    R Documentation

k-means clustering and validity indices computation using random projections of data

Description

This function applies the k-means clustering algorithm to the data and then computes stability indices for the resulting clusters using multiple random subspace projections. It computes a validity index for each cluster found in the original space, the overall validity index of the clustering and (optionally) the set of AC indices. Different randomized maps (e.g. PMO, Achlioptas, Normal, Random Subspace projections) may be applied. The function assumes that the examples are labeled with integers from 1 to ncol(M). Note that the k-means algorithm depends strongly on its initial conditions, so different random seeds may give different results; with seed = -1 (the default) a different random seed is chosen at each call.
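
The dependence on initial conditions can be reproduced with base R's stats::kmeans alone. The sketch below is not part of clusterv: the toy matrix X and the helper run_kmeans are hypothetical, and examples are stored by row here because that is what stats::kmeans expects.

```r
# Toy data: 60 examples (rows) in 2 dimensions, drawn around three centers.
set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))

# Hypothetical helper: fix the seed, then run k-means from one random start.
run_kmeans <- function(X, centers, seed) {
  set.seed(seed)
  kmeans(X, centers = centers, iter.max = 1000, nstart = 1)$cluster
}

lab_a <- run_kmeans(X, 3, seed = 100)
lab_b <- run_kmeans(X, 3, seed = 100)
lab_c <- run_kmeans(X, 3, seed = 200)

identical(lab_a, lab_b)  # same seed, hence same initial centers and same labels
table(lab_a, lab_c)      # a different seed may permute labels or change the partition
```

Fixing the seed makes a run repeatable; it does not make the resulting partition any more reliable, which is what the validity indices are meant to assess.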

Usage

Random.kmeans.validity(M, dim, pmethod = "PMO", c = 3, it.max = 1000, 
                       n = 50, scale = TRUE, seed = -1, AC = TRUE)

Arguments

M

matrix of data: rows are variables and columns are examples

dim

subspace dimension

pmethod

projection method. It must be one of the following: "RS" (random subspace projection), "PMO" (Plus Minus One random projection), "Norm" (normal random projection), "Achlioptas" (Achlioptas random projection)

c

number of clusters

it.max

maximum number of iterations of the k-means algorithm (default 1000)

n

number of random projections

scale

if TRUE (default) the random projections are scaled

seed

numerical seed for the random generator (default -1: a different seed is chosen at each call)

AC

if TRUE (default) the AC indices are computed.
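
The four values of pmethod correspond to well-known families of random maps. The sketch below shows one textbook way to build each map as a dprime x d matrix and apply it to data stored with variables in rows, as M is. The helper random_map and its scaling constants are assumptions for illustration; the package's own projection functions may differ in detail.

```r
# Hypothetical helper: build a (dprime x d) random map of the requested type.
random_map <- function(dprime, d,
                       method = c("PMO", "Norm", "Achlioptas", "RS"),
                       scale = TRUE) {
  method <- match.arg(method)
  P <- switch(method,
    # entries +1 or -1 with equal probability
    PMO = matrix(sample(c(-1, 1), dprime * d, replace = TRUE), nrow = dprime),
    # entries drawn from a standard normal distribution
    Norm = matrix(rnorm(dprime * d), nrow = dprime),
    # entries sqrt(3) * (+1, 0, -1) with probabilities 1/6, 2/3, 1/6
    Achlioptas = matrix(sqrt(3) * sample(c(1, 0, -1), dprime * d,
                                         replace = TRUE,
                                         prob = c(1/6, 2/3, 1/6)),
                        nrow = dprime),
    # random subspace: keep dprime of the d original coordinates
    RS = diag(d)[sample(d, dprime), , drop = FALSE]
  )
  if (scale) {
    # Chosen so that squared norms are preserved in expectation.
    s <- if (method == "RS") sqrt(d / dprime) else 1 / sqrt(dprime)
    P <- P * s
  }
  P
}

set.seed(42)
M_toy <- matrix(rnorm(800 * 30), nrow = 800)  # 800 variables, 30 examples
P <- random_map(30, nrow(M_toy), "PMO")
M_proj <- P %*% M_toy                         # examples stay in columns
dim(M_proj)
```

With scale = FALSE the raw entries are returned, which changes distances by a constant factor but not the relative geometry of the projected examples.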

Value

a list with six components, "validity", "overall.validity", "similarity.matrix", "dim", "cluster", "orig.cluster", plus "AC" when AC = TRUE:

validity

a vector with the validity of each of the c clusters

overall.validity

validity index of the overall clustering

similarity.matrix

pairwise similarity matrix between examples

dimension

random projection dimension

cluster

list of the n clusterings obtained by applying k-means to each of the projected subspaces

orig.cluster

list of the clusters in the original space

AC

matrix with the Assignment Confidence index for each example. Each row corresponds to an example, each column to a cluster (optional)
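
To make the relation between the clusterings, the similarity matrix and the per-cluster validity more concrete, the sketch below uses one common construction, which is an assumption here and not confirmed by this page: the similarity of two examples is the fraction of clusterings that place them in the same cluster, and a cluster is scored by the mean pairwise similarity of its members. Both helpers are hypothetical, and each clustering is represented as an integer label vector, which may not match the structure of the "cluster" component.

```r
# Hypothetical helper: fraction of clusterings in which examples i and j
# share a cluster. `labelings` is a list of integer label vectors.
pairwise_similarity <- function(labelings) {
  m <- length(labelings[[1]])
  S <- matrix(0, m, m)
  for (lab in labelings) {
    S <- S + outer(lab, lab, "==")  # 1 where i and j are co-clustered
  }
  S / length(labelings)
}

# Hypothetical helper: mean off-diagonal similarity within one cluster.
cluster_stability <- function(S, members) {
  k <- length(members)
  if (k < 2) return(NA_real_)
  sub <- S[members, members]
  (sum(sub) - sum(diag(sub))) / (k * (k - 1))
}

# Three toy clusterings of six examples. The first and third differ only
# by a relabeling, so they contribute identical co-membership.
labelings <- list(c(1, 1, 1, 2, 2, 2),
                  c(1, 1, 2, 2, 2, 2),
                  c(2, 2, 2, 1, 1, 1))
S <- pairwise_similarity(labelings)

# Score the two clusters of a reference partition {1,2,3} and {4,5,6}.
cluster_stability(S, 1:3)
cluster_stability(S, 4:6)
```

Working on co-membership rather than on raw labels keeps the score insensitive to the arbitrary numbering of clusters in each run.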

Author(s)

Giorgio Valentini valentini@di.unimi.it

See Also

Achlioptas.random.projection, Plus.Minus.One.random.projection,

norm.random.projection, random.subspace,

Cluster.validity, Validity.indices, AC.index

Examples

library(clusterv)
# Assessment of the reliability of clusters discovered
# by k-means using RS projections.
M <- generate.sample0(n = 10, m = 2, sigma = 2, dim = 800)
l <- Random.kmeans.validity(M, dim = 30, pmethod = "RS", c = 3, n = 20)
# The same as above, but using PMO projections.
l <- Random.kmeans.validity(M, dim = 30, pmethod = "PMO", c = 3, n = 20)
# The same as above, but evaluating clusterings with 5 clusters.
l <- Random.kmeans.validity(M, dim = 30, pmethod = "PMO", c = 5, n = 20)
# The same as above, but evaluating clusterings with 10 clusters.
l <- Random.kmeans.validity(M, dim = 30, pmethod = "PMO", c = 10, n = 20)
# Assessment of the reliability of the clusters using projections
# with limited distortion (maximum expansion lower than 1.3
# according to the Johnson-Lindenstrauss lemma).
d <- JL.predict.dim(n = 30, epsilon = 0.3)
l <- Random.kmeans.validity(M, dim = d, pmethod = "PMO", c = 3, n = 20)
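
The last example picks the subspace dimension from the number of examples and the tolerated distortion. As a rough illustration of how such a dimension can be derived, the sketch below uses one standard form of the Johnson-Lindenstrauss bound, d >= 4 ln(n) / (epsilon^2/2 - epsilon^3/3). The helper jl_dim_sketch is hypothetical, and JL.predict.dim may use a different constant or formula, so the two should not be expected to return the same value.

```r
# Hypothetical helper: smallest integer dimension satisfying the bound
# d >= 4 * ln(n) / (epsilon^2 / 2 - epsilon^3 / 3).
jl_dim_sketch <- function(n, epsilon) {
  stopifnot(n > 1, epsilon > 0, epsilon < 1)
  ceiling(4 * log(n) / (epsilon^2 / 2 - epsilon^3 / 3))
}

jl_dim_sketch(n = 30, epsilon = 0.3)
jl_dim_sketch(n = 30, epsilon = 0.1)  # tighter distortion needs more dimensions
```

The bound grows only logarithmically with the number of examples and does not depend on the original dimension, which is why a very high-dimensional data set can often be projected aggressively.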


clusterv documentation built on June 8, 2025, 10:21 a.m.