tuneclus    R Documentation

Cluster quality assessment for a range of clusters and dimensions.

Description

This function facilitates the selection of the appropriate number of clusters and dimensions for joint dimension reduction and clustering methods.

Usage

tuneclus(data, nclusrange = 3:4, ndimrange = 2:3, 
method = c("RKM","FKM","mixedRKM","mixedFKM","clusCA","iFCB","MCAk"), 
criterion = "asw", dst = "full", alpha = NULL, alphak = NULL, 
center = TRUE, scale = TRUE, rotation = "none", nstart = 100, 
smartStart = NULL, seed = NULL)

## S3 method for class 'tuneclus'
print(x, ...)

## S3 method for class 'tuneclus'
summary(object, ...)

## S3 method for class 'tuneclus'
fitted(object, mth = c("centers", "classes"), ...)

Arguments

data

Continuous, categorical or mixed data set

nclusrange

An integer vector with the range of numbers of clusters which are to be compared by the cluster validity criteria. Note: the number of clusters should be greater than one

ndimrange

An integer vector with the range of dimensions which are to be compared by the cluster validity criteria

method

Specifies the method. Options are RKM for reduced K-means, FKM for factorial K-means, mixedRKM for mixed reduced K-means, mixedFKM for mixed factorial K-means, MCAk for MCA K-means, iFCB for Iterative Factorial Clustering of Binary variables and clusCA for Cluster Correspondence Analysis

criterion

One of asw, ch or crit. Determines whether average silhouette width, Calinski-Harabasz index or objective value of the selected method is used (default = "asw")

dst

Specifies the data used to compute the distances between objects. Options are full for the original data (after possible scaling) and low for the object scores in the low-dimensional space (default = "full")

alpha

Adjusts for the relative importance of (mixed) RKM and FKM in the objective function; alpha = 1 reduces to PCA, alpha = 0.5 to (mixed) reduced K-means, and alpha = 0 to (mixed) factorial K-means

alphak

Non-negative scalar to adjust for the relative importance of MCA (alphak = 1) and K-means (alphak = 0) in the solution (default = .5). Works only in combination with method = "MCAk"

center

A logical value indicating whether the variables should be shifted to be zero centered (default = TRUE)

scale

A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place (default = TRUE)

rotation

Specifies the method used to rotate the factors. Options are none for no rotation, varimax for varimax rotation with Kaiser normalization and promax for promax rotation (default = "none")

nstart

Number of starts (default = 100)

smartStart

If NULL then a random cluster membership vector is generated. Alternatively, a cluster membership vector can be provided as a starting solution

seed

An integer passed to set.seed() to seed the random number generator when smartStart = NULL (default = NULL)

x

For the print method, an object of class tuneclus

object

For the summary method, an object of class tuneclus

mth

For the fitted method, a character string that specifies the type of fitted value to return: "centers" for the cluster center of each observation, or "classes" for the cluster membership of each observation

...

Not used
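To make the "ch" option of criterion more concrete, the sketch below computes the Calinski-Harabasz index by hand for several numbers of clusters. It is only an illustration under stated assumptions: it uses plain kmeans() from base R on the scaled iris measurements rather than any clustrd method, the helper name ch_index is hypothetical, and it is not meant to reproduce how tuneclus() evaluates the criterion internally.

```r
# Assumption: plain k-means on scaled data, used only to illustrate the index.
X <- scale(iris[, 1:4])

# Hypothetical helper: Calinski-Harabasz index for a k-means solution with k clusters.
# CH = (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))
ch_index <- function(X, k, nstart = 10) {
  n <- nrow(X)
  fit <- kmeans(X, centers = k, nstart = nstart)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

set.seed(1234)
ks <- 2:5
ch <- sapply(ks, function(k) ch_index(X, k))
names(ch) <- ks

ch                    # one index value per candidate number of clusters
ks[which.max(ch)]     # candidate with the largest index
```

Larger values indicate better separated, more compact clusters, which is why the candidate with the maximum value is selected.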

Details

For the K-means part, the algorithm of Hartigan-Wong is used by default.

The hidden print and summary methods print out some key components of an object of class tuneclus.

The hidden fitted method returns cluster fitted values. If mth is "classes", this is a vector of cluster membership (the cluster component of the "tuneclus" object). If mth is "centers", this is a matrix where each row is the cluster center for the observation. The rownames of the matrix are the cluster membership values.
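The shape of the "centers" result described above can be pictured with a small base R sketch. The objects centers and classes are hypothetical stand-ins for a cluster center matrix and a membership vector; they are not produced by tuneclus() here.

```r
# Hypothetical cluster centers: 3 clusters in 2 dimensions, rows named by cluster.
centers <- matrix(c(0, 0,
                    5, 5,
                    10, 0),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(1:3, c("dim1", "dim2")))

# Hypothetical cluster membership of 5 observations.
classes <- c(1, 3, 3, 2, 1)

# One row per observation: the center of the cluster it belongs to.
per_obs <- centers[classes, , drop = FALSE]

dim(per_obs)        # 5 rows (observations) by 2 columns (dimensions)
rownames(per_obs)   # the cluster membership values, as character strings
```

With a fitted object such as bestRKM from the Examples, the corresponding calls would be fitted(bestRKM, mth = "centers") and fitted(bestRKM, mth = "classes").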

Value

clusobjbest

The output of the optimal run of cluspca() or clusmca()

nclusbest

The optimal number of clusters

ndimbest

The optimal number of dimensions

critbest

The optimal criterion value for nclusbest clusters and ndimbest dimensions

critgrid

Matrix of size nclusrange x ndimrange with the criterion values for the specified ranges of clusters and dimensions (values are calculated only when the number of clusters is greater than the number of dimensions; otherwise values in the grid are left blank)

criterion

The criterion used: "asw" for average silhouette width or "ch" for the Calinski-Harabasz index

cluasw

Average Silhouette width values of each cluster, when criterion = "asw"
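The rule that critgrid is filled only where the number of clusters exceeds the number of dimensions can be sketched in base R. The ranges below are arbitrary illustrative choices, and the logical matrix evaluated is a hypothetical helper that only marks which cells would receive a value; it is not part of the tuneclus() output.

```r
# Illustrative ranges (assumption: chosen only for this sketch).
nclusrange <- 3:5
ndimrange <- 2:4

# TRUE where the number of clusters is greater than the number of dimensions.
evaluated <- outer(nclusrange, ndimrange, ">")
dimnames(evaluated) <- list(nclus = nclusrange, ndim = ndimrange)

evaluated        # rows: numbers of clusters, columns: numbers of dimensions
sum(evaluated)   # number of (clusters, dimensions) pairs that are evaluated
```

Cells marked FALSE correspond to the blank entries of critgrid, so wide ndimrange values relative to nclusrange add little to the search.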

References

Calinski, T., and Harabasz, J., (1974). A dendrite method for cluster analysis. Communications in Statistics, 3, 1-27.

Kaufman, L., and Rousseeuw, P.J., (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

See Also

global_bootclus, local_bootclus

Examples

# Reduced K-means for a range of clusters and dimensions
data(macro)
# Cluster quality assessment based on the average silhouette width in the low dimensional space
# nstart = 1 for speed in example
# use more for real applications
bestRKM = tuneclus(macro, 3:4, 2:3, method = "RKM", 
criterion = "asw", dst = "low", nstart = 1, seed = 1234)
bestRKM
#plot(bestRKM)

# Cluster Correspondence Analysis for a range of clusters and dimensions
data(bribery)
# Cluster quality assessment based on the Calinski-Harabasz index in the full dimensional space
bestclusCA = tuneclus(bribery, 4:5, 3:4, method = "clusCA",
criterion = "ch", nstart = 20, seed = 1234)
bestclusCA
#plot(bestclusCA, cludesc = TRUE)

# Mixed reduced K-means for a range of clusters and dimensions
data(diamond)
# Cluster quality assessment based on the average silhouette width in the low dimensional space
# nstart = 5 for speed in example
# use more for real applications
bestmixedRKM = tuneclus(diamond[,-7], 3:4, 2:3, 
method = "mixedRKM", criterion = "asw", dst = "low", 
nstart = 5, seed = 1234)
bestmixedRKM
#plot(bestmixedRKM)

clustrd documentation built on July 17, 2022, 1:05 a.m.
