HDDClustering: HDD clustering is a model-based clustering method of...


HDDClustering R Documentation

HDD clustering is a model-based clustering method of [Bouveyron et al., 2007].

Description

HDD clustering is based on the Gaussian mixture model and on the idea that the data live in subspaces of lower dimension than the original space. The model parameters are estimated with the EM algorithm [Berge et al., 2012].

Usage

HDDClustering(Data, ClusterNo, PlotIt=F,...)

Arguments

Data

[1:n,1:d] matrix of the dataset to be clustered. It consists of n cases of d-dimensional data points. Every case has d attributes, variables or features.

ClusterNo

Optional, Numeric. Either a single number giving the number of clusters, or a vector 1:k, where k is the maximal expected number of clusters.

PlotIt

Optional, Boolean. Default: FALSE, meaning no plot is produced.

...

Further arguments passed on to the clustering algorithm. Arguments that are not set keep their default values; see hddc for details.

Details

HDD clustering maximises the BIC criterion over a range of possible numbers of clusters up to ClusterNo. By default the most general model is used; alternatively, the parameter model="ALL" can be used to evaluate all possible models with BIC [Berge et al., 2012]. If specific properties of Data are known in advance, please see hddc for specific model selection.
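The selection described above can be sketched by calling hddc from the HDclassif package directly. This is a minimal illustration, not the internals of HDDClustering: the built-in iris measurements stand in for Data, the range 1:4 is an arbitrary choice, and the seed is set only to make the k-means initialisation repeatable.

```r
# Sketch only: iris is a stand-in dataset, K = 1:4 an arbitrary range.
library(HDclassif)

X <- as.matrix(iris[, 1:4])  # [1:n, 1:d] numeric matrix

set.seed(1)                  # repeatable k-means initialisation
# Try every covariance model for every number of clusters in 1:4
# and keep the combination that BIC prefers.
fit <- hddc(X, K = 1:4, model = "ALL", criterion = "bic", show = FALSE)

fit$K      # number of clusters selected by BIC
fit$model  # name of the selected model
fit$BIC    # BIC value of the selected fit

Cls <- as.numeric(fit$class)  # one cluster label per case
table(Cls)
```

Evaluating all models multiplies the run time by the number of models, so restricting model to a few candidates may be preferable for large datasets.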

Value

List of

Cls

[1:n] numerical vector with n numbers defining the classification, the main output of the clustering algorithm. It has k unique numbers representing the arbitrary labels of the clustering.

Object

Object defined by the clustering algorithm, returned as the other output of this algorithm.
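A short sketch of how the two list elements might be inspected, using the Hepta dataset shipped with FCPS. It assumes that Object holds the fitted model returned by hddc; this page does not state the class of Object, so check it with class() before relying on specific fields.

```r
# Sketch only: assumes FCPS and HDclassif are installed.
library(FCPS)

data("Hepta")
V <- HDDClustering(Data = Hepta$Data, ClusterNo = 7)

Cls <- V$Cls
length(Cls)          # n, one label per case
table(Cls)           # cluster sizes; the labels themselves are arbitrary
length(unique(Cls))  # k, the number of clusters found

# The second element is the algorithm-specific object.
# Assumption: it is the fit returned by hddc.
class(V$Object)
```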

Author(s)

Quirin Stier

References

[Berge et al., 2012] L. Berge, C. Bouveyron and S. Girard, HDclassif: an R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data, Journal of Statistical Software, vol. 46 (6), pp. 1-29, 2012.

[Bouveyron et al., 2007] C. Bouveyron, S. Girard and C. Schmid, High-Dimensional Data Clustering, Computational Statistics and Data Analysis, vol. 52 (1), pp. 502-519, 2007.

Examples

# Hepta
data("Hepta")
Data = Hepta$Data
# The non-default parameter model="ALL"
# evaluates all possible models
V = HDDClustering(Data=Data,ClusterNo=7,model="ALL")
Cls = V$Cls

ClusterAccuracy(Hepta$Cls, Cls)

## Not run: 
library(HDclassif)
data(Crabs)
Data = Crabs[,-1]
V = HDDClustering(Data=Data,ClusterNo=4,com_dim=1)

## End(Not run)

FCPS documentation built on Oct. 19, 2023, 5:06 p.m.