# InitClust: Robust Initialization for Model-based Clustering Methods In otrimle: Robust Model-Based Clustering

## Description

Computes the initial cluster assignment by combining nearest-neighbor-based noise detection with agglomerative hierarchical clustering based on maximum likelihood criteria for Gaussian mixture models.

## Usage

```r
InitClust(data, G, k = 3, knnd.trim = 0.5, modelName = "VVV")
```

## Arguments

- `data`: A numeric vector, matrix, or data frame of observations. Rows correspond to observations and columns correspond to variables. Categorical variables and `NA` values are not allowed.
- `G`: An integer specifying the number of clusters.
- `k`: An integer specifying the number of nearest neighbors considered per point in the denoising step (see Details).
- `knnd.trim`: A number in [0, 1) that defines the proportion of points initialized as noise. Typically `knnd.trim <= 0.5` (see Details).
- `modelName`: A character string indicating the covariance model to be used. Possible models are:
  - `"E"`: equal variance (one-dimensional)
  - `"V"`: spherical, variable variance (one-dimensional)
  - `"EII"`: spherical, equal volume
  - `"VII"`: spherical, unequal volume
  - `"EEE"`: ellipsoidal, equal volume, shape, and orientation
  - `"VVV"`: ellipsoidal, varying volume, shape, and orientation (default)

  See Details.

## Details

The initialization is based on Coretto and Hennig (2017) and proceeds in two steps:

Step 1 (denoising step): for each data point, compute the distance to its `k`-th nearest neighbor (`k`-NND). All points whose `k`-NND is larger than the (1 - `knnd.trim`)-quantile of the `k`-NND values are initialized as noise. The interpretation of `k` is that a group of `(k-1)` points close together, but not a group of `k` points, may still be interpreted as noise or outliers.
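A minimal base-R sketch of the denoising step, assuming Euclidean distances and that a point is not counted as its own neighbor. The helper name `knnd_noise()` is hypothetical and this is not the package's internal code; if the package counts neighbors differently, the `(k-1)` interpretation above would shift by one.

```r
# Hypothetical helper (not part of otrimle): flag points whose k-th nearest
# neighbor distance (k-NND) exceeds the (1 - knnd.trim)-quantile of all k-NNDs.
# Assumptions: Euclidean distance; a point is not its own neighbor.
knnd_noise <- function(data, k = 3, knnd.trim = 0.5) {
  x <- as.matrix(data)                    # a numeric vector becomes an n x 1 matrix
  d <- as.matrix(dist(x))                 # n x n Euclidean distance matrix
  # Each sorted row starts with the zero self-distance, so the k-th
  # nearest neighbor of point i is element k + 1 of its sorted row.
  knnd <- apply(d, 1, function(row) sort(row)[k + 1])
  threshold <- quantile(knnd, probs = 1 - knnd.trim)
  knnd > threshold                        # TRUE = initialized as noise
}

# Usage: 20 tightly packed points plus one far-away outlier (row 21)
set.seed(1)
x <- rbind(matrix(rnorm(40, sd = 0.1), ncol = 2), c(10, 10))
noise <- knnd_noise(x, k = 3, knnd.trim = 0.1)
which(noise)
```

With `knnd.trim = 0` the threshold is the maximum `k`-NND, so no point is flagged, which matches the lower end of the allowed range [0, 1).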

Step 2 (clustering step): perform the model-based hierarchical clustering (MBHC) proposed in Fraley (1998). This step is carried out with `hc` from the mclust package, and the input argument `modelName` is passed to `hc`. See the Details section of `hc` for more information.
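A sketch of how Step 2 could be combined with the noise flags from Step 1, assuming the mclust package is installed and assuming MBHC is run only on the points not initialized as noise (the source does not state this explicitly). `mbhc_step()` is a hypothetical helper; `mclust::hc()` builds the hierarchy and `mclust::hclass()` cuts it into `G` groups.

```r
# Hypothetical helper (not part of otrimle). Assumes mclust is installed and
# that `noise` is a logical vector marking points initialized as noise.
mbhc_step <- function(data, noise, G, modelName = "VVV") {
  x <- as.matrix(data)
  # Model-based agglomerative clustering on the non-noise points only
  # (for one-dimensional data use modelName = "E" or "V").
  tree <- mclust::hc(x[!noise, , drop = FALSE], modelName = modelName)
  labels <- integer(nrow(x))              # 0 = noise / outlier
  labels[!noise] <- as.integer(mclust::hclass(tree, G))  # labels 1..G
  labels
}
```

The result has the same shape as the value returned by `InitClust`: an integer vector with `0` for noise and `1, ..., G` for clusters.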

If Step 2 fails to provide `G` clusters, each containing at least 2 distinct data points, it is replaced by classical hierarchical clustering as implemented in `hclust`. Finally, if `hclust` also fails to provide a valid partition, up to ten random partitions are tried.
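The fallback logic can be sketched in base R as follows. The helper names are hypothetical, and three details are assumptions not stated in the source: `hclust` uses its default complete linkage, random partitions are balanced shuffles of the labels `1..G`, and the sketch stops with an error if all ten random tries fail.

```r
# Hypothetical helpers (not part of otrimle).
# A partition is "valid" if it has G clusters, each with >= 2 distinct points.
is_valid_partition <- function(x, cl, G) {
  x <- as.matrix(x)
  if (length(unique(cl)) != G) return(FALSE)
  all(vapply(unique(cl), function(g) {
    nrow(unique(x[cl == g, , drop = FALSE])) >= 2
  }, logical(1)))
}

fallback_clusters <- function(x, G, max_tries = 10) {
  x <- as.matrix(x)
  # Classical agglomerative clustering (complete linkage is hclust's default;
  # which linkage the package uses is an assumption here).
  cl <- cutree(hclust(dist(x)), k = G)
  if (is_valid_partition(x, cl, G)) return(cl)
  # Last resort: up to max_tries random partitions.
  for (i in seq_len(max_tries)) {
    cl <- sample(rep_len(seq_len(G), nrow(x)))
    if (is_valid_partition(x, cl, G)) return(cl)
  }
  stop("no valid initial partition found")
}
```

Checking for *distinct* points (via `unique()` on rows) rather than just cluster size guards against clusters made of duplicated observations, whose covariance matrix would be degenerate.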

## Value

An integer vector specifying the initial cluster assignment with `0` denoting noise/outliers.
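A short illustration of reading such a vector. The values below are made up for illustration and were not produced by `InitClust`.

```r
# Hypothetical InitClust()-style output for 8 observations:
# 0 marks points initialized as noise/outliers, 1..G are cluster labels.
init <- c(1L, 1L, 2L, 0L, 2L, 1L, 0L, 2L)

noise_idx     <- which(init == 0)         # indices of points initialized as noise
cluster_sizes <- table(init[init > 0])    # size of each initial cluster
noise_share   <- mean(init == 0)          # proportion of points initialized as noise
```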

## References

Fraley, C. (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20:270-281.

Coretto, P. and Hennig, C. (2017). Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering. Journal of Machine Learning Research 18(142):1-39. https://jmlr.org/papers/v18/16-382.html

## Author(s)

Pietro Coretto pcoretto@unisa.it https://pietro-coretto.github.io

## See Also

`hc`

## Examples

```r
## Load the otrimle package
library(otrimle)

## Load Swiss banknotes data
data(banknote)
x <- banknote[, -1]

## Initial clusters with default arguments
init <- InitClust(data = x, G = 2)
print(init)

## Perform otrimle
a <- otrimle(data = x, G = 2, initial = init,
             logicd = c(-Inf, -50, -10), ncores = 1)
plot(a, what = "clustering", data = x)
```

otrimle documentation built on May 29, 2021, 9:09 a.m.