# InitClust: Robust Initialization for Model-based Clustering Methods

In otrimle: Robust Model-Based Clustering

## Description

Computes the initial cluster assignment by combining nearest-neighbor-based noise detection with agglomerative hierarchical clustering based on maximum likelihood criteria for Gaussian mixture models.

## Usage

```r
InitClust(data, G, k = 3, knnd.trim = 0.5, modelName = 'VVV')
```

## Arguments

- `data`: A numeric vector, matrix, or data frame of observations. Rows correspond to observations and columns correspond to variables. Categorical variables and `NA` values are not allowed.
- `G`: An integer specifying the number of clusters.
- `k`: An integer specifying the number of nearest neighbors per point considered in the denoising step (see Details).
- `knnd.trim`: A number in (0,1) which defines the proportion of points initialized as noise. Typically `knnd.trim <= 0.5` (see Details).
- `modelName`: A character string indicating the covariance model to be used (see Details). Possible models are:
  - `"E"`: equal variance (one-dimensional)
  - `"V"`: spherical, variable variance (one-dimensional)
  - `"EII"`: spherical, equal volume
  - `"VII"`: spherical, unequal volume
  - `"EEE"`: ellipsoidal, equal volume, shape, and orientation
  - `"VVV"`: ellipsoidal, varying volume, shape, and orientation (default)

## Details

The initialization is discussed in detail in Coretto and Hennig (2017). Two steps are performed:

Denoising step: for each data point, compute its `k`-th nearest neighbor distance (`k`-NND). All points with `k`-NND larger than the (1-`knnd.trim`)-quantile of the `k`-NND values are initialized as noise. The interpretation of `k` is that `(k-1)` points close together, but not `k`, may still be interpreted as noise or outliers.
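The denoising rule can be illustrated with a minimal base R sketch. The helper `knnd_noise` is hypothetical and is not part of otrimle; the use of Euclidean distance, the exclusion of the point itself from its neighbors, and the default `quantile` type are assumptions that may differ from the package's internal implementation.

```r
## Hypothetical helper (not an otrimle function): flag points as noise
## when their k-th nearest neighbor distance exceeds the
## (1 - knnd.trim)-quantile of all k-NN distances.
knnd_noise <- function(data, k = 3, knnd.trim = 0.5) {
  x <- as.matrix(data)
  d <- as.matrix(dist(x))  # pairwise Euclidean distances (assumption)
  ## After sorting a row, position 1 is the point itself (distance 0),
  ## so the k-th nearest neighbor distance sits at position k + 1.
  knnd <- apply(d, 1, function(row) sort(row)[k + 1])
  cutoff <- quantile(knnd, probs = 1 - knnd.trim)
  knnd > cutoff            # TRUE marks a point initialized as noise
}

set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),  # 50 tightly clustered points
           matrix(runif(20, -5, 5), ncol = 2))      # 10 scattered points
is_noise <- knnd_noise(x, k = 3, knnd.trim = 0.2)
table(is_noise)
```

With `knnd.trim = 0.2`, roughly 20% of the points are flagged, regardless of how many are true outliers; this is why the argument is described as the proportion of points initialized as noise.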

Clustering step: perform the model-based hierarchical clustering (MBHC) proposed in Fraley (1998). This step is performed using `hc`, and the input argument `modelName` is passed on to it. See the Details section of the `hc` documentation for more information.
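As a rough sketch of the clustering step, the example below runs `hc` and `hclass` from the mclust package on the points not flagged as noise, then assembles a label vector with `0` for noise. The toy data, the hand-picked `is_noise` flags, and the way the labels are put back together are assumptions for illustration; `InitClust` may handle details differently.

```r
library(mclust)  # provides hc() and hclass()

set.seed(1)
## Toy data: two well-separated Gaussian groups of 50 points each
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))

## Pretend a denoising step flagged two points as noise (illustrative only)
is_noise <- rep(FALSE, nrow(x))
is_noise[c(1, 51)] <- TRUE

## Model-based agglomerative hierarchical clustering on the non-noise points
tree <- hc(x[!is_noise, ], modelName = "VVV")

## hclass() returns a matrix with one column per requested G
labels <- hclass(tree, G = 2)[, 1]

## Assemble the assignment: 0 = noise, 1..G = clusters
init <- integer(nrow(x))
init[!is_noise] <- as.integer(labels)
table(init)
```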

## Value

An integer vector specifying the initial cluster assignment with `0` denoting noise/outliers.
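A vector in this format can be inspected with base R tools. The sketch below uses a hand-written vector `init` as a stand-in for the function's output, so the values are purely illustrative.

```r
## Stand-in for a returned assignment: 0 = noise, 1 and 2 = clusters
init <- c(1L, 1L, 0L, 2L, 2L, 0L, 1L)

## Cluster sizes, including the noise component
sizes <- table(init)
print(sizes)

## Row indices of the observations initialized as noise
noise_idx <- which(init == 0)

## Number of non-noise clusters present in the assignment
n_clusters <- length(setdiff(unique(init), 0L))
```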

## References

Fraley, C. (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20:270-281.

Coretto, P. and C. Hennig (2017). Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering. arXiv preprint available at arXiv:1309.6895.

## Examples

```r
## Load the package and the Swiss banknotes data
library(otrimle)
data(banknote)
x <- banknote[,-1]

## Initial clusters with default arguments
init <- InitClust(data = x, G = 2)
print(init)

## Perform otrimle
a <- otrimle(data = x, G = 2, initial = init,
             logicd = c(-Inf, -50, -10), ncores = 1)
plot(a, what="clustering", data=x)
```