Computes the initial cluster assignment by combining nearest-neighbor-based clutter/noise detection with agglomerative hierarchical clustering based on maximum likelihood criteria for Gaussian mixture models.


| Argument | Description |
|---|---|
| `data` | A numeric vector, matrix, or data frame of observations. Rows correspond to observations and columns correspond to variables. Categorical variables and … |
| `G` | An integer specifying the number of clusters. |
| `cpr.min` | The minimum cluster proportion allowed in the initial clustering. |
| `K` | An integer specifying the number of nearest neighbors considered per point in the denoising step (see *Details*). |
| `nstart.km` | An integer specifying the number of random starts for the k-means step. |
| `modelName` | A character string indicating the covariance model to be used. Possible models are: … |
| `monitor` | A logical value; … |

The initialization is described in the supplementary material of Coretto and Hennig (2015). Noise/outliers are first removed using the nearest-neighbor-based clutter/noise detection (NNC) of Byers and Raftery (1998). This step is performed with `NNclean`, and the input argument `K` is passed as `k` to `NNclean`. The result is a denoised version of `data`. The initial clustering is then obtained through the following steps. The step at which the initial clustering is found is reported in the `code` element of the output list (see *Value*).
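The denoising idea can be illustrated with a short base-R sketch. It is *not* the NNC estimator of Byers and Raftery (1998) that `NNclean` implements (NNC fits a two-component mixture model to the K-th nearest-neighbor distances). As a simplifying assumption, the hypothetical helper `knn_noise_flag()` just flags points whose distance to their `K`-th nearest neighbor exceeds a fixed quantile `q`.

```r
# Hypothetical helper, NOT NNclean: flag as noise the points whose
# distance to their K-th nearest neighbour is above the q-quantile.
knn_noise_flag <- function(x, K = 3, q = 0.9) {
  d <- as.matrix(dist(as.matrix(x)))  # Euclidean distance matrix
  diag(d) <- Inf                      # a point is not its own neighbour
  # K-th smallest distance in each row
  dk <- apply(d, 1, function(r) sort(r, partial = K)[K])
  dk > quantile(dk, q)                # TRUE = flagged as noise
}

set.seed(1)
x <- rbind(matrix(rnorm(100, 0, 0.3), ncol = 2),
           matrix(rnorm(100, 3, 0.3), ncol = 2),
           c(10, 10))                 # one obvious outlier (row 101)
noise <- knn_noise_flag(x, K = 3)
x_clean <- x[!noise, , drop = FALSE]  # denoised data for the clustering steps
```

A fixed quantile always discards roughly the same share of points. NNC instead estimates which points are clutter from the data, which is why the real procedure relies on `NNclean`.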

Clustering steps:

*Step 1*: perform the model-based hierarchical clustering (MBHC) proposed in Fraley (1998). This step is performed using `hc`. The input argument `modelName` is passed to `hc`; see *Details* of `hc` for more information.
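For the spherical, equal-volume covariance model (`"EII"` in mclust notation), the MBHC merge criterion reduces to Ward's sum-of-squares criterion. The sketch below relies on that equivalence and uses base R's `hclust()` with `method = "ward.D2"` as a stand-in for `hc()` with that model. The function name `step1_mbhc_eii()` is hypothetical, and other covariance models are not covered.

```r
# Hypothetical stand-in for Step 1, valid for the "EII" model only:
# Ward's criterion matches the EII classification-likelihood criterion.
step1_mbhc_eii <- function(x, G) {
  tree <- hclust(dist(as.matrix(x)), method = "ward.D2")
  cutree(tree, k = G)                 # integer labels 1..G
}

set.seed(2)
x <- rbind(matrix(rnorm(60, 0, 0.5), ncol = 2),   # rows 1-30
           matrix(rnorm(60, 4, 0.5), ncol = 2))   # rows 31-60
cl <- step1_mbhc_eii(x, G = 2)
table(cl)
```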

*Step 2*: if clusters that are too small (cluster proportion < `cpr.min`) are found in the previous step, assign them to noise and perform MBHC again on the denoised data.
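The small-cluster check used in Steps 2 and 3 can be sketched as a small helper. The name `drop_small_clusters()` is hypothetical, and coding noise as `0` is an assumption of this sketch.

```r
# Hypothetical helper: clusters with proportion < cpr.min go to noise
# (coded 0 here, an assumption); remaining labels are renumbered 1, 2, ...
drop_small_clusters <- function(cl, cpr.min) {
  prop  <- table(cl) / length(cl)                 # cluster proportions
  small <- as.numeric(names(prop)[prop < cpr.min])
  cl[cl %in% small] <- 0
  keep <- cl > 0
  cl[keep] <- match(cl[keep], sort(unique(cl[keep])))
  cl
}

# Cluster 3 holds only 1/9 of the observations
drop_small_clusters(c(1, 1, 1, 1, 2, 2, 2, 2, 3), cpr.min = 0.2)
# [1] 1 1 1 1 2 2 2 2 0
```

Rows labelled `0` would then be set aside, e.g. with `x[cl > 0, , drop = FALSE]`, before MBHC is run again.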

*Step 3*: if clusters that are too small are still found in the previous step, assign them to noise and perform k-means (with `nstart.km` random starts) on the denoised data.
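Step 3 maps onto base R's `kmeans()`, whose `nstart` argument plays the role of `nstart.km`. In the sketch below the function name `step3_kmeans()` is hypothetical. The returned `ok` flag (all cluster proportions at least `cpr.min`) is an assumed way of deciding whether to fall through to Step 4.

```r
# Hypothetical sketch of Step 3: k-means with nstart.km random starts on
# the denoised rows, followed by the cpr.min check that decides on Step 4.
step3_kmeans <- function(x, G, cpr.min, nstart.km = 50) {
  x  <- as.matrix(x)
  cl <- kmeans(x, centers = G, nstart = nstart.km)$cluster
  prop <- tabulate(cl, nbins = G) / length(cl)    # cluster proportions
  list(cluster = cl, ok = all(prop >= cpr.min))   # ok = FALSE -> Step 4
}

set.seed(3)
x <- rbind(matrix(rnorm(80, 0, 0.5), ncol = 2),
           matrix(rnorm(80, 4, 0.5), ncol = 2))
res <- step3_kmeans(x, G = 2, cpr.min = 0.1, nstart.km = 20)
res$ok
```

Multiple random starts make k-means less sensitive to a poor initial choice of centers. The run with the lowest total within-cluster sum of squares is kept.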

*Step 4*: if clusters that are too small are still found in the previous step, a completely random partition that satisfies `cpr.min` is returned.
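One way to draw a random partition that respects `cpr.min` is to give each of the `G` clusters a guaranteed minimum of `ceiling(cpr.min * n)` observations and assign the rest at random. This construction is an assumption, since the description above does not say how the random partition is generated. The function name `random_partition()` is hypothetical.

```r
# Hypothetical sketch of Step 4: every cluster receives at least
# m = ceiling(cpr.min * n) observations, the rest are assigned uniformly.
random_partition <- function(n, G, cpr.min) {
  m <- ceiling(cpr.min * n)                       # minimum cluster size
  if (G * m > n) stop("no partition can satisfy cpr.min for this n and G")
  labels <- c(rep(seq_len(G), each = m),          # guaranteed minimum
              sample.int(G, n - G * m, replace = TRUE))
  labels[sample.int(n)]                           # random order over rows
}

set.seed(4)
p <- random_partition(n = 100, G = 3, cpr.min = 0.2)
tabulate(p, nbins = 3) / 100                      # each proportion >= 0.2
```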

A `list` with the following components:

| Component | Description |
|---|---|
| `code` | An integer indicating the step at which the initial clustering has been found (see *Details*). |
| `cluster` | A vector of integers denoting the cluster assignment of each observation. |

Fraley, C. (1998). Algorithms for model-based Gaussian hierarchical clustering. *SIAM Journal on Scientific Computing*, 20, 270-281.

Byers, S. and Raftery, A. E. (1998). Nearest-neighbor clutter removal for estimating features in spatial point processes. *Journal of the American Statistical Association*, 93, 577-584.

Coretto, P. and Hennig, C. (2015). Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. To appear in the *Journal of the American Statistical Association*. arXiv preprint arXiv:1406.0808 (with supplement).

See also: `NNclean`, `hc`.

