Robust Initialization for Model-based Clustering Methods
Computes the initial cluster assignment by combining nearest-neighbor-based clutter/noise detection with agglomerative hierarchical clustering based on the maximum likelihood criterion for Gaussian mixture models.
A numeric vector, matrix, or data frame of observations. Rows correspond
to observations and columns correspond to variables. Categorical
An integer specifying the number of clusters.
The minimum cluster proportion allowed in the initial clustering.
An integer specifying the number of considered nearest neighbors per point used for the denoising step (see Details).
An integer specifying the number of random starts for the k-means step.
A character string indicating the covariance model to be used. Possible models are:
A logical value;
The initialization is described in the supplementary material of
Coretto and Hennig (2015). It consists of a denoising stage followed by
the clustering steps listed below (Steps 1 to 4); the step at which the
initial clustering is found is reported in the code element of the
output list (see Value).
In the denoising stage, noise/outliers are removed using the
nearest-neighbor-based clutter/noise detection (NNC) of Byers and
Raftery (1998). This is performed using NNclean, to which the input
argument giving the number of nearest neighbors is passed. Based on
this step, a denoised version of data is obtained.
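To make the denoising step described above more concrete, the following base R sketch computes the quantity that nearest-neighbor-based noise detection works with: the distance from each observation to its k-th nearest neighbor. It is only an illustration under stated assumptions. The helper `knn_distance` is hypothetical and not part of the package, and the sketch does not perform the NNC classification itself, which is what NNclean does.

```r
# Hypothetical helper (not part of the package): distance from each
# observation to its k-th nearest neighbor, using base R only.
knn_distance <- function(x, k) {
  d <- as.matrix(dist(x))  # pairwise Euclidean distances
  # Each row sorted in ascending order starts with the zero distance of
  # the point to itself, so position k + 1 holds the k-th neighbor distance.
  apply(d, 1, function(row) sort(row)[k + 1])
}

set.seed(1)
x <- rbind(
  matrix(rnorm(100, sd = 0.1), ncol = 2),  # 50 tightly clustered points
  c(10, 10)                                # one isolated point (row 51)
)
dk <- knn_distance(x, k = 3)

# The isolated point has by far the largest 3rd-nearest-neighbor distance.
which.max(dk)
```

Observations in sparse regions have large k-th nearest-neighbor distances, which is why such distances are informative about noise/outliers.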
Step 1: perform the model-based hierarchical clustering (MBHC)
proposed in Fraley (1998). This step is performed using
hc, and the input argument
modelName is passed to
hc. See the Details section of
hc for more information.
Step 2: if the previous step produces clusters that are too small
(cluster proportion < cpr.min), assign the small clusters
to noise and perform MBHC again on the denoised data.
Step 3: if the previous step still produces clusters that are too small, assign the small clusters to noise and perform k-means on the denoised data.
Step 4: if the previous step still produces clusters that are too small, a
completely random partition that satisfies
cpr.min is returned.
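The fallback logic in the steps above can be sketched in base R as follows. This is a simplified illustration, not the package's implementation. The helpers `has_small_cluster` and `random_partition` are hypothetical names, label 0 is assumed to mark noise, proportions are assumed to be computed relative to the total number of observations, and the random partition shown is just one way to satisfy a minimum cluster proportion. Only the k-means and random-partition fallbacks are shown; the MBHC steps are omitted.

```r
# Hypothetical helpers for illustration; not part of the package.

# TRUE if any non-noise cluster has a proportion below cpr_min.
# Assumptions: label 0 marks noise; proportions are relative to the
# total number of observations.
has_small_cluster <- function(cluster, cpr_min) {
  tab <- table(cluster[cluster != 0])
  any(tab / length(cluster) < cpr_min)
}

# One possible random partition of n points into G clusters in which
# every cluster reaches the minimum proportion cpr_min.
random_partition <- function(n, G, cpr_min) {
  n_min <- ceiling(cpr_min * n)  # minimum admissible cluster size
  stopifnot(G * n_min <= n)
  # Reserve n_min labels per cluster, then draw the remaining labels.
  labels <- c(rep(seq_len(G), each = n_min),
              sample.int(G, n - G * n_min, replace = TRUE))
  labels[sample.int(length(labels))]  # shuffle the labels
}

set.seed(42)
x <- rbind(matrix(rnorm(60), ncol = 2),
           matrix(rnorm(60, mean = 5), ncol = 2))
G <- 2
cpr_min <- 0.1

# k-means with several random starts (analogous to Step 3).
cluster <- kmeans(x, centers = G, nstart = 10)$cluster

# Fall back to a random partition if a cluster is too small (Step 4).
if (has_small_cluster(cluster, cpr_min)) {
  cluster <- random_partition(nrow(x), G, cpr_min)
}
table(cluster)
```

Reserving the minimum number of labels for each cluster before drawing the rest guarantees the size constraint by construction, at the cost of the partition not being uniformly distributed over all admissible partitions.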
A list with the following components:
An integer indicating the step at which the initial clustering has been found (see Details).
A vector of integers denoting cluster assignments for each
Fraley, C. (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20:270-281.
Byers, S. and A. E. Raftery (1998). Nearest-neighbor clutter removal for estimating features in spatial point processes. Journal of the American Statistical Association 93:577-584.
Coretto, P. and C. Hennig (2015). Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. To appear in the Journal of the American Statistical Association. arXiv preprint arXiv:1406.0808 (with supplement).