Computes the initial cluster assignment by combining nearest-neighbor-based noise detection with agglomerative hierarchical clustering based on maximum likelihood criteria for Gaussian mixture models.
data |
A numeric vector, matrix, or data frame of observations. Rows correspond
to observations and columns correspond to variables. Categorical
variables and |
G |
An integer specifying the number of clusters. |
k |
An integer specifying the number of considered nearest neighbors per point used for the denoising step (see Details). |
knnd.trim |
A number in [0,1) which defines the proportion of points
initialized as noise. Typically |
modelName |
A character string indicating the covariance model to be used. Possible models are: |
The initialization is based on Coretto and Hennig (2017). First, two
steps are performed:
Step 1 (denoising step): for each data point compute its k-th nearest neighbor distance (k-NND). All points with k-NND larger than the (1-knnd.trim)-quantile of the k-NND are initialized as noise. The interpretation of k is that (k-1), but not k, points close together may still be interpreted as noise or outliers.
Step 2 (clustering step): perform the model-based hierarchical clustering (MBHC) proposed in Fraley (1998). This step is performed using hc. The input argument modelName is passed to hc. See the Details of hc for more information.
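The denoising step (Step 1) can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's own code: the helper name knnd_noise is hypothetical, distances are assumed to be Euclidean, and a point is assumed not to count as its own neighbor (the package may use a different convention).

```r
# Hypothetical helper illustrating Step 1; not part of the package.
knnd_noise <- function(x, k, trim) {
  x <- as.matrix(x)
  d <- as.matrix(dist(x))   # pairwise Euclidean distances (assumption)
  # After sorting a row, position 1 is the zero distance of a point to
  # itself, so position k + 1 is the distance to its k-th nearest neighbor.
  knnd <- apply(d, 1, function(row) sort(row)[k + 1])
  cutoff <- quantile(knnd, probs = 1 - trim)
  list(knnd = knnd, noise = knnd > cutoff)
}

set.seed(1)
x <- rbind(matrix(rnorm(200, sd = 0.5), ncol = 2),  # 100 clustered points
           c(50, 50))                               # one isolated point
res <- knnd_noise(x, k = 5, trim = 0.1)
sum(res$noise)   # number of points flagged as noise
```

In this toy data set the isolated point has by far the largest k-NND, so it falls above the cutoff and is flagged together with the sparsest points of the main group.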
If Step 2 fails to provide G clusters, each containing at least 2 distinct data points, it is replaced with classical hierarchical clustering implemented in hclust. Finally, if hclust fails to provide a valid partition, up to ten random partitions are tried.
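The fallback logic described above can be sketched roughly as follows. The helper names valid_partition and init_fallback are hypothetical, the default linkage of hclust and the way random partitions are drawn are assumptions, and the actual implementation in the package may differ.

```r
# Hypothetical check: exactly G clusters, each with >= 2 distinct rows.
valid_partition <- function(x, cl, G) {
  x <- as.matrix(x)
  labs <- sort(unique(cl))
  if (length(labs) != G) return(FALSE)
  all(vapply(labs, function(g) {
    nrow(unique(x[cl == g, , drop = FALSE])) >= 2
  }, logical(1)))
}

# Hypothetical fallback: hclust first, then up to max_tries random partitions.
init_fallback <- function(x, G, max_tries = 10) {
  x <- as.matrix(x)
  cl <- cutree(hclust(dist(x)), k = G)  # default linkage (assumption)
  if (valid_partition(x, cl, G)) return(as.integer(cl))
  for (i in seq_len(max_tries)) {
    cl <- sample(rep_len(seq_len(G), nrow(x)))  # random balanced labels
    if (valid_partition(x, cl, G)) return(as.integer(cl))
  }
  NULL  # no valid partition found
}

set.seed(2)
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))
cl <- init_fallback(x, G = 2)
table(cl)
```

Returning NULL when every attempt fails is a choice made for this sketch only; the text above does not say what happens in that case.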
An integer vector specifying the initial cluster assignment, with 0 denoting noise/outliers.
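To make the format of the returned vector concrete, the sketch below assembles such a vector from a noise indicator and a set of cluster labels. It assumes that the clustering step is run on the non-noise points only; the helper name assemble_init is hypothetical and is not part of the package.

```r
# Hypothetical helper: merge a logical noise flag with the labels obtained
# on the non-noise points into one integer vector (0 = noise).
assemble_init <- function(noise, cl_clean) {
  stopifnot(length(cl_clean) == sum(!noise))
  out <- integer(length(noise))        # all zeros, i.e. noise by default
  out[!noise] <- as.integer(cl_clean)  # fill in labels for non-noise points
  out
}

assemble_init(c(FALSE, TRUE, FALSE, FALSE), c(1, 2, 2))
# returns 1 0 2 2
```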
Fraley, C. (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20:270-281.
Coretto, P. and Hennig, C. (2017). Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering. Journal of Machine Learning Research 18(142):1-39. https://jmlr.org/papers/v18/16-382.html
Pietro Coretto pcoretto@unisa.it https://pietro-coretto.github.io
hc