identifyMixture: Solve label switching and identify mixture.
In telescope: Bayesian Mixtures with an Unknown Number of Components


Clusters the draws in the point process representation (PPR) using k-means in order to resolve label switching and obtain an identified mixture model.

identifyMixture(Func, Mu, Eta, S, centers)

`Func`	A numeric array of dimension `M \times d \times K`; data for clustering in the PPR.
`Mu`	A numeric array of dimension `M \times r \times K`; draws of cluster means.
`Eta`	A numeric array of dimension `M \times K`; draws of cluster sizes.
`S`	A numeric matrix of dimension `M \times N`; draws of cluster assignments.
`centers`	An integer giving the number of clusters, or a numeric matrix of dimension `K \times d` giving the initial cluster centers; used to initialize `stats::kmeans()`.

The following steps are implemented:

1. A functional of the draws of the component-specific parameters (`Func`) is passed to the function. The functionals of all components and iterations are stacked to obtain a matrix in which each row corresponds to the functional of one component in one iteration.
2. The functionals are clustered into K_+ clusters using k-means clustering, which yields a group label for each functional.
3. The labels of the functionals are used to construct a classification for each MCMC iteration. Those classifications that are a permutation of (1,\ldots,K_+) are used to reorder the `Mu` and `Eta` draws and the assignment matrix `S`. This results in an identified mixture model.
Note that only iterations resulting in permutations are used for parameter estimation and for deriving the final partition. MCMC iterations in which the classification of the functionals is not a permutation of (1,\ldots,K_+) are discarded, because no unique assignment of functionals to components can be made. A high non-permutation rate, i.e. a high proportion of such iterations, indicates a poor clustering solution in which the functionals are not clearly separated.
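
The steps above can be sketched in base R on simulated draws. This is a minimal illustration under stated assumptions, not the package's implementation: the sizes `M`, `K`, `d`, and `N`, the simulated `Func` and `S`, and the names `stacked`, `labels`, `is_perm`, `Mu_ident`, and `S_ident` are all hypothetical, and every component is assumed to be filled so that K_+ equals `K`. The way the assignments are relabeled is likewise an assumption about what "reordering" means here. Only `stats::kmeans()` is an actual library call.

```r
# Illustrative base-R sketch of the relabeling steps; not the telescope source.
set.seed(1)
M <- 200   # MCMC iterations (hypothetical)
K <- 3     # components, all filled, so K_+ = K in this sketch
d <- 1     # dimension of the functional
N <- 20    # observations (hypothetical)
true_means <- c(-5, 0, 5)

# Simulated label-switched draws: each iteration permutes the components.
Func <- array(NA_real_, dim = c(M, d, K))
for (m in seq_len(M)) {
  Func[m, 1, ] <- sample(true_means) + rnorm(K, sd = 0.3)
}
S <- matrix(sample(K, M * N, replace = TRUE), nrow = M, ncol = N)

# Step 1: stack the functionals; row m + (k - 1) * M holds component k
# of iteration m, and the columns hold the d coordinates.
stacked <- matrix(as.vector(aperm(Func, c(1, 3, 2))), nrow = M * K, ncol = d)

# Step 2: k-means started from a K x d matrix of centers; one label per row.
km <- stats::kmeans(stacked, centers = matrix(true_means, ncol = d))
labels <- matrix(km$cluster, nrow = M, ncol = K)  # labels[m, k]

# Step 3: keep only iterations whose labels are a permutation of 1, ..., K.
is_perm <- apply(labels, 1, function(z) all(sort(z) == seq_len(K)))
non_perm_rate <- mean(!is_perm)
kept <- which(is_perm)

# Reorder the kept draws: component k of iteration m moves to position labels[m, k].
Mu_ident <- t(vapply(kept, function(m) Func[m, 1, order(labels[m, ])], numeric(K)))

# Relabel the assignments the same way: old label k becomes labels[m, k].
S_ident <- t(vapply(kept, function(m) labels[m, S[m, ]], integer(N)))
```

Because the simulated components are well separated, `non_perm_rate` should be close to zero and the column means of `Mu_ident` should lie near `true_means`; overlapping components would be expected to raise the non-permutation rate.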

A named list containing:

"S": reordered assignments.
"Mu": reordered draws of the cluster means.
"Eta": reordered draws of the cluster sizes (weights).
"non_perm_rate": proportion of draws where the clustering did not result in a permutation and hence no relabeling could be performed; this is the proportion of draws discarded.