View source: R/flxregmultinom.R
FLXMCregmultinom | R Documentation |
This model driver can be used to cluster data using a multinomial distribution.
FLXMCregmultinom(formula = . ~ ., r = NULL, alpha = 0)
formula |
A formula which is interpreted relative to the
formula specified in the call to |
r |
Number of different categories. Values are assumed to be
integers in |
alpha |
A non-negative scalar acting as regularization
parameter. Can be regarded as adding |
Setting the regularization parameter alpha to a value greater than zero acts like adding alpha observations conforming to the population mean to each component. This can be used to avoid degenerate solutions. It also makes the clusters more similar to each other the larger alpha is chosen; for small values, however, this effect is mostly negligible.
For regularization we compute the MAP estimates for the multinomial distribution using the Dirichlet distribution as prior, which is the conjugate prior. The parameters of this prior are selected to correspond to the marginal distribution of the variable across all observations.
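As a rough illustration of this regularization, the sketch below computes regularized category probabilities for a single variable within one component. The helper name map_probs and the choice of Dirichlet parameters alpha * marginal + 1 are assumptions made for illustration. Under that choice the posterior mode reduces to (weighted counts + alpha * marginal) / (sum of weights + alpha), which matches the description of adding alpha observations conforming to the population mean. This is not necessarily how the package computes the estimate internally.

```r
# Hypothetical helper (not part of flexord): MAP estimate of the category
# probabilities of one variable within one mixture component.
map_probs <- function(x, w, r, alpha = 0) {
    # x: integer codes in 1:r
    # w: non-negative weights of the observations for this component
    # Marginal distribution of the variable across all observations
    marginal <- tabulate(x, nbins = r) / length(x)
    # Weighted category counts within the component
    counts <- vapply(seq_len(r), \(j) sum(w[x == j]), numeric(1))
    # Posterior mode under the assumed prior Dirichlet(alpha * marginal + 1)
    (counts + alpha * marginal) / (sum(w) + alpha)
}

x <- c(1, 1, 2, 2, 2, 3)
w <- c(1, 1, 0, 0, 0, 0)  # only the first two observations belong to the component
p0 <- map_probs(x, w, r = 3, alpha = 0)  # unregularized estimate
p1 <- map_probs(x, w, r = 3, alpha = 1)  # regularized estimate
```

With alpha = 0, categories not observed in the component receive probability zero. With alpha = 1, every category that occurs anywhere in the data receives a positive probability, and the estimate is pulled towards the marginal distribution.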
An object of class "FLXC".
Galindo Garre, F, Vermunt, JK (2006). Avoiding Boundary Estimates in Latent Class Analysis by Bayesian Posterior Mode Estimation. Behaviormetrika, 33, 43-59.
Ernst, D, Ortega Menjivar, L, Scharl, T, Grün, B (2025). Ordinal Clustering with the flex-Scheme. Austrian Journal of Statistics. Submitted manuscript.
library("flexmix")
library("flexord")
library("flexclust")
set.seed(0xdeaf)
# Sample data
k <- 4 # nr of clusters
nvar <- 10 # nr of variables
r <- sample(2:7, size=nvar, replace=TRUE) # nr of categories
N <- 100 # obs. per cluster
# random probabilities per component
probs <- lapply(seq_len(k), \(ki) runif(nvar, 0.01, 0.99))
# sample data by drawing from a binomial distribution with size = r - 1
# values are expected to lie inside 1:r, hence we add 1.
dat <- lapply(probs, \(p) {
    mapply(\(p_i, r_i) {
        rbinom(N, r_i, p_i) + 1
    }, p, r-1, SIMPLIFY=FALSE) |> do.call(cbind, args=_)
}) |> do.call(rbind, args=_)
true_clusters <- rep(seq_len(k), each=N)
# Cluster without regularization
m1 <- stepFlexmix(dat~1, model=FLXMCregmultinom(r=r, alpha=0), k=k)
# Cluster with regularization
m2 <- stepFlexmix(dat~1, model=FLXMCregmultinom(r=r, alpha=1), k=k)
# Both models are mostly able to reconstruct the true clusters (ARI ~ 0.95)
# (it's a very easy clustering problem)
# Small values for the regularization don't seem to affect the ARI (much)
randIndex(clusters(m1), true_clusters)
randIndex(clusters(m2), true_clusters)