FLXMCregmultinom: FlexMix Driver for Regularized Multinomial Mixtures
In flexord: Flexible Clustering of Ordinal and Mixed-with-Ordinal Data

FLXMCregmultinom

R Documentation

FlexMix Driver for Regularized Multinomial Mixtures

Description

This model driver can be used to cluster data using a multinomial distribution.

Usage

FLXMCregmultinom(formula = . ~ ., r = NULL, alpha = 0)

Arguments

`formula`	A formula which is interpreted relative to the formula specified in the call to `flexmix::flexmix()` using `stats::update.formula()`. Only the left-hand side (response) of the formula is used. Default is to use the original model formula specified in `flexmix::flexmix()`.
`r`	Number of different categories. Values are assumed to be integers in `1:r`. Default `NULL` implies that the number of different categories is inferred columnwise by the maximum value observed.
`alpha`	A non-negative scalar acting as regularization parameter. Can be regarded as adding `alpha` observations equal to the population mean to each component.

Details

Using a regularization parameter alpha greater than zero acts as adding alpha observations conforming to the population mean to each component. This can be used to avoid degenerate solutions. It also has the effect that clusters become more similar to each other the larger alpha is chosen. For small values it is mostly negligible however.

For regularization we compute the MAP estimates for the multinomial distribution using the Dirichlet distribution as prior, which is the conjugate prior. The parameters of this prior are selected to correspond to the marginal distribution of the variable across all observations.

Value

An object of class "FLXC".

References

Galindo Garre, F, Vermunt, JK (2006). Avoiding Boundary Estimates in Latent Class Analysis by Bayesian Posterior Mode Estimation Behaviormetrika, 33, 43-59. - Ernst, D, Ortega Menjivar, L, Scharl, T, Grün, B (2025). Ordinal Clustering with the flex-Scheme. Austrian Journal of Statistics. Submitted manuscript.

Examples

library("flexmix")
library("flexord")
library("flexclust")


set.seed(0xdeaf)

# Sample data
k <- 4     # nr of clusters
nvar <- 10  # nr of variables
r <- sample(2:7, size=nvar, replace=TRUE)  # nr of categories
N <- 100   # obs. per cluster


# random probabilities per component
probs <- lapply(seq_len(k), \(ki) runif(nvar, 0.01, 0.99))

# sample data by drawing from a binomial distribution with size = r - 1
# values are expect values to lie inside 1:r hence we add +1.
dat <- lapply(probs, \(p) {
    mapply(\(p_i, r_i) {
        rbinom(N, r_i, p_i) + 1
    }, p, r-1, SIMPLIFY=FALSE) |> do.call(cbind, args=_)
}) |> do.call(rbind, args=_)

true_clusters <- rep(1:4, rep(N, k))

# Cluster without regularization
m1 <- stepFlexmix(dat~1, model=FLXMCregmultinom(r=r, alpha=0), k=k)

# Cluster with regularization
m2 <- stepFlexmix(dat~1, model=FLXMCregmultinom(r=r, alpha=1), k=k)

# Both models are mostly able to reconstruct the true clusters (ARI ~ 0.95)
# (it's a very easy clustering problem)
# Small values for the regularization don't seem to affect the ARI (much)
randIndex(clusters(m1), true_clusters)
randIndex(clusters(m2), true_clusters)

flexord documentation built on April 3, 2025, 8:53 p.m.