cluster_mmm: Cluster sequences using Mixed Markov Models
In Nestimate: Network Estimation, Bootstrap, and Higher-Order Analysis

cluster_mmm

R Documentation

Cluster sequences using Mixed Markov Models

Description

Fits a mixture of Markov chains to sequence data and returns a netobject_group containing one transition network per cluster. This is the mixed Markov model (MMM) counterpart of cluster_network, which uses distance-based clustering; both functions accept a cluster_by argument so the call shape stays uniform across clustering families.

Usage

cluster_mmm(
  data,
  k = 2L,
  n_starts = 50L,
  max_iter = 200L,
  tol = 1e-06,
  smooth = 0.01,
  seed = NULL,
  covariates = NULL,
  covariate_effect = c("em", "posthoc"),
  estimator = c("auto", "firth", "multinom", "chisq"),
  cluster_by = "mmm",
  ...
)

Arguments

`data`	A data.frame (wide format), `netobject`, or `tna` model. For tna objects, extracts the stored data.
`k`	Integer. Whole finite number of mixture components, >= 2. Default: 2.
`n_starts`	Integer. Positive whole finite number of random restarts. Default: 50.
`max_iter`	Integer. Positive whole finite maximum EM iterations per start. Default: 200.
`tol`	Numeric. Finite positive convergence tolerance. Default: 1e-6.
`smooth`	Numeric. Finite non-negative Laplace smoothing constant. Default: 0.01.
`seed`	Integer or NULL. Random seed.
`covariates`	Optional. Covariates used to model covariate-dependent mixing proportions. Accepts a string, character vector, formula, or data.frame (same forms as `build_clusters`). For `netobject` or `cograph_network` input, names are resolved against `$metadata` first, so a typical call is `build_mmm(net, k = 3, covariates = "session_label")`. Unlike the post-hoc analysis in `build_clusters()`, under the default `covariate_effect = "em"` these covariates directly influence cluster membership during EM estimation (see `covariate_effect`).
`covariate_effect`	How `covariates` enter the model. `"em"` (default) folds them into the EM as covariate-dependent mixing proportions, so they shape the cluster fit itself (and rows with missing covariates are dropped before fitting). `"posthoc"` fits a plain mixture on every sequence and uses the covariates only for the after-fit multinomial logit, so covariate values — and their missingness — never change which clusters are found. Ignored when `covariates` is `NULL`.
`estimator`	Multinomial fitter for the post-hoc covariate analysis (does not affect EM). `"auto"` (default) inspects the cluster x covariate cross-tab and uses `"firth"` only when any cell has fewer than 5 observations (separation risk), otherwise the much faster `"multinom"`. `"firth"` forces Firth's penalised likelihood via `brglm2::brmultinom` (finite under separation). `"multinom"` forces `nnet::multinom` (warns about separation risk). `"chisq"` runs descriptive tests (no logit). See `build_clusters` for full details.
`cluster_by`	Character. Accepted only as `"mmm"` (the default). Present so `cluster_mmm()` and `cluster_network()` share the same call shape; any other value raises an error pointing at `cluster_network`.
`...`	Unsupported. Supplying unused arguments raises an error.
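
The "auto" rule described for `estimator` can be sketched in base R. This is only an illustration of the documented rule, not the package's code: `choose_estimator` and `min_cell` are hypothetical names, and the sketch assumes a single categorical covariate.

```r
# Hypothetical helper illustrating the documented "auto" rule;
# this is not a Nestimate function.
choose_estimator <- function(cluster, covariate, min_cell = 5L) {
  tab <- table(cluster, covariate)   # cluster x covariate cross-tab
  # Any sparse cell signals separation risk, so prefer the Firth fitter.
  if (any(tab < min_cell)) "firth" else "multinom"
}

# Sparse cross-tab: two cells hold a single observation each.
sparse_cluster   <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
sparse_covariate <- c("a", "a", "a", "a", "a", "b", "a", "b", "b", "b", "b", "b")
sparse_choice <- choose_estimator(sparse_cluster, sparse_covariate)

# Balanced cross-tab: every cell holds exactly 5 observations.
dense_cluster   <- rep(1:2, each = 10)
dense_covariate <- rep(c("a", "b"), times = 10)
dense_choice <- choose_estimator(dense_cluster, dense_covariate)
```

With these toy inputs the sparse table selects the Firth fitter and the balanced table selects the faster multinomial fitter.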

Details

For the full net_mmm object with posterior probabilities, model fit statistics, and S3 methods, use build_mmm instead.
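
As background on the model named in the description, the following base R sketch shows one common way a mixture of Markov chains scores a sequence: each component has its own smoothed transition matrix, and a sequence's posterior membership is proportional to the mixing weight times its likelihood under that component. All helper names here are hypothetical, initial-state probabilities are ignored for brevity, and nothing below is claimed to match Nestimate's internals.

```r
# Hypothetical helpers for illustration; these are not Nestimate functions.
states <- c("A", "B", "C")

# Smoothed, row-normalised transition matrix from wide-format sequences.
smoothed_transitions <- function(seqs, states, smooth = 0.01) {
  m <- as.matrix(seqs)
  from <- factor(as.vector(m[, -ncol(m), drop = FALSE]), levels = states)
  to   <- factor(as.vector(m[, -1, drop = FALSE]), levels = states)
  counts <- matrix(as.vector(table(from, to)), nrow = length(states),
                   dimnames = list(states, states))
  counts <- counts + smooth      # additive (Laplace-style) smoothing
  counts / rowSums(counts)       # each row becomes a probability vector
}

# Log-likelihood of the transitions in one sequence under matrix P.
# Assumption: the initial-state term is left out to keep the sketch short.
seq_loglik <- function(x, P) {
  sum(log(P[cbind(x[-length(x)], x[-1])]))
}

seqs <- data.frame(V1 = c("A", "A", "B"),
                   V2 = c("B", "B", "C"),
                   V3 = c("C", "A", "A"))

# Two toy components: one built from rows 1-2, one from row 3.
P1 <- smoothed_transitions(seqs[1:2, ], states)
P2 <- smoothed_transitions(seqs[3, , drop = FALSE], states)

# Posterior membership of one sequence (an E-step for a single row).
mixing <- c(0.5, 0.5)
x <- c("A", "B", "C")
w <- mixing * exp(c(seq_loglik(x, P1), seq_loglik(x, P2)))
posterior <- w / sum(w)
```

In a full EM fit these two steps alternate: posteriors are recomputed for every sequence, then each component's transition counts are re-estimated with the posteriors as weights, until the change in log-likelihood falls below a tolerance such as `tol`.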

Value

A netobject_group (a list of netobjects, one per cluster). MMM-specific information is stored in the "clustering" attribute (class "net_mmm_clustering"), retrieved with attr(x, "clustering") for a result x. Its fields are:

assignments: Integer vector of cluster assignments.
k: Number of clusters.
posterior: N x k matrix of posterior probabilities.
mixing: Mixing proportions.
quality: List with AvePP, entropy, classification error.
BIC, AIC, ICL: Model fit statistics.
data: The full N-row sequence frame, aligned row by row with $assignments, so sequence_plot and distribution_plot can recover both the sequences and their cluster labels.
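
To make the fields above concrete, here is a base R sketch that derives hard assignments, mixing proportions, AvePP, and a relative entropy from a toy posterior matrix. The posterior values are made up, and the formulas are the conventional ones from the mixture-model literature; whether Nestimate computes its `quality` entries in exactly this way is an assumption.

```r
# Toy posterior matrix (N = 4 sequences, k = 2 clusters); values are made up.
posterior <- matrix(c(0.9, 0.1,
                      0.8, 0.2,
                      0.3, 0.7,
                      0.4, 0.6), ncol = 2, byrow = TRUE)

# Hard assignment: the cluster with the largest posterior in each row.
assignments <- max.col(posterior, ties.method = "first")

# Mixing proportions as the average membership per cluster
# (the usual estimate for a plain mixture without covariates).
mixing <- colMeans(posterior)

# Average posterior probability (AvePP) per cluster, taken over the
# sequences assigned to that cluster.
own_prob <- posterior[cbind(seq_len(nrow(posterior)), assignments)]
avepp <- tapply(own_prob, assignments, mean)

# Relative entropy in [0, 1]; values near 1 mean well-separated clusters.
p <- pmax(posterior, .Machine$double.eps)   # guard against log(0)
rel_entropy <- 1 - sum(-p * log(p)) / (nrow(p) * log(ncol(p)))
```

On a real fit, the same calculations could be applied to the `posterior` field of the "clustering" attribute to cross-check the stored values.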

Examples

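# Toy data: 30 random two-step sequences over states A, B, C
set.seed(1)
seqs <- data.frame(V1 = sample(c("A","B","C"), 30, TRUE),
                   V2 = sample(c("A","B","C"), 30, TRUE))
# Small n_starts and max_iter keep the example fast
grp <- cluster_mmm(seqs, k = 2, n_starts = 1, max_iter = 10, seed = 1)
grp[[1]]$weights                      # weights of the first cluster's network
attr(grp, "clustering")$assignments   # cluster label for each sequence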

# Visualise with sequence_plot
seqs <- data.frame(
  V1 = sample(LETTERS[1:3], 40, TRUE),
  V2 = sample(LETTERS[1:3], 40, TRUE),
  V3 = sample(LETTERS[1:3], 40, TRUE)
)
grp <- cluster_mmm(seqs, k = 2)
sequence_plot(grp, type = "index")

Nestimate documentation built on July 11, 2026, 1:09 a.m.