estimate_mnhmm: Estimate a Mixture Non-homogeneous Hidden Markov Model
In seqHMM: Mixture Hidden Markov Models for Social Sequence Data and Other Multivariate, Multichannel Categorical Time Series

Description

The function `estimate_mnhmm()` estimates a mixture version of the non-homogeneous hidden Markov model (MNHMM), in which the initial, transition, emission, and mixture probabilities can depend on covariates. See `estimate_nhmm()` for further details.
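
For orientation, the model can be summarized as below. This assumes the standard mixture hidden Markov formulation and a multinomial logit (softmax) link for the mixture probabilities. Here $D$ is `n_clusters`, $\omega_{id}$ is the probability that sequence $i$ belongs to cluster $d$ (modeled through `cluster_formula` with coefficients $\eta_\omega$), and within cluster $d$ the initial probabilities $\pi$, transition probabilities $a$, and emission probabilities $b$ may vary over individuals $i$ and time points $t$ through covariates. The notation is illustrative, and identifiability constraints on $\eta_\omega$ are omitted.

```latex
\begin{aligned}
p(\mathbf{y}_i \mid \mathbf{x}_i) &= \sum_{d=1}^{D} \omega_{id}\, p_d(\mathbf{y}_i \mid \mathbf{x}_i),
\qquad
\omega_{id} = \frac{\exp(\boldsymbol\eta_{\omega,d}^\top \mathbf{x}_i)}{\sum_{k=1}^{D} \exp(\boldsymbol\eta_{\omega,k}^\top \mathbf{x}_i)},\\
p_d(\mathbf{y}_i \mid \mathbf{x}_i) &= \sum_{s_1,\ldots,s_T} \pi^{(d)}_{i}(s_1)\, b^{(d)}_{i1}(y_{i1} \mid s_1) \prod_{t=2}^{T} a^{(d)}_{it}(s_{t-1}, s_t)\, b^{(d)}_{it}(y_{it} \mid s_t).
\end{aligned}
```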

Usage

estimate_mnhmm(
  n_states,
  n_clusters,
  emission_formula,
  initial_formula = ~1,
  transition_formula = ~1,
  cluster_formula = ~1,
  data,
  time,
  id,
  lambda = 0,
  prior_obs = "fixed",
  state_names = NULL,
  cluster_names = NULL,
  inits = "random",
  init_sd = 2,
  restarts = 0L,
  method = "EM-DNM",
  bound = Inf,
  control_restart = list(),
  control_mstep = list(),
  ...
)

Arguments

`n_states`	An integer > 1 defining the number of hidden states.
`n_clusters`	A positive integer defining the number of clusters (mixtures).
`emission_formula`	A formula of class `formula()` for the state emission probabilities, or a list of such formulas in case of multiple response variables. The left-hand side of each formula defines the response(s). For multiple responses with the same formula, you can use the form `c(y1, y2) ~ x`, where `y1` and `y2` are the response variables.
`initial_formula`	A formula of class `formula()` for the initial state probabilities. The left-hand side of the formula should be empty.
`transition_formula`	A formula of class `formula()` for the state transition probabilities. The left-hand side of the formula should be empty.
`cluster_formula`	A formula of class `formula()` for the mixture probabilities.
`data`	A data frame containing the variables used in the model formulas.
`time`	Name of the time index variable in `data`.
`id`	Name of the id variable in `data` identifying different sequences.
`lambda`	Penalization factor `lambda` for the penalized log-likelihood, where the penalization is `0.5 * lambda * sum(eta^2)`. Note that when optimizing with L-BFGS, both the objective function (log-likelihood) and the penalization term are scaled by the number of non-missing observations. Default is `0`, but small values such as `1e-4` can help ensure the numerical stability of L-BFGS by avoiding extreme probabilities. See also argument `bound` for hard constraints.
`prior_obs`	Either `"fixed"` or a list of vectors giving the prior distributions for the responses at time "zero". See details.
`state_names`	A vector of optional labels for the hidden states. If this is `NULL` (the default), numbered states are used.
`cluster_names`	A vector of optional labels for the clusters. If this is `NULL` (the default), numbered clusters are used.
`inits`	If `inits = "random"` (default), random initial values are used. Otherwise, `inits` should be a list of initial values. If coefficients are given using the list components `eta_pi`, `eta_A`, `eta_B`, and `eta_omega`, these are used as is. Alternatively, initial values can be given in terms of the initial state, transition, emission, and mixture probabilities using the list components `initial_probs`, `transition_probs`, `emission_probs`, and `cluster_probs`. These can also be mixed, e.g., you can give only `initial_probs` and `eta_A`.
`init_sd`	Standard deviation of the normal distribution used to generate random initial values. Default is `2`. If you want to fix the initial values of the regression coefficients to zero, use `init_sd = 0`.
`restarts`	Number of times to run optimization using random starting values (in addition to the final run). Default is 0.
`method`	Optimization method used. Option `"EM"` uses the EM algorithm with L-BFGS in the M-step. Option `"DNM"` uses direct maximization of the log-likelihood, by default using L-BFGS. Option `"EM-DNM"` (the default) first runs at most 10 iterations of EM and then switches to L-BFGS (other NLopt algorithms can also be used).
`bound`	Positive value defining the hard lower and upper bounds for the working parameters `eta`, which are used to avoid extreme probabilities and the corresponding numerical issues, especially in the M-step of the EM algorithm. Default is `Inf`, i.e., no bounds. Note that the bounds are not enforced in the M-step in the intercept-only case with `lambda = 0`.
`control_restart`	Controls for restart steps, see details.
`control_mstep`	Controls for M-step of EM algorithm, see details.
`...`	Additional arguments to `nloptr::nloptr()` and the EM algorithm. See details.
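
The formula arguments are ordinary R `formula` objects, so their parts can be inspected with base R before fitting. The sketch below uses hypothetical variables `y1`, `y2`, `x`, and `z`; it only constructs and inspects formulas and does not call seqHMM.

```r
# Hypothetical variable names; nothing is evaluated when a formula is created.
# Shared formula for two responses, using the c(y1, y2) ~ x form:
f_shared <- c(y1, y2) ~ x

# List form, one formula per response; this also allows different
# covariates for each response:
f_list <- list(y1 ~ x, y2 ~ x + z)

# Left-hand side is the call c(y1, y2); its variables name the responses.
responses <- all.vars(f_shared[[2]])
print(responses)

# Right-hand side of the shared formula: the covariates.
covariates <- all.vars(f_shared[[3]])
print(covariates)

# initial_formula and transition_formula are one-sided (empty left-hand side).
f_trans <- ~ x
print(length(f_trans))  # 2 for a one-sided formula, 3 for a two-sided one
```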
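
To make the `lambda` penalty concrete, the sketch below computes the penalized objective `loglik - 0.5 * lambda * sum(eta^2)` for a vector of working parameters, and also a version scaled by the number of non-missing observations. The helper `penalized_objective` is hypothetical, and reading "scaled by the number of non-missing observations" as division by that count is an assumption, not taken from seqHMM internals.

```r
# Hypothetical helper: penalized log-likelihood as described for `lambda`.
# loglik: log-likelihood value; eta: vector of working parameters;
# n_obs: number of non-missing observations (used only when scale = TRUE).
penalized_objective <- function(loglik, eta, lambda, n_obs = 1, scale = FALSE) {
  penalty <- 0.5 * lambda * sum(eta^2)
  value <- loglik - penalty
  # Assumption: scaling means dividing both terms by n_obs.
  if (scale) value <- value / n_obs
  value
}

eta <- c(1.5, -2, 0.5)
# With the default lambda = 0 the objective equals the log-likelihood.
obj0 <- penalized_objective(loglik = -120, eta = eta, lambda = 0)
# A small lambda subtracts 0.5 * lambda * sum(eta^2), penalizing large |eta|.
obj1 <- penalized_objective(loglik = -120, eta = eta, lambda = 1e-4)
obj1_scaled <- penalized_objective(loglik = -120, eta = eta, lambda = 1e-4,
                                   n_obs = 200, scale = TRUE)
print(c(obj0, obj1, obj1_scaled))
```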
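
The roles of `init_sd` and `restarts` follow a common multistart pattern: draw starting coefficients from a normal distribution with standard deviation `init_sd`, optimize from each start, and keep the best result. The sketch below shows that pattern on a toy objective with base R's `optim()`. The toy function, the helper `multistart`, the zero mean of the draws, and the choice to start the final run from the best restart are assumptions for illustration, not seqHMM's implementation.

```r
set.seed(123)

# Toy multimodal objective to minimize (stands in for a negative log-likelihood).
toy_objective <- function(eta) sum(eta^2 - 3 * cos(2 * pi * eta))

# Hypothetical multistart helper mirroring the roles of init_sd and restarts.
multistart <- function(fn, n_par, init_sd = 2, restarts = 0L) {
  # Starting values: normal draws with mean 0 and sd = init_sd (assumed);
  # with init_sd = 0 every start is exactly zero.
  draw_start <- function() rnorm(n_par, mean = 0, sd = init_sd)
  best <- NULL
  for (r in seq_len(restarts)) {
    fit <- optim(draw_start(), fn, method = "BFGS")
    if (is.null(best) || fit$value < best$value) best <- fit
  }
  # Final run: from the best restart if any, otherwise from a fresh draw.
  start <- if (is.null(best)) draw_start() else best$par
  optim(start, fn, method = "BFGS")
}

fit0 <- multistart(toy_objective, n_par = 2, init_sd = 2, restarts = 0L)
fit10 <- multistart(toy_objective, n_par = 2, init_sd = 2, restarts = 10L)
print(c(no_restarts = fit0$value, ten_restarts = fit10$value))
```

Because the toy objective has many local minima, runs with more restarts are more likely to reach the global minimum at zero, which is also where `init_sd = 0` starts.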
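
The `"EM-DNM"` strategy (a few EM iterations to move into a good region, then direct numerical maximization) can be illustrated on a toy problem. The sketch below fits a two-component normal mixture with unit variances: it runs 10 EM iterations and then passes the estimate to base R's `optim()` with `method = "L-BFGS-B"`, which stands in for the NLopt L-BFGS used via `nloptr`. The data, parametrization, and starting values are assumptions made for illustration.

```r
set.seed(42)
# Simulated data from a two-component normal mixture (unit variances).
x <- c(rnorm(150, mean = -1), rnorm(100, mean = 2))

# Log-likelihood in unconstrained parameters theta = (logit of weight, mu1, mu2).
loglik <- function(theta) {
  w <- plogis(theta[1])
  sum(log(w * dnorm(x, theta[2]) + (1 - w) * dnorm(x, theta[3])))
}

# One EM iteration: E-step responsibilities, closed-form M-step.
em_step <- function(theta) {
  w <- plogis(theta[1])
  p1 <- w * dnorm(x, theta[2])
  p2 <- (1 - w) * dnorm(x, theta[3])
  r <- p1 / (p1 + p2)
  c(qlogis(mean(r)), sum(r * x) / sum(r), sum((1 - r) * x) / sum(1 - r))
}

# Phase 1: 10 EM iterations (no early stopping in this sketch).
theta0 <- c(0, -0.5, 0.5)
theta <- theta0
for (iter in 1:10) theta <- em_step(theta)
ll_em <- loglik(theta)

# Phase 2: direct numerical maximization from the EM solution
# (optim minimizes, so the log-likelihood is negated).
dnm <- optim(theta, function(th) -loglik(th), method = "L-BFGS-B")
ll_dnm <- -dnm$value
print(c(start = loglik(theta0), after_em = ll_em, after_dnm = ll_dnm))
```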
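
The effect of `bound` can be illustrated by clamping working parameters to `[-bound, bound]` before mapping them to probabilities. The sketch assumes a softmax (multinomial logit) link from `eta` to probabilities, and the helpers `softmax` and `clamp` are hypothetical. In an optimizer such bounds act as box constraints; the clamp here only shows how finite bounds keep every probability away from 0 and 1.

```r
# Hypothetical helpers, assuming a softmax link from working parameters
# to probabilities.
softmax <- function(eta) {
  z <- exp(eta - max(eta))  # subtract the max for numerical stability
  z / sum(z)
}
clamp <- function(eta, bound = Inf) pmin(pmax(eta, -bound), bound)

eta <- c(0, 40, -40)  # extreme working parameters
p_unbounded <- softmax(clamp(eta, bound = Inf))  # bound = Inf: no clamping
p_bounded <- softmax(clamp(eta, bound = 5))
print(rbind(p_unbounded, p_bounded))
```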

Value

An object of class `mnhmm`.

Examples

data("mvad", package = "TraMineR")

d <- reshape(mvad, direction = "long", varying = list(15:86), 
  v.names = "activity")

## Not run: 
set.seed(1)
fit <- estimate_mnhmm(n_states = 3, n_clusters = 2,
  data = d, time = "time", id = "id",
  cluster_formula = ~ male + catholic + gcse5eq + Grammar +
    funemp + fmpr + livboth + Belfast +
    N.Eastern + Southern + S.Eastern + Western,
  emission_formula = activity ~ male + catholic + gcse5eq,
  initial_formula = ~ 1,
  transition_formula = ~ male + gcse5eq
)

## End(Not run)

seqHMM documentation built on June 8, 2025, 10:16 a.m.