View source: R/estimate_nhmm.R
estimate_nhmm | R Documentation |
Function estimate_nhmm
estimates a non-homogeneous hidden Markov model
(NHMM) where initial, transition, and emission probabilities can depend on
covariates. Transition and emission probabilities can also depend on past
responses, in which case the model is called feedback-augmented NHMM
(FAN-HMM) (Helske, 2025).
estimate_nhmm(
n_states,
emission_formula,
initial_formula = ~1,
transition_formula = ~1,
data,
time,
id,
lambda = 0,
prior_obs = "fixed",
state_names = NULL,
inits = "random",
init_sd = 2,
restarts = 0L,
method = "EM-DNM",
bound = Inf,
control_restart = list(),
control_mstep = list(),
...
)
n_states |
An integer > 1 defining the number of hidden states. |
emission_formula |
of class |
initial_formula |
of class |
transition_formula |
of class |
data |
A data frame containing the variables used in the model formulas. |
time |
Name of the time index variable in |
id |
Name of the id variable in |
lambda |
Penalization factor |
prior_obs |
Either |
state_names |
A vector of optional labels for the hidden states. If this
is |
inits |
If |
init_sd |
Standard deviation of the normal distribution used to generate
random initial values. Default is |
restarts |
Number of times to run optimization using random starting values (in addition to the final run). Default is 0. |
method |
Optimization method used. Option |
bound |
Positive value defining the hard lower and upper bounds for the
working parameters |
control_restart |
Controls for restart steps, see details. |
control_mstep |
Controls for M-step of EM algorithm, see details. |
... |
Additional arguments to |
In case of FAN-HMM with autoregressive dependency on the observational level,
(i.e. response y_t
depend on y_{t-1}
), the emission
probabilities at the first time point need special attention. By default,
the model is initialized with fixed values for the first time point
(prior_obs = "fixed"
), meaning that if the input data consists of
time points t=1, 2, \ldots
, then the model is defined from t=2
onwards and the data on t=1
is used only for defining the emission
probabilities at t=2
. Note that in this case also the initial state
probabilities correspond to t=2
.
Alternatively, you can define prior_obs
as a list of vectors, where
the number of vectors is equal to the number of responses, and each vector
gives the prior distribution for the response at t=0
. For example,
if you have response variables y
and x
, where y
has 3 categories and
x
2 categories, you can define
prior_obs = list(y = c(0.5, 0.3, 0.2), x = c(0.7, 0.3))
.
These distributions are then used to marginalize out y_0
and
x_0
in the relevant emission probabilities.
By default, the model parameters are estimated using EM-DNM algorithm
which first runs some iterations (100 by default) of EM algorithm, and then
switches to L-BFGS. Other options include any numerical optimization
algorithm of nloptr::nloptr()
, or plain EM algorithm where the
M-step uses L-BFGS (provided by the NLopt library).
With multiple runs of optimization (by using the restarts
argument), it is
possible to parallelize these runs using the future
package, e.g., by
calling future::plan(multisession, workers = 2)
before estimate_nhmm()
.
See future::plan()
for details. This is compatible with progressr
package, so you can use progressr::with_progress()
to track the progress
of these multiple runs.
During the estimation, the log-likelihood is scaled by the number of
non-missing observations (nobs(model)
), and the the covariate data is
standardardized before optimization.
By default, the convergence is claimed when the relative
change of the objective function is less than 1e-12
, the absolute change
is less than 1e-8
or the relative or absolute change of the working
parameters eta is less than 1e-6
. These can be changed
by passing arguments ftol_rel
, ftol_abs
, xtol_rel
, and xtol_abs
via ...
. These, as well as, maxeval
(maximum number of iterations,
1e4 by default), and print_level
(default is 0
, no console output,
larger values are more verbose), are used by the chosen main optimization
method. The number of initial EM iterations in EM-DNM
can be set using
argument maxeval_em_dnm
(default is 100), and algorithm for direct
numerical optimization can be defined using argument algorithm
(see nloptr::nloptr()
for possible options).
For controlling these stopping criteria for the multistart phase, argument
control_restart
takes a list such as list(ftol_rel = 0.01, print_level = 1)
.
Default are as in the case of main optimization (which is always run once
after the restarts, using best solution from restarts as initial value)
Additionally, same options can be defined separately for the M-step of EM
algorithm via list control_mstep
. For control_mstep
, the
default values are ftol_rel = 1e-10
, and maxeval = 1000
, and otherwise
identical to previous defaults above.
Object of class nhmm
or fanhmm
.
Helske, J (2025). Feedback-augmented Non-homogeneous Hidden Markov Models for Longitudinal Causal Inference. arXiv preprint. doi:10.48550/arXiv.2503.16014.
Johnson, SG. The NLopt nonlinear-optimization package, http://github.com/stevengj/nlopt.
data("mvad", package = "TraMineR")
d <- reshape(mvad, direction = "long", varying = list(15:86),
v.names = "activity")
## Not run:
set.seed(1)
fit <- estimate_nhmm(n_states = 3,
data = d, time = "time", id = "id",
emission_formula = activity ~ gcse5eq, initial_formula = ~ 1,
transition_formula = ~ male + gcse5eq,
method = "DNM", maxeval = 2 # very small number of iterations for CRAN
)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.