mHMM | R Documentation |
mHMM fits a multilevel (also known as mixed or random effects) hidden
Markov model (HMM) to intensive longitudinal data with categorical or
continuous (i.e., normally distributed) observations of multiple subjects
using Bayesian estimation, and creates an object of class mHMM.
By using a multilevel framework, we allow for heterogeneity in the model
parameters between subjects, while estimating one overall HMM. The function
can include covariates at level 2 (i.e., at the subject level) and handles
observation sequences of varying length across subjects. For a short
description of the package see mHMMbayes. See vignette("tutorial-mhmm") for
an introduction to multilevel hidden Markov models and the package, and see
vignette("estimation-mhmm") for an overview of the estimation algorithms
used.
mHMM(
  s_data,
  data_distr = "categorical",
  gen,
  xx = NULL,
  start_val,
  mcmc,
  return_path = FALSE,
  show_progress = TRUE,
  gamma_hyp_prior = NULL,
  emiss_hyp_prior = NULL,
  gamma_sampler = NULL,
  emiss_sampler = NULL
)
s_data |
A matrix containing the observations to be modeled, where the
rows represent the observations over time. In |
data_distr |
String vector with length 1 describing the
observation type of the data. Currently supported are |
gen |
List containing the following elements denoting the general model properties:
|
xx |
An optional list of (level 2) covariates to predict the transition
matrix and/or the emission probabilities. Level 2 covariate(s) means that
there is one observation per subject of each covariate. The first element
in the list If |
start_val |
List containing the start values for the transition
probability matrix gamma and the emission distribution(s). The first
element of the list contains a |
mcmc |
List of Markov chain Monte Carlo (MCMC) arguments, containing the following elements:
|
return_path |
A logical scalar. Should the sampled state sequence
obtained at each iteration and for each subject be returned by the function
( |
show_progress |
A logical scalar. Should the function show a text
progress bar in the |
gamma_hyp_prior |
An optional object of class |
emiss_hyp_prior |
An object of the class |
gamma_sampler |
An optional object of the class |
emiss_sampler |
An optional object of the class |
Covariates specified in xx can be either dichotomous or continuous
variables. Dichotomous variables have to be coded as 0/1 variables.
Categorical or factor variables cannot yet be used as predictor
covariates. The user can, however, break up the categorical variable into
multiple dummy variables (i.e., dichotomous variables), which can be used
simultaneously in the analysis. Continuous predictors are automatically
centered. That is, the mean value of the covariate is subtracted from all
values of the covariate such that the new mean equals zero. This is done so
that the probabilities presented in the output (i.e., for the population
transition probability matrix and population emission probabilities)
correspond to the predicted probabilities at the average value of the
covariate(s).
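The coding rules above can be sketched in base R as follows. This is only an illustration: the covariates treatment and age and their values are hypothetical, and the matrix layout (a first column of ones followed by the covariate columns) is taken from the Examples on this page rather than from a full specification of xx.

```r
# Hypothetical level 2 covariates for 6 subjects (illustrative values only)
n_subj    <- 6
treatment <- factor(c("A", "B", "C", "A", "B", "C"))  # categorical, 3 levels
age       <- c(23, 31, 45, 52, 38, 27)                # continuous

# Break the 3-level factor up into 2 dummy (0/1) variables;
# level "A" acts as the reference category
dummies <- model.matrix(~ treatment)[, -1, drop = FALSE]

# What centering does to a continuous predictor: subtract the mean so that
# the new mean equals zero (shown for illustration only, not required as input)
age_centered <- age - mean(age)

# Covariate matrix in the layout used in the Examples:
# a column of ones, followed by one column per covariate
xx_mat <- cbind(rep(1, n_subj), dummies, age)
```

Because the text states that continuous predictors are centered automatically, the uncentered age is placed in xx_mat; age_centered only shows the transformation that is applied.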
mHMM returns an object of class mHMM, which has print and summary methods
to see the results. The object always contains the following components:
PD_subj
A list containing one list per subject with the elements trans_prob,
cat_emiss or cont_emiss (in case of categorical or continuous observations,
respectively), and log_likl, providing the subject parameter estimates over
the iterations of the MCMC sampler. trans_prob relates to the transition
probabilities gamma, cat_emiss to the categorical emission distribution
(emission probabilities), cont_emiss to the continuous emission
distributions (first the emission means, then the (fixed over subjects)
emission standard deviation), and log_likl to the log likelihood over the
MCMC iterations. Iterations are contained in the rows, the parameters in
the columns.
gamma_prob_bar
A matrix containing the group level parameter estimates of the transition probabilities over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows, and the columns contain the group level parameter estimates. If covariates were included in the analysis, the group level probabilities represent the predicted probability given that the covariate is at the average value for continuous covariates, or given that the covariate equals zero for dichotomous covariates.
gamma_int_bar
A matrix containing the group level intercepts of the Multinomial logistic regression modeling the transition probabilities over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows, and the columns contain the group level intercepts.
gamma_cov_bar
A matrix containing the group level regression coefficients of the Multinomial logistic regression predicting the transition probabilities over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows, and the columns contain the group level regression coefficients.
gamma_int_subj
A list containing one matrix per subject denoting the subject level intercepts of the Multinomial logistic regression modeling the transition probabilities over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows, and the columns contain the subject level intercepts.
gamma_naccept
A matrix containing the number of accepted draws at the subject level RW Metropolis step for each set of parameters of the transition probabilities. The subjects are contained in the rows, and the columns contain the sets of parameters.
input
Overview of used input specifications: the distribution type of the
observations data_distr, the number of states m, the number of used
dependent variables n_dep, (in case of categorical observations) the number
of output categories for each of the dependent variables q_emiss, the
number of iterations J and the specified burn in period burn_in of the
hybrid Metropolis within Gibbs sampler, the number of subjects n_subj, the
observation length for each subject n_vary, and the column names of the
dependent variables dep_labels.
sample_path
A list containing one matrix per subject with the sampled hidden state
sequence over the hybrid Metropolis within Gibbs sampler. The time points
of the dataset are contained in the rows, and the sampled paths over the
iterations are contained in the columns. Only returned if
return_path = TRUE.
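Since the components above store iterations in rows and parameters in columns, posterior summaries can be obtained with ordinary matrix operations. The sketch below uses a simulated stand-in for a component such as gamma_prob_bar of a 2-state model; the column names, the row-wise column order, and the assumption that the stored draws still include the burn in iterations are all illustrative and should be checked against an actual fitted object.

```r
set.seed(1)
J       <- 1000   # number of iterations
burn_in <- 200    # burn in period to discard

# Simulated stand-in for an iterations-by-parameters matrix (2 states);
# column names are hypothetical
p11 <- rbeta(J, 80, 20)
p22 <- rbeta(J, 70, 30)
draws <- cbind(S1toS1 = p11, S1toS2 = 1 - p11,
               S2toS1 = 1 - p22, S2toS2 = p22)

# Drop the burn in iterations (rows), keep all parameters (columns)
kept <- draws[(burn_in + 1):J, , drop = FALSE]

# Posterior mean and 95% credible interval per parameter
post_mean <- colMeans(kept)
post_ci   <- apply(kept, 2, quantile, probs = c(0.025, 0.975))

# Arrange the posterior means as a transition probability matrix,
# assuming the columns are ordered row by row
gamma_hat <- matrix(post_mean, nrow = 2, byrow = TRUE)
```

The same pattern applies per list element for components that are lists of matrices, for example with lapply.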
Additionally, in case of categorical observations, the mHMM return object
contains:
emiss_prob_bar
A list containing one matrix per dependent variable, denoting the group level emission probabilities of each dependent variable over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level emission probabilities. If covariates were included in the analysis, the group level probabilities represent the predicted probability given that the covariate is at the average value for continuous covariates, or given that the covariate equals zero for dichotomous covariates.
emiss_int_bar
A list containing one matrix per dependent variable, denoting the group level intercepts of each dependent variable of the Multinomial logistic regression modeling the probabilities of the emission distribution over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level intercepts.
emiss_cov_bar
A list containing one matrix per dependent variable, denoting the group level regression coefficients of the Multinomial logistic regression predicting the emission probabilities within each of the dependent variables over the iterations of the hybrid Metropolis within Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level regression coefficients.
emiss_int_subj
A list containing one list per subject denoting the subject level intercepts of each dependent variable of the Multinomial logistic regression modeling the probabilities of the emission distribution over the iterations of the hybrid Metropolis within Gibbs sampler. Each lower level list contains one matrix per dependent variable, in which iterations of the sampler are contained in the rows, and the columns contain the subject level intercepts.
emiss_naccept
A list containing one matrix per dependent variable with the number of accepted draws at the subject level RW Metropolis step for each set of parameters of the emission distribution. The subjects are contained in the rows, and the columns of the matrix contain the sets of parameters.
In case of continuous observations, the mHMM return object contains:
emiss_mu_bar
A list containing one matrix per dependent variable, denoting the group level means of the Normal emission distribution of each dependent variable over the iterations of the Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level emission means. If covariates were included in the analysis, the group level means represent the predicted mean given that the covariate is at the average value for continuous covariates, or given that the covariate equals zero for dichotomous covariates.
emiss_varmu_bar
A list containing one matrix per dependent variable, denoting the variance between the subject level means of the Normal emission distributions over the iterations of the Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level variance in the mean.
emiss_sd_bar
A list containing one matrix per dependent variable, denoting the (fixed over subjects) standard deviation of the Normal emission distributions over the iterations of the Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level emission standard deviations.
emiss_cov_bar
A list containing one matrix per dependent variable, denoting the group level regression coefficients predicting the emission means within each of the dependent variables over the iterations of the Gibbs sampler. The iterations of the sampler are contained in the rows of the matrix, and the columns contain the group level regression coefficients.
label_switch
A matrix of m rows and n_dep columns containing the percentage of times the
group mean of the emission distribution of state i was sampled to be
smaller than the group mean of the emission distribution of state i-1. If
the state dependent means of the emission distributions were given in a
ranked order (low to high) in both the start values and the hyper-priors, a
high percentage in label_switch indicates that label switching possibly
poses a problem in the analysis, and further diagnostics (e.g., traceplots
and posterior distributions) should be inspected.
input
Overview of used input specifications: the number of states m, the number
of used dependent variables n_dep, the number of iterations J and the
specified burn in period burn_in of the hybrid Metropolis within Gibbs
sampler, the number of subjects n_subj, the observation length for each
subject n_vary, and the column names of the dependent variables
dep_labels.
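The quantity described under label_switch can be illustrated with a short base R sketch. It operates on a simulated stand-in for one matrix of group level emission means (iterations in rows, one column per state) and is not the package's internal code; in particular, leaving the entry for state 1 at zero because it has no preceding state is an assumption made here.

```r
set.seed(42)
J <- 500   # number of iterations
m <- 3     # number of states

# Simulated stand-in for the draws of the group level emission means of one
# dependent variable: iterations in rows, states in columns (low to high)
mu_draws <- cbind(rnorm(J, mean = 50,  sd = 5),
                  rnorm(J, mean = 60,  sd = 5),
                  rnorm(J, mean = 150, sd = 5))

# Percentage of iterations in which the mean of state i was sampled to be
# smaller than the mean of state i-1 (state 1 left at 0 by assumption)
label_switch <- numeric(m)
for (i in 2:m) {
  label_switch[i] <- 100 * mean(mu_draws[, i] < mu_draws[, i - 1])
}
label_switch
```

In this stand-in, states 1 and 2 have nearby means and therefore show some order reversals, while states 2 and 3 are well separated. With several dependent variables, repeating the computation per variable gives the m by n_dep layout described above.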
sim_mHMM for simulating multilevel hidden Markov data, vit_mHMM for
obtaining the most likely hidden state sequence for each subject using the
Viterbi algorithm, obtain_gamma and obtain_emiss for obtaining the
transition or emission distribution probabilities of a fitted model at the
group or subject level, and plot.mHMM for plotting the posterior densities
of a fitted model.
###### Example on package (categorical) example data, see ?nonverbal
# specifying general model properties:
m <- 2
n_dep <- 4
q_emiss <- c(3, 2, 3, 2)
# specifying starting values
start_TM <- diag(.8, m)
start_TM[lower.tri(start_TM) | upper.tri(start_TM)] <- .2
start_EM <- list(matrix(c(0.05, 0.90, 0.05,
                          0.90, 0.05, 0.05), byrow = TRUE,
                        nrow = m, ncol = q_emiss[1]), # vocalizing patient
                 matrix(c(0.1, 0.9,
                          0.1, 0.9), byrow = TRUE, nrow = m,
                        ncol = q_emiss[2]), # looking patient
                 matrix(c(0.90, 0.05, 0.05,
                          0.05, 0.90, 0.05), byrow = TRUE,
                        nrow = m, ncol = q_emiss[3]), # vocalizing therapist
                 matrix(c(0.1, 0.9,
                          0.1, 0.9), byrow = TRUE, nrow = m,
                        ncol = q_emiss[4])) # looking therapist
# Run a model without covariate(s):
# Note that for reasons of running time, J is set at a ridiculously low value.
# One would typically use a number of iterations J of at least 1000,
# and a burn_in of 200.
out_2st <- mHMM(s_data = nonverbal,
                gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
                start_val = c(list(start_TM), start_EM),
                mcmc = list(J = 11, burn_in = 5))
out_2st
summary(out_2st)
# plot the posterior densities for the transition probabilities
plot(out_2st, component = "gamma", col = c("darkslategray3", "goldenrod"))
# Run a model including a covariate (see ?nonverbal_cov) to predict the
# emission distribution for each of the 4 dependent variables:
n_subj <- 10
xx_emiss <- rep(list(matrix(c(rep(1, n_subj), nonverbal_cov$std_CDI_change),
                            ncol = 2, nrow = n_subj)), n_dep)
xx <- c(list(matrix(1, ncol = 1, nrow = n_subj)), xx_emiss)
out_2st_c <- mHMM(s_data = nonverbal, xx = xx,
                  gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
                  start_val = c(list(start_TM), start_EM),
                  mcmc = list(J = 11, burn_in = 5))
###### Example on categorical simulated data
# Simulate data for 10 subjects with each 100 observations:
n_t <- 100
n <- 10
m <- 2
n_dep <- 1
q_emiss <- 3
gamma <- matrix(c(0.8, 0.2,
                  0.3, 0.7), ncol = m, byrow = TRUE)
emiss_distr <- list(matrix(c(0.5, 0.5, 0.0,
                             0.1, 0.1, 0.8), nrow = m, ncol = q_emiss, byrow = TRUE))
data1 <- sim_mHMM(n_t = n_t, n = n, gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
                  gamma = gamma, emiss_distr = emiss_distr, var_gamma = .5, var_emiss = .5)
# Specify remaining required analysis input (for the example, we use simulation
# input as starting values):
n_dep <- 1
q_emiss <- 3
# Run the model on the simulated data:
out_2st_sim <- mHMM(s_data = data1$obs,
                    gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
                    start_val = c(list(gamma), emiss_distr),
                    mcmc = list(J = 11, burn_in = 5))
###### Example on continuous simulated data
# simulating multivariate continuous data
n_t <- 100
n <- 10
m <- 3
n_dep <- 2
gamma <- matrix(c(0.8, 0.1, 0.1,
                  0.2, 0.7, 0.1,
                  0.2, 0.2, 0.6), ncol = m, byrow = TRUE)
emiss_distr <- list(matrix(c( 50, 10,
                             100, 10,
                             150, 10), nrow = m, byrow = TRUE),
                    matrix(c( 5, 2,
                             10, 5,
                             20, 3), nrow = m, byrow = TRUE))
data_cont <- sim_mHMM(n_t = n_t, n = n, data_distr = 'continuous', gen = list(m = m, n_dep = n_dep),
                      gamma = gamma, emiss_distr = emiss_distr, var_gamma = .1, var_emiss = c(.5, 0.01))
# Specify hyper-prior for the continuous emission distribution
manual_prior_emiss <- prior_emiss_cont(
  gen = list(m = m, n_dep = n_dep),
  emiss_mu0 = list(matrix(c(30, 70, 170), nrow = 1),
                   matrix(c(7, 8, 18), nrow = 1)),
  emiss_K0 = list(1, 1),
  emiss_V = list(rep(100, m), rep(25, m)),
  emiss_nu = list(1, 1),
  emiss_a0 = list(rep(1, m), rep(1, m)),
  emiss_b0 = list(rep(1, m), rep(1, m)))
# Run the model on the simulated data:
# Note that for reasons of running time, J is set at a ridiculously low value.
# One would typically use a number of iterations J of at least 1000,
# and a burn_in of 200.
out_3st_cont_sim <- mHMM(s_data = data_cont$obs,
                         data_distr = 'continuous',
                         gen = list(m = m, n_dep = n_dep),
                         start_val = c(list(gamma), emiss_distr),
                         emiss_hyp_prior = manual_prior_emiss,
                         mcmc = list(J = 11, burn_in = 5))