sim_mHMM | R Documentation |
sim_mHMM
simulates data for multiple subjects, for which the data have
either categorical or continuous (i.e., normally distributed)
observations that follow a hidden Markov model (HMM) with a
multilevel structure. The multilevel structure implies that each subject is
allowed to have its own set of parameters, and that the parameters at the
subject level (level 1) are tied together by a population distribution at
level 2 for each of the corresponding parameters. The shape of the population
distribution for each of the parameters is a normal distribution. In addition
to (natural and/or unexplained) heterogeneity between subjects, the subjects
parameters can also depend on a covariate.
sim_mHMM(
n_t,
n,
data_distr = "categorical",
gen,
gamma,
emiss_distr,
start_state = NULL,
xx_vec = NULL,
beta = NULL,
var_gamma = 0.1,
var_emiss = NULL,
return_ind_par = FALSE,
m,
n_dep,
q_emiss
)
n_t |
Numeric vector with length 1 denoting the length of the observed
sequence to be simulated for each subject. To only simulate subject
specific transition probability matrices gamma and emission distributions
(and no data), set |
n |
Numeric vector with length 1 denoting the number of subjects for which data is simulated. |
data_distr |
String vector with length 1 describing the
observation type of the data. Currently supported are |
gen |
List containing the following elements denoting the general model properties:
|
gamma |
A matrix with |
emiss_distr |
A list with |
start_state |
Optional numeric vector with length 1 denoting in which state the simulated state sequence should start. If left unspecified, the simulated state for time point 1 is sampled from the initial state distribution (which is derived from the transition probability matrix gamma). |
xx_vec |
List of 1 + |
beta |
List of 1 + Note that if |
var_gamma |
A numeric vector with length 1 denoting the amount of variance between subjects in
the transition probability matrix. Note that this value corresponds to the
variance of the parameters of the Multinomial distribution (i.e., the
intercepts of the regression equation of the Multinomial distribution used
to sample the transition probability matrix), see details below. In
addition, only one variance value can be specified for the complete
transition probability matrix, hence the variance is assumed fixed across
all components. The default equals 0.1, which corresponds to little
variation between subjects. If one wants to simulate data from exactly the
same HMM for all subjects, var_gamma should be set to 0. Note that if data
for only 1 subject is simulated (i.e., n = 1), |
var_emiss |
A numeric vector with length |
return_ind_par |
A logical scalar. Should the subject specific
transition probability matrix |
m |
The argument |
n_dep |
The argument |
q_emiss |
The argument |
In simulating the data, having a multilevel structure means that the parameters for each subject are sampled from the population level distribution of the corresponding parameter. The user specifies the population distribution for each parameter: the average population transition probability matrix and its variance, and the average population emission distribution and its variance. For now, the variance of the mean population parameters is assumed fixed for all components of the transition probability matrix and for all components of the emission distribution.
One can simulate multivariate data. That is, the hidden states depend on more than 1 observed variable simultaneously. The distributions of multiple dependent variables for multivariate data are assumed to be independent, and all distributions for one dataset have to be of the same type (either categorical or continuous).
Note: the subject specific) initial state distributions (i.e., the probability of each of the states at the first time point) needed to simulate the data are obtained from the stationary distributions of the subject specific transition probability matrices gamma.
beta
: As the first element in each row of gamma
is used as
reference category in the Multinomial logistic regression, the first matrix
in the list beta
used to predict transition probability matrix
gamma
has a number of rows equal to m
and the number of columns
equal to m
- 1. The first element in the first row corresponds to the
probability of switching from state one to state two. The second element in
the first row corresponds to the probability of switching from state one to
state three, and so on. The last element in the first row corresponds to the
probability of switching from state one to the last state. The same principle
holds for the second matrix in the list beta
used to predict
categorical emission distribution(s) emiss_distr
: the first element in
the first row corresponds to the probability of observing category two in
state one. The second element in the first row corresponds to the probability
of observing category three is state one, and so on. The last element in the
first row corresponds to the probability of observing the last category in
state one.
The following components are returned by the function sim_mHMM
:
states
A matrix containing the simulated hidden state
sequences, with one row per hidden state per subject. The first column
indicates subject id number. The second column contains the simulated
hidden state sequence, consecutively for all subjects. Hence, the id number
is repeated over the rows (with the number of repeats equal to the length
of the simulated hidden state sequence T
for each subject).
obs
A matrix containing the simulated observed outputs, with
one row per simulated observation per subject. The first column indicates
subject id number. The second column contains the simulated observation
sequence, consecutively for all subjects. Hence, the id number is repeated
over rows (with the number of repeats equal to the length of the simulated
observation sequence T
for each subject).
gamma
A list containing n
elements with the simulated
subject specific transition probability matrices gamma
. Only
returned if return_ind_par
is set to TRUE
.
emiss_distr
A list containing n
elements with the
simulated subject specific emission probability matrices
emiss_distr
. Only returned if return_ind_par
is set to
TRUE
.
mHMM
for analyzing multilevel hidden Markov data.
# simulating data for 10 subjects with each 100 categorical observations
n_t <- 100
n <- 10
m <- 3
n_dep <- 1
q_emiss <- 4
gamma <- matrix(c(0.8, 0.1, 0.1,
0.2, 0.7, 0.1,
0.2, 0.2, 0.6), ncol = m, byrow = TRUE)
emiss_distr <- list(matrix(c(0.5, 0.5, 0.0, 0.0,
0.1, 0.1, 0.8, 0.0,
0.0, 0.0, 0.1, 0.9), nrow = m, ncol = q_emiss, byrow = TRUE))
data1 <- sim_mHMM(n_t = n_t, n = n, gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
gamma = gamma, emiss_distr = emiss_distr, var_gamma = 1, var_emiss = 1)
head(data1$obs)
head(data1$states)
# including a covariate to predict (only) the transition probability matrix gamma
beta <- rep(list(NULL), 2)
beta[[1]] <- matrix(c(0.5, 1.0,
-0.5, 0.5,
0.0, 1.0), byrow = TRUE, ncol = 2)
xx_vec <- rep(list(NULL),2)
xx_vec[[1]] <- c(rep(0,5), rep(1,5))
data2 <- sim_mHMM(n_t = n_t, n = n, gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
gamma = gamma, emiss_distr = emiss_distr, beta = beta, xx_vec = xx_vec,
var_gamma = 1, var_emiss = 1)
# simulating subject specific transition probability matrices and emission distributions only
n_t <- 0
n <- 5
m <- 3
n_dep <- 1
q_emiss <- 4
gamma <- matrix(c(0.8, 0.1, 0.1,
0.2, 0.7, 0.1,
0.2, 0.2, 0.6), ncol = m, byrow = TRUE)
emiss_distr <- list(matrix(c(0.5, 0.5, 0.0, 0.0,
0.1, 0.1, 0.8, 0.0,
0.0, 0.0, 0.1, 0.9), nrow = m, ncol = q_emiss, byrow = TRUE))
data3 <- sim_mHMM(n_t = n_t, n = n, gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
gamma = gamma, emiss_distr = emiss_distr, var_gamma = 1, var_emiss = 1)
data3
data4 <- sim_mHMM(n_t = n_t, n = n, gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
gamma = gamma, emiss_distr = emiss_distr, var_gamma = .5, var_emiss = .5)
data4
# simulating multivariate continuous data
n_t <- 100
n <- 10
m <- 3
n_dep <- 2
gamma <- matrix(c(0.8, 0.1, 0.1,
0.2, 0.7, 0.1,
0.2, 0.2, 0.6), ncol = m, byrow = TRUE)
emiss_distr <- list(matrix(c( 50, 10,
100, 10,
150, 10), nrow = m, byrow = TRUE),
matrix(c(5, 2,
10, 5,
20, 3), nrow = m, byrow = TRUE))
data_cont <- sim_mHMM(n_t = n_t, n = n, data_distr = 'continuous',
gen = list(m = m, n_dep = n_dep),
gamma = gamma, emiss_distr = emiss_distr,
var_gamma = .5, var_emiss = c(.5, 0.01))
head(data_cont$states)
head(data_cont$obs)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.