sim_mHMM: Simulate data using a multilevel hidden Markov model


View source: R/sim_mHMM.R

Description

sim_mHMM simulates data for multiple subjects, where the data consist of categorical observations that follow a hidden Markov model (HMM) with a multilevel structure. The multilevel structure implies that each subject is allowed to have its own set of parameters, and that the parameters at the subject level (level 1) are tied together by a population distribution at level 2 for each of the corresponding parameters. The population distribution for each of the parameters is a normal (i.e., Gaussian) distribution. In addition to (natural and/or unexplained) heterogeneity between subjects, the subjects' parameters can also depend on a (set of) covariate(s).

Usage

sim_mHMM(n_t, n, m, q_emiss, gamma, emiss_distr, beta = NULL,
  xx_vec = NULL, var_gamma = 1, var_emiss = 1,
  return_ind_par = FALSE)

Arguments

n_t

The length of the observed sequence to be simulated for each subject. To simulate only the subject specific transition probability matrices gamma and emission distributions (and no data), set n_t to 0.

n

The number of subjects for which data is simulated.

m

The number of hidden states in the HMM used for simulating data.

q_emiss

The number of categories of the simulated observations.

gamma

A matrix with m rows and m columns containing the average population transition probability matrix used for simulating the data. That is, the probability to switch from hidden state i (row i) to hidden state j (column j).

emiss_distr

A matrix with m rows and q_emiss columns containing the average population emission distribution of the (categorical) observations given the hidden states. That is, the probability of observing category k (column k) in state i (row i).
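Because every row of gamma and emiss_distr is a probability distribution, each row should sum to 1. A minimal sketch of such a check in base R is given below; the helper is_row_stochastic is hypothetical and not part of mHMMbayes.

```r
# Hypothetical helper (not part of mHMMbayes): TRUE if all entries are
# non-negative and every row sums to 1 (within a small tolerance).
is_row_stochastic <- function(mat, tol = 1e-8) {
  all(mat >= 0) && all(abs(rowSums(mat) - 1) < tol)
}

m       <- 3
q_emiss <- 4
gamma   <- matrix(c(0.8, 0.1, 0.1,
                    0.2, 0.7, 0.1,
                    0.2, 0.2, 0.6), ncol = m, byrow = TRUE)
emiss_distr <- matrix(c(0.5, 0.5, 0.0, 0.0,
                        0.1, 0.1, 0.8, 0.0,
                        0.0, 0.0, 0.1, 0.9), nrow = m, ncol = q_emiss, byrow = TRUE)

gamma_ok <- is_row_stochastic(gamma)
emiss_ok <- is_row_stochastic(emiss_distr)
```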

beta

A list of two matrices containing the regression parameters used to predict gamma and/or emiss_distr from xx_vec by multinomial logistic regression. The first matrix predicts the transition probability matrix gamma. The second matrix predicts the emission distribution emiss_distr of the dependent variable. Each matrix holds one regression parameter per element of gamma or emiss_distr, with one exception: the first element in each row of gamma and/or emiss_distr serves as the reference category in the multinomial logistic regression, so no regression parameter can be specified for it. Hence, the first matrix in the list beta has m rows and m - 1 columns, and the second matrix has m rows and q_emiss - 1 columns. See Details for more information. Note that if beta is specified, xx_vec has to be specified as well. If beta is omitted completely, it defaults to NULL, meaning that neither gamma nor emiss_distr is predicted by covariates. One of the two elements in the list can also be left empty (i.e., set to NULL) to signify that either the transition probability matrix or the emission distribution is not predicted by covariates.
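The dimension rules for beta can be illustrated with a short base R sketch. The numeric values are arbitrary illustrations, and the object name beta_gamma_only is hypothetical.

```r
m       <- 3
q_emiss <- 4

# Two-element list: [[1]] holds parameters for gamma, [[2]] for emiss_distr.
beta <- rep(list(NULL), 2)
beta[[1]] <- matrix(c( 0.5, 1.0,
                      -0.5, 0.5,
                       0.0, 1.0), nrow = m, ncol = m - 1, byrow = TRUE)
beta[[2]] <- matrix(0, nrow = m, ncol = q_emiss - 1)  # all zeros: no covariate effect

# To leave the emission distribution unpredicted, keep the second element NULL.
# Note that `beta[[2]] <- NULL` would delete the element and shorten the list,
# whereas assigning list(NULL) with single brackets keeps the list length at 2.
beta_gamma_only <- beta
beta_gamma_only[2] <- list(NULL)
```

The single-bracket assignment matters because the function expects a list of two elements, even when one of them is empty.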

xx_vec

A list of two vectors containing the covariate(s) used to predict gamma and/or emiss_distr with the regression parameters specified in beta. The covariates used to predict gamma and emiss_distr can be the same covariate, two different covariates, or a covariate for one element and none for the other. At this point, only one covariate can be used for each of gamma and emiss_distr. The first vector of the list xx_vec is used to predict the transition matrix. The second vector of the list xx_vec is used to predict the emission distribution of the dependent variable. For both vectors, the number of observations should be equal to n, the number of subjects to be simulated. If xx_vec is omitted completely, it defaults to NULL, meaning no covariates at all. One of the two elements in the list can also be left empty (i.e., set to NULL) to signify that either the transition probability matrix or the emission distribution is not predicted by covariates.
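As a small illustration, the sketch below builds an xx_vec list with a dichotomous covariate for gamma only and checks that every non-empty element has one value per subject. The check itself (lengths_ok) is a hypothetical convenience, not something the package is known to provide.

```r
n <- 10

# Two-element list: [[1]] is the covariate for gamma, [[2]] for emiss_distr.
xx_vec <- rep(list(NULL), 2)
xx_vec[[1]] <- c(rep(0, 5), rep(1, 5))  # one value per subject
# xx_vec[[2]] is left NULL: no covariate for the emission distribution

# Every non-NULL element should have length n.
lengths_ok <- all(vapply(xx_vec,
                         function(x) is.null(x) || length(x) == n,
                         logical(1)))
```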

var_gamma

A numeric value denoting the variance between subjects in the transition probability matrix. Note that this value corresponds to the variance of the parameters of the multinomial distribution (i.e., the intercepts of the regression equation of the multinomial distribution used to sample the transition probability matrix), see Details below. In addition, only one variance value can be specified for the complete transition probability matrix, hence the variance is assumed fixed across all components. The default equals 1, which corresponds to considerable variation between subjects. A less extreme value would be 0.5. To simulate data from exactly the same HMM for all subjects, set var_gamma to 0.

var_emiss

A numeric value denoting the variance between subjects in the emission distribution. Note that this value corresponds to the variance of the parameters of the multinomial distribution (i.e., the intercepts of the regression equation of the multinomial distribution used to sample the components of the emission distribution), see Details below. In addition, only one variance value can be specified for the complete emission distribution, hence the variance is assumed fixed across all components. The default equals 1, which corresponds to considerable variation between subjects. A less extreme value would be 0.5. To simulate data from exactly the same HMM for all subjects, set var_emiss to 0.
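To give a feel for what var_gamma and var_emiss control, the sketch below perturbs one row of a probability matrix on the multinomial logit scale, using the first category as reference. This is an illustration under stated assumptions, not the package's internal code: the helpers row_to_int, int_to_row and sample_row are hypothetical, and rows containing probabilities of exactly 0 would need special handling that is not shown here.

```r
set.seed(1)

# Hypothetical helpers (not part of mHMMbayes).
# Probabilities -> intercepts relative to the first (reference) category.
row_to_int <- function(p) log(p[-1] / p[1])
# Intercepts -> probabilities (reference category has intercept 0).
int_to_row <- function(a) {
  e <- c(1, exp(a))
  e / sum(e)
}
# Add normal noise with the given variance to the intercepts of one row.
sample_row <- function(p, var) {
  int_to_row(row_to_int(p) + rnorm(length(p) - 1, mean = 0, sd = sqrt(var)))
}

gamma_row <- c(0.8, 0.1, 0.1)
row_var1  <- sample_row(gamma_row, var = 1)    # considerable variation
row_var05 <- sample_row(gamma_row, var = 0.5)  # less extreme
row_var0  <- sample_row(gamma_row, var = 0)    # no noise: the population row
```

With a variance of 0 no noise is added, so the sampled row equals the population row, which matches the statement that a variance of 0 gives the same HMM for all subjects.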

return_ind_par

A logical scalar indicating whether the subject specific transition probability matrices gamma and emission probability matrices emiss_distr should be returned by the function (return_ind_par = TRUE) or not (return_ind_par = FALSE). The default is return_ind_par = FALSE.

Details

In simulating the data, having a multilevel structure means that the parameters for each subject are sampled from the population level distribution of the corresponding parameter. The user specifies the population distribution for each parameter: the average population transition probability matrix and its variance, and the average population emission distribution and its variance. For now, the variance is assumed fixed for all components of the transition probability matrix and for all components of the emission distribution. In addition, at this point only one dependent variable can be simulated. That is, the hidden Markov model is a univariate hidden Markov model.

Note: the (subject specific) initial state distributions (i.e., the probability of each of the states at the first time point) needed to simulate the data are obtained from the stationary distributions of the subject specific transition probability matrices gamma.
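For reference, the stationary distribution delta of a transition probability matrix satisfies delta %*% gamma = delta with sum(delta) = 1. A minimal base R sketch of this computation follows; the helper stationary_distr is hypothetical, and the package may compute the same quantity differently.

```r
# Hypothetical helper (not part of mHMMbayes): solves
# delta %*% gamma = delta subject to sum(delta) = 1.
stationary_distr <- function(gamma) {
  m <- nrow(gamma)
  A <- t(diag(m) - gamma)   # (I - gamma)' delta' = 0
  A[m, ] <- 1               # replace the last equation by sum(delta) = 1
  b <- c(rep(0, m - 1), 1)
  solve(A, b)
}

gamma <- matrix(c(0.8, 0.1, 0.1,
                  0.2, 0.7, 0.1,
                  0.2, 0.2, 0.6), ncol = 3, byrow = TRUE)
delta <- stationary_distr(gamma)
```

The resulting vector could then serve as the initial state distribution for a subject with this transition probability matrix.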

beta: As the first element in each row of gamma is used as reference category in the multinomial logistic regression, the first matrix in the list beta used to predict the transition probability matrix gamma has m rows and m - 1 columns. The first element in the first row is the regression parameter for the probability of switching from state one to state two. The second element in the first row is the regression parameter for the probability of switching from state one to state three, and so on. The last element in the first row is the regression parameter for the probability of switching from state one to the last state. The same principle holds for the second matrix in the list beta used to predict the emission distribution emiss_distr: the first element in the first row is the regression parameter for the probability of observing category two in state one. The second element in the first row is the regression parameter for the probability of observing category three in state one, and so on. The last element in the first row is the regression parameter for the probability of observing the last category in state one.
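The role of the reference category can be made concrete with a small sketch of multinomial logistic regression for a single row of gamma. It assumes the covariate effect is added to the intercepts on the logit scale and ignores between-subject noise; the helper int_to_row and all object names are hypothetical.

```r
# Hypothetical helper: intercepts (relative to the reference category,
# which has intercept 0) -> probabilities.
int_to_row <- function(a) {
  e <- c(1, exp(a))
  e / sum(e)
}

gamma_row1 <- c(0.8, 0.1, 0.1)  # population row for state one
beta_row1  <- c(0.5, 1.0)       # effects on switching to state two and three
int_row1   <- log(gamma_row1[-1] / gamma_row1[1])

p_x0 <- int_to_row(int_row1 + beta_row1 * 0)  # covariate value 0
p_x1 <- int_to_row(int_row1 + beta_row1 * 1)  # covariate value 1
```

Under these assumptions, a covariate value of 0 reproduces the population row, while positive regression parameters shift probability mass away from the reference category (staying in state one) toward states two and three.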

Value

The following components are returned by the function sim_mHMM:

states

A matrix containing the simulated hidden state sequences, with one row per hidden state per subject. The first column indicates the subject id number. The second column contains the simulated hidden state sequence, consecutively for all subjects. Hence, the id number is repeated over the rows (with the number of repeats equal to the length n_t of the simulated hidden state sequence for each subject).

obs

A matrix containing the simulated observed outputs, with one row per simulated observation per subject. The first column indicates the subject id number. The second column contains the simulated observation sequence, consecutively for all subjects. Hence, the id number is repeated over the rows (with the number of repeats equal to the length n_t of the simulated observation sequence for each subject).
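Since states and obs are stacked in long format, a common first step is to split them by subject. The sketch below uses a small hand-made stand-in for the obs matrix (two subjects, three observations each) with the column names shown in the Example output; it does not call sim_mHMM.

```r
# Toy stand-in for the obs component: subject id in the first column,
# observations in the second, stacked consecutively per subject.
obs <- cbind(subj        = rep(1:2, each = 3),
             observation = c(2, 1, 2, 3, 3, 1))

# One observation sequence per subject, as a named list.
obs_by_subj <- split(obs[, "observation"], obs[, "subj"])

# Number of rows per subject (equals n_t for simulated data).
n_per_subj <- table(obs[, "subj"])
```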

gamma

A list containing n elements with the simulated subject specific transition probability matrices gamma. Only returned if return_ind_par is set to TRUE.

emiss_distr

A list containing n elements with the simulated subject specific emission probability matrices emiss_distr. Only returned if return_ind_par is set to TRUE.

See Also

mHMM for analyzing multilevel hidden Markov data.

Examples

# simulating data for 10 subjects with each 100 observations
n_t     <- 100
n       <- 10
m       <- 3
q_emiss <- 4
gamma   <- matrix(c(0.8, 0.1, 0.1,
                  0.2, 0.7, 0.1,
                  0.2, 0.2, 0.6), ncol = m, byrow = TRUE)
emiss_distr <- matrix(c(0.5, 0.5, 0.0, 0.0,
                        0.1, 0.1, 0.8, 0.0,
                        0.0, 0.0, 0.1, 0.9), nrow = m, ncol = q_emiss, byrow = TRUE)
data1 <- sim_mHMM(n_t = n_t, n = n, m = m, q_emiss = q_emiss, gamma = gamma,
                  emiss_distr = emiss_distr, var_gamma = 1, var_emiss = 1)
head(data1$obs)
head(data1$states)

# including a covariate to predict (only) the transition probability matrix gamma
beta      <- rep(list(NULL), 2)
beta[[1]] <- matrix(c(0.5, 1.0,
                     -0.5, 0.5,
                      0.0, 1.0), byrow = TRUE, ncol = 2)
xx_vec      <- rep(list(NULL),2)
xx_vec[[1]] <-  c(rep(0,5), rep(1,5))
data2 <- sim_mHMM(n_t = n_t, n = n, m = m, q_emiss = q_emiss, gamma = gamma,
                  emiss_distr = emiss_distr, beta = beta, xx_vec = xx_vec,
                  var_gamma = 1, var_emiss = 1)


# simulating subject specific transition probability matrices and emission distributions only
n_t <- 0
n <- 5
m <- 3
q_emiss <- 4
gamma <- matrix(c(0.8, 0.1, 0.1,
                  0.2, 0.7, 0.1,
                  0.2, 0.2, 0.6), ncol = m, byrow = TRUE)
emiss_distr <- matrix(c(0.5, 0.5, 0.0, 0.0,
                        0.1, 0.1, 0.8, 0.0,
                        0.0, 0.0, 0.1, 0.9), nrow = m, ncol = q_emiss, byrow = TRUE)
data3 <- sim_mHMM(n_t = n_t, n = n, m = m, q_emiss = q_emiss, gamma = gamma,
                  emiss_distr = emiss_distr, var_gamma = 1, var_emiss = 1)
data3

data4 <- sim_mHMM(n_t = n_t, n = n, m = m, q_emiss = q_emiss, gamma = gamma,
                  emiss_distr = emiss_distr, var_gamma = .5, var_emiss = .5)
data4

Example output

     subj observation
[1,]    1           2
[2,]    1           1
[3,]    1           2
[4,]    1           1
[5,]    1           3
[6,]    1           2
     subj state
[1,]    1     2
[2,]    1     1
[3,]    1     1
[4,]    1     2
[5,]    1     2
[6,]    1     1
$subject_gamma
$subject_gamma[[1]]
       [,1]   [,2]   [,3]
[1,] 0.8695 0.0844 0.0461
[2,] 0.2761 0.4900 0.2340
[3,] 0.1636 0.1795 0.6568

$subject_gamma[[2]]
       [,1]   [,2]   [,3]
[1,] 0.9227 0.0087 0.0686
[2,] 0.2511 0.6077 0.1412
[3,] 0.0856 0.1363 0.7781

$subject_gamma[[3]]
       [,1]   [,2]   [,3]
[1,] 0.9449 0.0394 0.0157
[2,] 0.1670 0.7535 0.0795
[3,] 0.2375 0.0116 0.7509

$subject_gamma[[4]]
       [,1]   [,2]   [,3]
[1,] 0.8291 0.0892 0.0817
[2,] 0.0393 0.9285 0.0322
[3,] 0.2147 0.0717 0.7136

$subject_gamma[[5]]
       [,1]   [,2]   [,3]
[1,] 0.4222 0.0223 0.5556
[2,] 0.2490 0.5273 0.2236
[3,] 0.1191 0.0644 0.8164


$subject_emiss
$subject_emiss[[1]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.6388 0.3612 0.0000 0.0000
[2,] 0.2069 0.1695 0.6235 0.0000
[3,] 0.0000 0.0000 0.1156 0.8844

$subject_emiss[[2]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.2163 0.7836 0.0000 0.0000
[2,] 0.0500 0.1303 0.8197 0.0000
[3,] 0.0000 0.0000 0.0956 0.9044

$subject_emiss[[3]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.8261 0.1737 0.0002 0.0000
[2,] 0.2334 0.3599 0.4067 0.0000
[3,] 0.0000 0.0000 0.0014 0.9986

$subject_emiss[[4]]
       [,1]   [,2]   [,3]  [,4]
[1,] 0.4036 0.5964 0.0000 0.000
[2,] 0.0760 0.1877 0.7363 0.000
[3,] 0.0000 0.0000 0.1079 0.892

$subject_emiss[[5]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.2442 0.7557 0.0000 0.0000
[2,] 0.0150 0.0226 0.9623 0.0000
[3,] 0.0000 0.0000 0.1344 0.8655


$subject_gamma
$subject_gamma[[1]]
       [,1]   [,2]   [,3]
[1,] 0.7075 0.1589 0.1335
[2,] 0.1854 0.7214 0.0932
[3,] 0.3690 0.4242 0.2069

$subject_gamma[[2]]
       [,1]   [,2]   [,3]
[1,] 0.7753 0.0476 0.1771
[2,] 0.2221 0.7368 0.0411
[3,] 0.1831 0.2037 0.6132

$subject_gamma[[3]]
       [,1]   [,2]   [,3]
[1,] 0.6749 0.1851 0.1400
[2,] 0.1996 0.4871 0.3133
[3,] 0.2435 0.1061 0.6504

$subject_gamma[[4]]
       [,1]   [,2]   [,3]
[1,] 0.8774 0.0724 0.0501
[2,] 0.2040 0.7293 0.0667
[3,] 0.3009 0.3039 0.3952

$subject_gamma[[5]]
       [,1]   [,2]   [,3]
[1,] 0.8567 0.0480 0.0953
[2,] 0.1791 0.7044 0.1165
[3,] 0.1648 0.0842 0.7511


$subject_emiss
$subject_emiss[[1]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.4852 0.5148 0.0000 0.0000
[2,] 0.1496 0.3659 0.4845 0.0000
[3,] 0.0000 0.0000 0.1062 0.8938

$subject_emiss[[2]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.5607 0.4393 0.0000 0.0000
[2,] 0.1263 0.1107 0.7630 0.0000
[3,] 0.0000 0.0000 0.0644 0.9356

$subject_emiss[[3]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.4194 0.5806 0.0000 0.0000
[2,] 0.0463 0.0243 0.9295 0.0000
[3,] 0.0000 0.0000 0.1205 0.8795

$subject_emiss[[4]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.6446 0.3554 0.0000 0.0000
[2,] 0.1250 0.0553 0.8197 0.0000
[3,] 0.0000 0.0000 0.1257 0.8742

$subject_emiss[[5]]
       [,1]  [,2]   [,3]   [,4]
[1,] 0.4600 0.540 0.0000 0.0000
[2,] 0.1748 0.203 0.6222 0.0000
[3,] 0.0000 0.000 0.0892 0.9107

mHMMbayes documentation built on Oct. 30, 2019, 5:05 p.m.