# sim_mHMM: Simulate data using a multilevel hidden Markov model

In mHMMbayes: Multilevel Hidden Markov Models Using Bayesian Estimation

## Description

`sim_mHMM` simulates data for multiple subjects, for which the data have categorical observations that follow a hidden Markov model (HMM) with a multilevel structure. The multilevel structure implies that each subject is allowed to have its own set of parameters, and that the parameters at the subject level (level 1) are tied together by a population distribution at level 2 for each of the corresponding parameters. The population distribution for each of the parameters is a normal (i.e., Gaussian) distribution. In addition to (natural and/or unexplained) heterogeneity between subjects, the subjects' parameters can also depend on a (set of) covariate(s).

## Usage

```r
sim_mHMM(n_t, n, m, q_emiss, gamma, emiss_distr, beta = NULL, xx_vec = NULL,
         var_gamma = 1, var_emiss = 1, return_ind_par = FALSE)
```

## Arguments

`n_t`

The length of the observed sequence to be simulated for each subject. To simulate only the subject specific transition probability matrices `gamma` and emission distributions (and no data), set `n_t` to 0.

`n`

The number of subjects for which data is simulated.

`m`

The number of hidden states in the HMM used for simulating data.

`q_emiss`

The number of categories of the simulated observations.

`gamma`

A matrix with `m` rows and `m` columns containing the average population transition probability matrix used for simulating the data. That is, the probability to switch from hidden state i (row i) to hidden state j (column j).

`emiss_distr`

A matrix with `m` rows and `q_emiss` columns containing the average population emission distribution of the (categorical) observations given the hidden states. That is, the probability of observing category k (column k) in state i (row i).

`beta`

List of two matrices containing the regression parameters to predict `gamma` and/or `emiss_distr` in combination with `xx_vec` using multinomial logistic regression. The first matrix is used to predict the transition probability matrix `gamma`. The second matrix is used to predict the emission distribution `emiss_distr` of the dependent variable. In both matrices, one regression parameter is specified for each element in `gamma` and `emiss_distr`, with the following exception. The first element in each row of `gamma` and/or `emiss_distr` is used as reference category in the multinomial logistic regression. As such, no regression parameters can be specified for these parameters. Hence, the first matrix in the list `beta`, used to predict `gamma`, consists of a matrix with `m` rows and `m` - 1 columns. The second matrix in the list `beta`, used to predict `emiss_distr`, consists of a matrix with `m` rows and `q_emiss` - 1 columns. See details for more information.

Note that if `beta` is specified, `xx_vec` has to be specified as well. If `beta` is omitted completely, `beta` defaults to `NULL`, meaning that neither `gamma` nor `emiss_distr` is predicted using covariates. One of the two elements in the list can also be left empty (i.e., set to `NULL`) to signify that either the transition probability matrix or the emission distribution is not predicted by covariates.

`xx_vec`

List of two vectors containing the covariate(s) to predict `gamma` and/or `emiss_distr` using the regression parameters specified in `beta`. The covariate used to predict `gamma` and `emiss_distr` can either be the same covariate, two different covariates, or a covariate for one element and none for the other. At this point, only one covariate can be used for each of `gamma` and `emiss_distr`. The first vector of the list `xx_vec` is used to predict the transition matrix. The second vector of the list `xx_vec` is used to predict the emission distribution of the dependent variable. For both vectors, the number of observations should be equal to `n`, the number of subjects to be simulated. If `xx_vec` is omitted completely, `xx_vec` defaults to `NULL`, meaning that no covariates are used at all. One of the two elements in the list can also be left empty (i.e., set to `NULL`) to signify that either the transition probability matrix or the emission distribution is not predicted by covariates.

`var_gamma`

A numeric value denoting the variance between subjects in the transition probability matrix. Note that this value corresponds to the variance of the parameters of the multinomial distribution (i.e., the intercepts of the regression equation of the multinomial distribution used to sample the transition probability matrix), see details below. In addition, only one variance value can be specified for the complete transition probability matrix, hence the variance is assumed fixed across all components.

The default equals 1, which corresponds to quite some variation between subjects. A less extreme value would be 0.5. To simulate data from exactly the same HMM for all subjects, set `var_gamma` to 0.

`var_emiss`

A numeric value denoting the variance between subjects in the emission distribution. Note that this value corresponds to the variance of the parameters of the multinomial distribution (i.e., the intercepts of the regression equation of the multinomial distribution used to sample the components of the emission distribution), see details below. In addition, only one variance value can be specified for the complete emission distribution, hence the variance is assumed fixed across all components.

The default equals 1, which corresponds to quite some variation between subjects. A less extreme value would be 0.5. To simulate data from exactly the same HMM for all subjects, set `var_emiss` to 0.

`return_ind_par`

A logical scalar. Should the subject specific transition probability matrix `gamma` and emission probability matrix `emiss_distr` be returned by the function (`return_ind_par = TRUE`) or not (`return_ind_par = FALSE`). The default equals `return_ind_par = FALSE`.

## Details

In simulating the data, having a multilevel structure means that the parameters for each subject are sampled from the population level distribution of the corresponding parameter. The user specifies the population distribution for each parameter: the average population transition probability matrix and its variance, and the average population emission distribution and its variance. For now, the variance is assumed fixed for all components of the transition probability matrix and for all components of the emission distribution. In addition, only one dependent variable can be simulated at this point. That is, the hidden Markov model is a univariate hidden Markov model.
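The sampling of subject specific parameters described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `sample_subject_probs` is hypothetical, and the sketch assumes that independent normal noise with the given variance is added to each non-reference intercept on the multinomial logit scale, and that all population probabilities are strictly positive.

```r
# Hypothetical helper (not part of mHMMbayes): draw one subject specific
# probability matrix around a population probability matrix.
sample_subject_probs <- function(pop_probs, variance) {
  # Multinomial logit intercepts, first column as reference category.
  # Assumes strictly positive probabilities; zeros would need extra handling.
  logits <- log(pop_probs / pop_probs[, 1])
  # Subject level deviations drawn from a normal distribution
  noise <- matrix(rnorm(length(pop_probs), mean = 0, sd = sqrt(variance)),
                  nrow = nrow(pop_probs))
  noise[, 1] <- 0  # the reference category has no free intercept
  # Back-transform to probabilities; each row sums to one
  expo <- exp(logits + noise)
  expo / rowSums(expo)
}

gamma_pop <- matrix(c(0.8, 0.1, 0.1,
                      0.2, 0.7, 0.1,
                      0.2, 0.2, 0.6), ncol = 3, byrow = TRUE)

set.seed(1)
gamma_subj <- sample_subject_probs(gamma_pop, variance = 1)  # varies per draw
gamma_same <- sample_subject_probs(gamma_pop, variance = 0)  # reproduces gamma_pop
rowSums(gamma_subj)
```

With a variance of 0 the noise vanishes and every subject receives the population matrix, which mirrors the documented behaviour of setting `var_gamma` or `var_emiss` to 0.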

Note: the (subject specific) initial state distributions (i.e., the probability of each of the states at the first time point) needed to simulate the data are obtained from the stationary distributions of the subject specific transition probability matrices `gamma`.
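As an illustration of how a stationary distribution can be derived from a transition probability matrix, the sketch below solves the standard linear system delta (I - Gamma + U) = 1, where U is a matrix of ones. The helper name `stationary_distr` is hypothetical, the sketch assumes an irreducible chain, and it is not claimed to be the routine the package uses internally.

```r
# Hypothetical helper (not part of mHMMbayes): stationary distribution of
# an irreducible transition probability matrix.
stationary_distr <- function(gamma) {
  m <- nrow(gamma)
  # diag(m) - gamma + 1 equals I - Gamma + U, with U a matrix of ones.
  # Transposing turns the row-vector equation into a system solve() accepts.
  solve(t(diag(m) - gamma + 1), rep(1, m))
}

gamma <- matrix(c(0.8, 0.1, 0.1,
                  0.2, 0.7, 0.1,
                  0.2, 0.2, 0.6), ncol = 3, byrow = TRUE)

delta <- stationary_distr(gamma)
sum(delta)                    # probabilities sum to one
as.vector(delta %*% gamma)    # unchanged by one transition step

# Draw the hidden state at the first time point from delta
set.seed(1)
first_state <- sample(seq_len(nrow(gamma)), size = 1, prob = delta)
```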

`beta`: As the first element in each row of `gamma` is used as reference category in the multinomial logistic regression, the first matrix in the list `beta`, used to predict the transition probability matrix `gamma`, has `m` rows and `m` - 1 columns. The first element in the first row corresponds to the probability of switching from state one to state two. The second element in the first row corresponds to the probability of switching from state one to state three, and so on. The last element in the first row corresponds to the probability of switching from state one to the last state. The same principle holds for the second matrix in the list `beta`, used to predict the emission distribution `emiss_distr`: the first element in the first row corresponds to the probability of observing category two in state one. The second element in the first row corresponds to the probability of observing category three in state one, and so on. The last element in the first row corresponds to the probability of observing the last category in state one.
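The layout of `beta` can be made concrete with a small base R sketch. The helper `predict_probs` is hypothetical and assumes a plain multinomial logistic regression in which the covariate shifts each non-reference logit linearly; any covariate preprocessing the package may apply is not covered here.

```r
m <- 3
gamma <- matrix(c(0.8, 0.1, 0.1,
                  0.2, 0.7, 0.1,
                  0.2, 0.2, 0.6), ncol = m, byrow = TRUE)

# beta matrix for gamma: m rows and m - 1 columns, because the first
# column of gamma serves as the reference category.
beta_gamma <- matrix(c( 0.5, 1.0,
                       -0.5, 0.5,
                        0.0, 1.0), byrow = TRUE, ncol = m - 1)

# Hypothetical helper (not part of mHMMbayes): probabilities implied for a
# subject with covariate value x.
predict_probs <- function(pop_probs, beta_mat, x) {
  logits <- log(pop_probs / pop_probs[, 1])     # intercepts, reference = 0
  logits[, -1] <- logits[, -1] + beta_mat * x   # shift non-reference logits
  expo <- exp(logits)
  expo / rowSums(expo)
}

probs_x0 <- predict_probs(gamma, beta_gamma, x = 0)  # equals gamma
probs_x1 <- predict_probs(gamma, beta_gamma, x = 1)
round(probs_x1, 3)
```

In this sketch, `beta_gamma[1, 1]` acts on the switch from state one to state two and `beta_gamma[1, 2]` on the switch from state one to state three, matching the ordering described above.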

## Value

The following components are returned by the function `sim_mHMM`:

`states`

A matrix containing the simulated hidden state sequences, with one row per simulated time point per subject. The first column indicates subject id number. The second column contains the simulated hidden state sequence, consecutively for all subjects. Hence, the id number is repeated over the rows (with the number of repeats equal to the length of the simulated hidden state sequence `n_t` for each subject).

`obs`

A matrix containing the simulated observed outputs, with one row per simulated observation per subject. The first column indicates subject id number. The second column contains the simulated observation sequence, consecutively for all subjects. Hence, the id number is repeated over rows (with the number of repeats equal to the length of the simulated observation sequence `n_t` for each subject).

`gamma`

A list containing `n` elements with the simulated subject specific transition probability matrices `gamma`. Only returned if `return_ind_par` is set to `TRUE`.

`emiss_distr`

A list containing `n` elements with the simulated subject specific emission probability matrices `emiss_distr`. Only returned if `return_ind_par` is set to `TRUE`.
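Because `states` and `obs` stack all subjects in one long matrix, a common first step is to split the sequences by subject. The sketch below uses a small mock matrix with made-up values, shaped like the `obs` component (column names `subj` and `observation` follow the example output on this page), so it runs without the package.

```r
# Mock matrix shaped like the `obs` component; the values are invented for
# illustration and were not produced by sim_mHMM.
n   <- 3   # number of subjects
n_t <- 4   # sequence length per subject
obs <- cbind(subj        = rep(seq_len(n), each = n_t),
             observation = c(2, 1, 2, 1,
                             3, 3, 4, 4,
                             1, 1, 2, 3))

# One observation sequence per subject, as a named list
obs_by_subject <- split(obs[, "observation"], obs[, "subj"])
lengths(obs_by_subject)

# Category counts per subject
category_counts <- table(obs[, "subj"], obs[, "observation"])
category_counts
```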

## See Also

`mHMM` for analyzing multilevel hidden Markov data.

## Examples

```r
library(mHMMbayes)

# simulating data for 10 subjects with each 100 observations
n_t <- 100
n <- 10
m <- 3
q_emiss <- 4
gamma <- matrix(c(0.8, 0.1, 0.1,
                  0.2, 0.7, 0.1,
                  0.2, 0.2, 0.6), ncol = m, byrow = TRUE)
emiss_distr <- matrix(c(0.5, 0.5, 0.0, 0.0,
                        0.1, 0.1, 0.8, 0.0,
                        0.0, 0.0, 0.1, 0.9), nrow = m, ncol = q_emiss, byrow = TRUE)
data1 <- sim_mHMM(n_t = n_t, n = n, m = m, q_emiss = q_emiss, gamma = gamma,
                  emiss_distr = emiss_distr, var_gamma = 1, var_emiss = 1)
head(data1$obs)
head(data1$states)

# including a covariate to predict (only) the transition probability matrix gamma
beta <- rep(list(NULL), 2)
beta[[1]] <- matrix(c(0.5, 1.0,
                      -0.5, 0.5,
                      0.0, 1.0), byrow = TRUE, ncol = 2)
xx_vec <- rep(list(NULL), 2)
xx_vec[[1]] <- c(rep(0, 5), rep(1, 5))
data2 <- sim_mHMM(n_t = n_t, n = n, m = m, q_emiss = q_emiss, gamma = gamma,
                  emiss_distr = emiss_distr, beta = beta, xx_vec = xx_vec,
                  var_gamma = 1, var_emiss = 1)

# simulating subject specific transition probability matrices and emission distributions only
n_t <- 0
n <- 5
m <- 3
q_emiss <- 4
gamma <- matrix(c(0.8, 0.1, 0.1,
                  0.2, 0.7, 0.1,
                  0.2, 0.2, 0.6), ncol = m, byrow = TRUE)
emiss_distr <- matrix(c(0.5, 0.5, 0.0, 0.0,
                        0.1, 0.1, 0.8, 0.0,
                        0.0, 0.0, 0.1, 0.9), nrow = m, ncol = q_emiss, byrow = TRUE)
data3 <- sim_mHMM(n_t = n_t, n = n, m = m, q_emiss = q_emiss, gamma = gamma,
                  emiss_distr = emiss_distr, var_gamma = 1, var_emiss = 1)
data3

data4 <- sim_mHMM(n_t = n_t, n = n, m = m, q_emiss = q_emiss, gamma = gamma,
                  emiss_distr = emiss_distr, var_gamma = .5, var_emiss = .5)
data4
```

### Example output

```
     subj observation
[1,]    1           2
[2,]    1           1
[3,]    1           2
[4,]    1           1
[5,]    1           3
[6,]    1           2
     subj state
[1,]    1     2
[2,]    1     1
[3,]    1     1
[4,]    1     2
[5,]    1     2
[6,]    1     1
$subject_gamma
$subject_gamma[[1]]
       [,1]   [,2]   [,3]
[1,] 0.8695 0.0844 0.0461
[2,] 0.2761 0.4900 0.2340
[3,] 0.1636 0.1795 0.6568

$subject_gamma[[2]]
       [,1]   [,2]   [,3]
[1,] 0.9227 0.0087 0.0686
[2,] 0.2511 0.6077 0.1412
[3,] 0.0856 0.1363 0.7781

$subject_gamma[[3]]
       [,1]   [,2]   [,3]
[1,] 0.9449 0.0394 0.0157
[2,] 0.1670 0.7535 0.0795
[3,] 0.2375 0.0116 0.7509

$subject_gamma[[4]]
       [,1]   [,2]   [,3]
[1,] 0.8291 0.0892 0.0817
[2,] 0.0393 0.9285 0.0322
[3,] 0.2147 0.0717 0.7136

$subject_gamma[[5]]
       [,1]   [,2]   [,3]
[1,] 0.4222 0.0223 0.5556
[2,] 0.2490 0.5273 0.2236
[3,] 0.1191 0.0644 0.8164

$subject_emiss
$subject_emiss[[1]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.6388 0.3612 0.0000 0.0000
[2,] 0.2069 0.1695 0.6235 0.0000
[3,] 0.0000 0.0000 0.1156 0.8844

$subject_emiss[[2]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.2163 0.7836 0.0000 0.0000
[2,] 0.0500 0.1303 0.8197 0.0000
[3,] 0.0000 0.0000 0.0956 0.9044

$subject_emiss[[3]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.8261 0.1737 0.0002 0.0000
[2,] 0.2334 0.3599 0.4067 0.0000
[3,] 0.0000 0.0000 0.0014 0.9986

$subject_emiss[[4]]
       [,1]   [,2]   [,3]  [,4]
[1,] 0.4036 0.5964 0.0000 0.000
[2,] 0.0760 0.1877 0.7363 0.000
[3,] 0.0000 0.0000 0.1079 0.892

$subject_emiss[[5]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.2442 0.7557 0.0000 0.0000
[2,] 0.0150 0.0226 0.9623 0.0000
[3,] 0.0000 0.0000 0.1344 0.8655

$subject_gamma
$subject_gamma[[1]]
       [,1]   [,2]   [,3]
[1,] 0.7075 0.1589 0.1335
[2,] 0.1854 0.7214 0.0932
[3,] 0.3690 0.4242 0.2069

$subject_gamma[[2]]
       [,1]   [,2]   [,3]
[1,] 0.7753 0.0476 0.1771
[2,] 0.2221 0.7368 0.0411
[3,] 0.1831 0.2037 0.6132

$subject_gamma[[3]]
       [,1]   [,2]   [,3]
[1,] 0.6749 0.1851 0.1400
[2,] 0.1996 0.4871 0.3133
[3,] 0.2435 0.1061 0.6504

$subject_gamma[[4]]
       [,1]   [,2]   [,3]
[1,] 0.8774 0.0724 0.0501
[2,] 0.2040 0.7293 0.0667
[3,] 0.3009 0.3039 0.3952

$subject_gamma[[5]]
       [,1]   [,2]   [,3]
[1,] 0.8567 0.0480 0.0953
[2,] 0.1791 0.7044 0.1165
[3,] 0.1648 0.0842 0.7511

$subject_emiss
$subject_emiss[[1]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.4852 0.5148 0.0000 0.0000
[2,] 0.1496 0.3659 0.4845 0.0000
[3,] 0.0000 0.0000 0.1062 0.8938

$subject_emiss[[2]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.5607 0.4393 0.0000 0.0000
[2,] 0.1263 0.1107 0.7630 0.0000
[3,] 0.0000 0.0000 0.0644 0.9356

$subject_emiss[[3]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.4194 0.5806 0.0000 0.0000
[2,] 0.0463 0.0243 0.9295 0.0000
[3,] 0.0000 0.0000 0.1205 0.8795

$subject_emiss[[4]]
       [,1]   [,2]   [,3]   [,4]
[1,] 0.6446 0.3554 0.0000 0.0000
[2,] 0.1250 0.0553 0.8197 0.0000
[3,] 0.0000 0.0000 0.1257 0.8742

$subject_emiss[[5]]
       [,1]  [,2]   [,3]   [,4]
[1,] 0.4600 0.540 0.0000 0.0000
[2,] 0.1748 0.203 0.6222 0.0000
[3,] 0.0000 0.000 0.0892 0.9107
```

mHMMbayes documentation built on Oct. 30, 2019, 5:05 p.m.