# simulSM: Simulation of a semi-Markov chain In SMM: Simulation and Estimation of Multi-State Discrete-Time Semi-Markov and Markov Models

## Description

Simulation of a semi-Markov chain starting from chosen parameters. This simulation can be parametric or non-parametric, non-censored, censored at the beginning and/or at the end of the sequence, with one or several trajectories. Several parametric distributions are considered (Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial).

## Usage

 1 2 3 4 5 6 7 # parametric case simulSM(E, NbSeq, lengthSeq, TypeSojournTime = "fij", init, Ptrans, distr, param, laws = NULL, cens.beg = 0, cens.end = 0, File.out = NULL) # non-parametric case simulSM(E, NbSeq, lengthSeq, TypeSojournTime = "fij", init, Ptrans, distr = "NP", param = NULL, laws, cens.beg = 0, cens.end = 0, File.out = NULL) 

## Arguments

 E Vector of state space NbSeq Number of simulated sequences lengthSeq Vector containing the lengths of each simulated sequence (the length of sequences can be different, see details) TypeSojournTime Character : must be one of "fij", "fi", "fj", "f" (for more explanations, see Details) init Vector of initial distribution of length S, with S = length(E) Ptrans Matrix of transition probabilities of the embedded Markov chain J=(J_m)_{m} (The sum of the probabilities on the same row must be equal to 1. According to semi-Markov model, the probability to stay in the same state is assumed to be equal to 0.) distr - "NP" for nonparametric case, laws have to be used, param is useless - Matrix of distributions of size SxS if TypeSojournTime is equal to "fij"; - Vector of distributions of size S if TypeSojournTime is equal to "fi" or "fj"; - A distribution if TypeSojournTime is equal to "f". The distributions to be used in distr must be one of "uniform", "geom", "pois", "dweibull", "nbinom". param - Useless if distr = "NP" - Array of distribution parameters of size SxSx2 (2 corresponds to the maximal number of distribution parameters) if TypeSojournTime is equal to "fij"; - Matrix of distribution parameters of size Sx2 if TypeSojournTime is equal to "fi" or "fj"; - Vector of distribution parameters of length 2 if TypeSojournTime is equal to "f". laws - Useless if distr \neq "NP" - Array of size SxSxKmax if TypeSojournTime is equal to "fij"; - Matrix of size SxKmax if TypeSojournTime is equal to "fi" or "fj"; - Vector of length Kmax if the TypeSojournTime is equal to "f". Kmax is the maximum length of the sojourn times. cens.beg Type of censoring at the beginning of sample paths; 1 (if the first sojourn time is censored) or 0 (if not). All the sequences must be censored in the same way. cens.end Type of censoring at the end of sample paths; 1 (if the last sojourn time is censored) or 0 (if not). All the sequences must be censored in the same way. File.out Name of fasta file to save the sequences; if File.out = NULL, no file is created.

## Details

This function simulates a semi-Markov model in the parametric and non-parametric case, taking into account the type of sojourn time and the censoring described in references.

The non-parametric simulation concerns sojourn time distributions defined by the user. For the parametric simulation, several discrete distributions are considered (see below).

The difference between the Markov model and the semi-Markov model concerns the modelling of the sojourn time. With a Markov chain, the sojourn time distribution is modeled by a Geometric distribution. With a semi-Markov chain, the sojourn time can be arbitrarily distributed. In this package, the available distribution for a semi-Markov model are:

• Uniform: f(x) = 1/n for a ≤ x ≤ b, with n = b-a+1

• Geometric: f(x) = θ (1-θ)^x for x = 0, 1, 2,…,n, 0 < θ ≤ 1, with n > 0 and θ is the probability of success

• Poisson: f(x) = (λ^x exp(-λ))/x! for x = 0, 1, 2,…,n, with n > 0 and λ > 0

• Discrete Weibull of type 1: f(x)=q^{(x-1)^{β}}-q^{x^{β}}, x=1,2,3,…,n, with n > 0, q is the first parameter and β is the second parameter

• Negative Binomial: f(x)=Γ(x+α)/(Γ(α) x!) (α/(α+μ))^{α} (μ/(α+μ))^x, for x = 0, 1, 2,…,n, n > 0,\ Γ is the gamma function, α is the parameter of overdispersion and μ is the mean

• Non-parametric

We define :

• the semi-Markov kernel q_{ij}(k) = P( J_{m+1} = j, T_{m+1} - T_{m} = k | J_{m} = i );

• the transition matrix (Ptrans(i,j))_{i,j} \in E of the embedded Markov chain J = (J_m)_m, Ptrans(i,j) = P( J_{m+1} = j | J_m = i );

• the initial distribution, init = P(J_1 = i) = P(Y_1 = i);

• the conditional sojourn time distributions (f_{ij}(k))_{i,j} \in E,\ k \in N ,\ f_{ij}(k) = P(T_{m+1} - T_m = k | J_m = i, J_{m+1} = j ), f is specified by the argument "param" in the parametric case and by "laws" in the non-parametric case.

In this package we can choose differents types of sojourn times. Four options are available for the sojourn times

• depending on the present state and on the next state ("fij");

• depending only on the present state ("fi");

• depending only on the next state ("fj");

• depending neither on the current, nor on the next state ("f").

If TypeSojournTime is equal to "fij", distr is a matrix (SxS) (e.g., if the row 1 of the 2nd column is "pois", that is to say we go from the first state to the second state following a Poisson distribution). If TypeSojournTime is equal to "fi" or "fj", distr must be a vector (e.g., if the first element of vector is "geom", that is to say we go from the first state to any state according to a Geometric distribution). If TypeSojournTime is equal to "f", distr must be one of "uniform", "geom", "pois", "dweibull", "nbinom" (e.g., if distr is equal to "nbinom", that is to say that the sojourn times when going from any state to any state follows a Negative Binomial distribution). For the non-parametric case, distr is equal to "NP" whatever type of sojourn time given.

If the sequence is censored at the beginning and at the end, cens.beg must be equal to 1 and cens.end must be equal to 1 too. All the sequences must be censored in the same way.

Moreover, in the non-parametric case TypeSojournTime is equal to "fij" then laws must be an array of size SxSxKmax. If distr is equal to "NP" and TypeSojournTime is equal to "fi" or "fj" then laws must be a matrix of size SxKmax. If the distr is equal to "NP" and TypeSojournTime is equal to "f" then laws is a vector of length Kmax.

For the simulation of a non-censored sequence, the length of the sequence can be greater than lengthSeq.

## Value

simulSM returns sequences of size lengthSeq or greater

If File.out is not NULL, a fasta file will be created containing the sequences.

## Author(s)

Caroline Berard, caroline.berard@univ-rouen.fr
Dominique Cellier, dominique.cellier@laposte.net
Mathilde Sautreuil, mathilde.sautreuil@etu.univ-rouen.fr
Nicolas Vergne, nicolas.vergne@univ-rouen.fr

## References

VS Barbu, C Berard, D Cellier, M Sautreuil and N Vergne (2017), Parametric estimation of semi-Markov chains, submitted

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 alphabet = c("a","c","g","t") S = length(alphabet) ## creation of the initial distribution vect.init = c(1/4,1/4,1/4,1/4) # creation of the transition matrix Pij = matrix(c(0,0.2,0.3,0.5,0.4,0,0.2,0.4,0.1,0.2,0,0.7,0.8,0.1,0.1,0), nrow = S, ncol = S, byrow = TRUE) ################################ ## Parametric simulation of a trajectory (of length equal to 50), ## where the sojourn times depend neither on the present state nor on the next state ("f"). ################################ ## Simulation of a sequence of length 50 seq50 = simulSM(E = alphabet, NbSeq = 1, lengthSeq = 50, TypeSojournTime = "f", init = vect.init, Ptrans = Pij, distr = "pois", param = 2) ################################ ## Parametric simulation of several trajectories (3 trajectories of length 1000, ## 10 000 and 2000 respectively), ## where the sojourn times depend on the present state and on the next state ("fij") ## the sojourn time is modelled by different distributions ################################ lengthSeq3 = c(1000, 10000, 2000) ## creation of the distribution matrix distr.matrix = matrix(c("dweibull", "pois", "geom", "nbinom", "geom", "nbinom", "pois", "dweibull", "pois", "pois", "dweibull", "geom", "pois","geom", "geom", "nbinom"), nrow = S, ncol = S, byrow = TRUE) ## creation of an array containing the parameters param1.matrix = matrix(c(0.6,2,0.4,4,0.7,2,5,0.6,2,3,0.6,0.6,4,0.3,0.4,4), nrow = S, ncol = S, byrow = TRUE) param2.matrix = matrix(c(0.8,0,0,2,0,5,0,0.8,0,0,0.8,0,4,0,0,4), nrow = S, ncol = S, byrow = TRUE) param.array = array(c(param1.matrix, param2.matrix), c(S,S,2)) ### Simulation of 3 sequences seq3 = simulSM(E = alphabet, NbSeq = 3, lengthSeq = lengthSeq3, TypeSojournTime = "fij", init = vect.init, Ptrans = Pij, distr = distr.matrix, param = param.array, File.out = NULL) ################################ ## Parametric simulation of several trajectories (3 trajectories of length 1000, ## 10 000 and 2000 respectively), ## where the sojourn times depend only on the present state ("fi.") ## and the sojourn times are modelled by different distributions. ################################ ## creation of the distribution matrix distr.vect = c("dweibull", "pois", "geom", "nbinom") ## creation of an array containing the parameters param.matrix = matrix(c(0.6,0.8,4,0,0.7,0,5,2), nrow = S, ncol = 2, byrow = TRUE) ### Simulation of 3 sequences without censoring #seqFi3 = simulSM(E = alphabet, NbSeq = 3, lengthSeq = lengthSeq3, TypeSojournTime = "fi", # init = vect.init, Ptrans = Pij, distr = distr.vect, param = param.matrix, # File.out = NULL) ### Simulation of 3 sequences with censoring at the beginning #seqFi3 = simulSM(E = alphabet, NbSeq = 3, lengthSeq = lengthSeq3, TypeSojournTime = "fi", # init = vect.init, Ptrans = Pij, distr = distr.vect, param = param.matrix, # File.out = NULL, cens.beg = 1, cens.end = 0) ### Simulation of 3 sequences with censoring at the end #seqFi3 = simulSM(E = alphabet, NbSeq = 3, lengthSeq = lengthSeq3, TypeSojournTime = "fi", # init = vect.init, Ptrans = Pij, distr = distr.vect, param = param.matrix, # File.out = NULL, cens.beg = 0, cens.end = 1) ### Simulation of 3 sequences with censoring at the beginning and at the end #seqFi3 = simulSM(E = alphabet, NbSeq = 3, lengthSeq = lengthSeq3, TypeSojournTime = "fi", # init = vect.init, Ptrans = Pij, distr = distr.vect, param = param.matrix, # File.out = NULL, cens.beg = 1, cens.end = 1) ################################ ## Non-parametric simulation of several trajectories (3 trajectories of length 1000, ## 10 000 and 2000 respectively), ## where the sojourn times depend on the present state and on the next state ("fij") ################################ ## creation of a matrix corresponding to the conditionnal sojourn time distribution Kmax = 4 mat1 = matrix(c(0,0.5,0.4,0.6,0.3,0,0.5,0.4,0.7,0.2,0,0.3,0.4,0.6,0.2,0), nrow = S, ncol = S, byrow = TRUE) mat2 = matrix(c(0,0.2,0.3,0.1,0.2,0,0.2,0.3,0.1,0.4,0,0.3,0.2,0.1,0.3,0), nrow = S, ncol = S, byrow = TRUE) mat3 = matrix(c(0,0.1,0.3,0.1,0.3,0,0.1,0.2,0.1,0.2,0,0.3,0.3,0.3,0.4,0), nrow = S, ncol = S, byrow = TRUE) mat4 = matrix(c(0,0.2,0,0.2,0.2,0,0.2,0.1,0.1,0.2,0,0.1,0.1,0,0.1,0), nrow = S, ncol = S, byrow = TRUE) f <- array(c(mat1,mat2,mat3,mat4), c(S,S,Kmax)) ### Simulation of 3 sequences seqNP3 = simulSM(E = alphabet, NbSeq = 3, lengthSeq = lengthSeq3, TypeSojournTime = "fij", init = vect.init, Ptrans = Pij, laws = f, File.out = NULL) ################################ ## Non-parametric simulation of several trajectories (3 trajectories of length 1000, ## 10 000 and 2000 respectively), ## where the sojourn times depend only on he next state ("fj") ################################ ## creation of a matrix corresponding to the conditional sojourn time distributions Kmax = 6 nparam.matrix = matrix(c(0.2,0.1,0.3,0.2,0.2,0,0.4,0.2,0.1,0,0.2,0.1, 0.5,0.3,0.15,0.05,0,0,0.3,0.2,0.1,0.2,0.2,0), nrow = S, ncol = Kmax, byrow = TRUE) ### Simulation of 3 sequences without censoring #seqNP3 = simulSM(E = alphabet, NbSeq = 3, lengthSeq = lengthSeq3, TypeSojournTime = "fj", # init = vect.init, Ptrans = Pij, laws = nparam.matrix, File.out = NULL) ### Simulation of 3 sequences with censoring at the beginnig #seqNP3 = simulSM(E = alphabet, NbSeq = 3, lengthSeq = lengthSeq3, TypeSojournTime = "fj", # init = vect.init, Ptrans = Pij, laws = nparam.matrix, File.out = NULL, # cens.beg = 1, cens.end = 0) ### Simulation of 3 sequences with censoring at the end #seqNP3 = simulSM(E = alphabet, NbSeq = 3, lengthSeq = lengthSeq3, TypeSojournTime = "fj", # init = vect.init, Ptrans = Pij, laws = nparam.matrix, File.out = NULL, # cens.beg = 0, cens.end = 1) ### Simulation of 3 sequences with censoring at the beginning and at the end #seqNP3 = simulSM(E = alphabet, NbSeq = 3, lengthSeq = lengthSeq3, TypeSojournTime = "fj", # init = vect.init, Ptrans = Pij, laws = nparam.matrix, File.out = NULL, # cens.beg = 1, cens.end = 1)