# hsmm: Hidden Semi-Markov Models

## Description

Fits a hidden semi-Markov model with conditional distribution `od` and runlength distribution `rd` to the observations `x`.

## Usage

```r
hsmm(x, od, od.par,
     rd        = "nonp",
     rd.par    = list(np = matrix(0.1, nrow = 10, ncol = 2)),
     pi.par    = c(0.5, 0.5),
     tpm.par   = matrix(c(0, 1, 1, 0), 2),
     M         = NA,
     Q.max     = 500,
     epsilon   = 1e-08,
     censoring = 1,
     prt       = TRUE,
     detailed  = FALSE,
     r.lim     = c(0.01, 100),
     p.log.lim = c(0.001, 0.999),
     nu.lim    = c(0.01, 100))
```

## Arguments

`x`

The observations as a vector of length T.

`od`

Character with the name of the conditional distribution of the observations. The following distributions are currently implemented:

| Name | Distribution |
|---|---|
| `"bern"` | Bernoulli |
| `"norm"` | Normal |
| `"pois"` | Poisson |
| `"t"` | Student's t |

`rd`

Character with the name of the runlength distribution (or sojourn time, dwell time distribution). The following distributions are currently implemented:

| Name | Distribution |
|---|---|
| `"nonp"` | Non-parametric |
| `"geom"` | Geometric |
| `"nbinom"` | Negative Binomial |
| `"log"` | Logarithmic |
| `"pois"` | Poisson |

`pi.par`

Vector of length J with the initial values for the initial probabilities of the semi-Markov chain.

`tpm.par`

Matrix of dimension J x J with the initial values for the transition probability matrix of the embedded Markov chain. The diagonal entries must all be zero; absorbing states are not permitted.

`rd.par`

List with the initial values for the parameters of the runlength distributions. See further details below (section 'List Objects rd.par and od.par').

`od.par`

List with the initial values for the parameters of the conditional observation distributions. See further details below (section 'List Objects rd.par and od.par').

`M`

Positive integer containing the maximum runlength.

`Q.max`

Positive integer containing the maximum number of iterations.

`epsilon`

Positive scalar giving the tolerance at which the relative change of log-likelihood is considered close enough to zero to terminate the algorithm.

`censoring`

Integer. If equal to 1, the last visited state contributes to the likelihood. If equal to 0, the partial likelihood estimator, which ignores the contribution of the last visited state, is used. For details see Guedon (2003).

`prt`

Logical. If TRUE, the log-likelihood and number of iterations carried out are printed for each iteration.

`detailed`

Logical. If TRUE, a list of the parameters at every iteration step is written into the `ctrl` list.

`r.lim`

Lower and upper bound for the r parameter of the negative binomial distribution. Bisection is applied in the M-step to determine this parameter.

`p.log.lim`

Lower and upper bound for the parameter of the logarithmic distribution. Bisection is applied in the M-step to determine this parameter.

`nu.lim`

Lower and upper bound for the degrees of freedom parameter of the t distribution. Bisection is applied in the M-step to determine this parameter.
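The three `*.lim` arguments bound a bisection search. As a generic illustration of how bisection narrows such an interval, here is a minimal sketch in base R. The function `bisect` and the toy score function are assumptions made for illustration; this is not the package's internal C routine, whose objective function is not documented here.

```r
# Generic bisection on [lower, upper]; assumes f changes sign on the interval.
bisect <- function(f, lower, upper, tol = 1e-10, max_iter = 200) {
  f_lower <- f(lower)
  f_upper <- f(upper)
  if (f_lower == 0) return(lower)
  if (f_upper == 0) return(upper)
  if (f_lower * f_upper > 0) {
    stop("f must change sign on [lower, upper]")
  }
  for (i in seq_len(max_iter)) {
    mid   <- (lower + upper) / 2
    f_mid <- f(mid)
    # Stop once the root is hit exactly or the half-width is below tol.
    if (f_mid == 0 || (upper - lower) / 2 < tol) {
      return(mid)
    }
    if (f_lower * f_mid < 0) {
      upper <- mid            # root lies in the lower half
    } else {
      lower   <- mid          # root lies in the upper half
      f_lower <- f_mid
    }
  }
  (lower + upper) / 2
}

# Toy score function with a root at 2.5, searched within the default r.lim bounds.
r.lim <- c(0.01, 100)
root  <- bisect(function(r) log(r) - log(2.5), r.lim[1], r.lim[2])
root
```

If the true parameter lies outside the supplied bounds, a search of this kind cannot reach it, which is one reason to widen the limits when an estimate sits at a boundary.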

## Details

The function `hsmm` fits a hidden semi-Markov model using the EM algorithm for parameter estimation. The estimation algorithms are based on the right-censored approach initially described in Guedon (2003). This model does not assume that the last observation coincides with an exit from the last visited state. The EM algorithm is an iterative procedure and requires initial values. The results may depend on the initial values selected, because convergence to local maxima is a common phenomenon. Details on the algorithm utilized for the package `hsmm` are also presented by Bulla (2006).
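Because the result can depend on the initial values, one common precaution is to run the fit from several starting points and keep the one with the highest log-likelihood. The sketch below shows that pattern in base R. The helper `best_fit` and the stand-in `toy_fit` are hypothetical names introduced for illustration; in practice `fit_fun` would wrap a call to `hsmm` and return its result, which carries a `logl` component.

```r
# Hypothetical helper: run a fitting function from several sets of initial
# values and keep the fit with the highest log-likelihood.
best_fit <- function(fit_fun, starts) {
  fits  <- lapply(starts, fit_fun)
  logls <- vapply(fits, function(f) f$logl, numeric(1))
  fits[[which.max(logls)]]
}

# Toy stand-in for something like function(s) hsmm(x, od = "norm", od.par = s):
# it only returns a list with a logl field so the sketch runs without data.
toy_fit <- function(start) list(start = start, logl = -(start - 2)^2)

best <- best_fit(toy_fit, starts = list(0, 1.5, 4))
best$start  # 1.5, the start with the largest toy log-likelihood
```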

Default model
The default model is a two-state hidden semi-Markov model with a non-parametric runlength distribution. With two states and a zero diagonal, the embedded chain must switch state at every transition, so the transition probability matrix does not require any initial values (for models with J > 2 states, the transition probability matrix may be initialized with the value 1/(J - 1) for the off-diagonal elements). The non-parametric runlength distribution is the default and is initialized by a uniform distribution on the first ten runlengths. Similarly, the initial probabilities `pi.par` follow a uniform distribution. There is no default for the conditional distribution of the observations, because it should not be selected without investigating the data. Note that the non-parametric runlength distribution often requires a very large number of observations. Sansom and Thomson (2001), for example, obtained satisfactory results with series of length 20000 and longer.
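The initialization described above can be written out for a general number of states. This is a minimal sketch in base R; the helper name `init_defaults` is an assumption introduced here and is not part of the package. For J = 2 and M = 10 it reproduces the default argument values shown in the usage section.

```r
# Hypothetical helper: default-style initial values for a J-state model with a
# non-parametric runlength distribution on the first M runlengths.
init_defaults <- function(J, M = 10) {
  stopifnot(J >= 2, M >= 1)
  tpm <- matrix(1 / (J - 1), nrow = J, ncol = J)  # uniform off-diagonal entries
  diag(tpm) <- 0                                  # diagonal entries must be zero
  list(
    pi.par  = rep(1 / J, J),                               # uniform initial probabilities
    tpm.par = tpm,
    rd.par  = list(np = matrix(1 / M, nrow = M, ncol = J)) # uniform on runlengths 1..M
  )
}

init <- init_defaults(J = 3)
init$tpm.par
rowSums(init$tpm.par)    # each row sums to 1
colSums(init$rd.par$np)  # each column sums to 1
```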

## Value

| Component | Description |
|---|---|
| `call` | The matched call. |
| `iter` | Positive integer containing the number of iterations carried out. |
| `logl` | Double containing the log-likelihood of the fitted model. |
| `para` | List object containing the parameter estimates. |
| `ctrl` | List object containing additional control variables: `solution.reached`, `error`, and `details`. `solution.reached` is TRUE if the stopping criterion is fulfilled. `error` returns an error code: 0 = no error, 1 = internal probability less than or equal to zero, 2 = memory exception, 3 = file error (internal output from C routine, disabled by default). `details` contains the parameter values of every iteration. |
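After fitting, it is usually worth checking `ctrl` before interpreting the estimates. The sketch below summarises the documented components in one line. The helper `describe_fit` and the hand-built `mock_fit` object are hypothetical and exist only so the example runs without data; a real object would come from a call to `hsmm`.

```r
# Hypothetical helper: summarise the control information of a fitted model.
describe_fit <- function(fit) {
  # Error codes as documented in the table above.
  codes <- c("0" = "no error",
             "1" = "internal probability less than or equal to zero",
             "2" = "memory exception",
             "3" = "file error")
  status <- if (isTRUE(fit$ctrl$solution.reached)) "converged" else "not converged"
  sprintf("%s after %d iterations (logl = %.3f); error: %s",
          status, as.integer(fit$iter), fit$logl,
          codes[[as.character(fit$ctrl$error)]])
}

# Stand-in object with the documented fields; the values are made up.
mock_fit <- list(iter = 42, logl = -1234.568,
                 ctrl = list(solution.reached = TRUE, error = 0))
describe_fit(mock_fit)
```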

## List Objects rd.par and od.par

The list objects `rd.par` and `od.par` contain parameter values for the runlength and conditional observation distributions, respectively. For a model with J states, every parameter vector has length J. For the non-parametric runlength distribution, the corresponding entry is a matrix of dimension M x J. The names of the list entries have to be as follows.
`od.par`:

| `od` | Distribution | Entry names |
|---|---|---|
| `"bern"` | Bernoulli | `"b"` |
| `"norm"` | Normal | `"mean"`, `"var"` |
| `"pois"` | Poisson | `"lambda"` |
| `"t"` | Student's t | `"mean"`, `"var"`, `"df"` |

`rd.par`:

| `rd` | Distribution | Entry names |
|---|---|---|
| `"nonp"` | Non-parametric | `"np"` |
| `"geom"` | Geometric | `"p"` |
| `"nbinom"` | Negative Binomial | `"r"`, `"pi"` |
| `"log"` | Logarithmic | `"p"` |
| `"pois"` | Poisson | `"lambda"` |
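To make the naming rules concrete, the following sketch builds the two lists for a three-state model and checks them against the entry names listed above. The helper `check_par` and all numeric values are assumptions chosen for illustration; the package itself may validate its inputs differently.

```r
# Entry names required for each distribution, as listed above.
od_names <- list(bern = "b", norm = c("mean", "var"),
                 pois = "lambda", t = c("mean", "var", "df"))
rd_names <- list(nonp = "np", geom = "p", nbinom = c("r", "pi"),
                 log = "p", pois = "lambda")

# Hypothetical helper: TRUE if the list has exactly the required entry names and
# every entry has length J (or J columns, for the non-parametric matrix).
check_par <- function(par, dist, required, J) {
  expected <- required[[dist]]
  if (!setequal(names(par), expected)) return(FALSE)
  all(vapply(par,
             function(v) if (is.matrix(v)) ncol(v) == J else length(v) == J,
             logical(1)))
}

J      <- 3
od.par <- list(mean = c(-1.5, 0, 1.5), var = c(0.5, 0.6, 0.8), df = c(5, 5, 5))
rd.par <- list(r = c(2, 2, 2), pi = c(0.1, 0.1, 0.1))

check_par(od.par, "t", od_names, J)       # names and lengths match od = "t"
check_par(rd.par, "nbinom", rd_names, J)  # names and lengths match rd = "nbinom"
check_par(rd.par, "geom", rd_names, J)    # wrong names for rd = "geom"
```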

## References

Bulla, J. (2006), Stylized facts of financial time series and hidden semi-Markov models. Ph.D. thesis, Goettingen.

Guedon, Y. (2003), Estimating Hidden Semi-Markov Chains From Discrete Sequences. JCGS, 12 (3), pp 604-639.

Sansom, J. and Thomson, P. (2001), Fitting hidden semi-Markov models to breakpoint rainfall data. J. Appl. Probab., 38A, pp 142-157.

## See Also

`hsmm.smooth`, `hsmm.viterbi`, `hsmm.sim`

## Examples

```r
library(hsmm)

# Simulating observations:
# (see hsmm.sim for details)
pipar  <- rep(1/3, 3)
tpmpar <- matrix(c(0, 0.5, 0.5,
                   0.7, 0, 0.3,
                   0.8, 0.2, 0), 3, byrow = TRUE)
rdpar  <- list(p = c(0.98, 0.98, 0.99))
odpar  <- list(mean = c(-1.5, 0, 1.5), var = c(0.5, 0.6, 0.8))
sim    <- hsmm.sim(n = 2000, od = "norm", rd = "log",
                   pi.par = pipar, tpm.par = tpmpar,
                   rd.par = rdpar, od.par = odpar, seed = 3539)

# Executing the EM algorithm:
fit    <- hsmm(sim$obs, od = "norm", rd = "log",
               pi.par = pipar, tpm.par = tpmpar,
               od.par = odpar, rd.par = rdpar)

# The log-likelihood:
fit$logl

# The estimated parameters:
fit$para

# For comparison, the estimated parameters are shown separately below,
# together with the true parameter values.

# Transition probability matrix:
tpmpar
fit$para$tpm

# Observation distribution:
odpar
fit$para$od

# Runlength distribution:
rdpar
fit$para$rd
```