hsmm: Hidden Semi-Markov Models


View source: R/hsmm.r

Description

Fitting a hidden semi-Markov model with conditional distribution od and runlength distribution rd to the observations x.

Usage

hsmm(x, 
     od, 
     od.par,
     rd        = "nonp", 
     rd.par    = list(np = matrix(0.1, nrow = 10, ncol = 2)),
     pi.par    = c(0.5, 0.5),
     tpm.par   = matrix(c(0, 1, 1, 0), 2),
     M         = NA, 
     Q.max     = 500, 
     epsilon   = 1e-08,  
     censoring = 1,
     prt       = TRUE,
     detailed  = FALSE,
     r.lim     = c(0.01, 100), 
     p.log.lim = c(0.001, 0.999),
     nu.lim    = c(0.01, 100))

Arguments

x

The observations as a vector of length T.

od

Character with the name of the conditional distribution of the observations. The following distributions are currently implemented:

"bern" = Bernoulli
"norm" = Normal
"pois" = Poisson
"t" = Student's t
rd

Character with the name of the runlength distribution (or sojourn time, dwell time distribution). The following distributions are currently implemented:

"nonp" = Non-parametric
"geom" = Geometric
"nbinom" = Negative Binomial
"log" = Logarithmic
"pois" = Poisson
pi.par

Vector of length J with the initial values for the initial probabilities of the semi-Markov chain.

tpm.par

Matrix of dimension J x J with the initial values for the transition probability matrix of the embedded Markov chain. The diagonal entries must all be zero; absorbing states are not permitted.

rd.par

List with the initial values for the parameters of the runlength distributions. See further details below (section 'List Objects rd.par and od.par').

od.par

List with the initial values for the parameters of the conditional observation distributions. See further details below (section 'List Objects rd.par and od.par').

M

Positive integer containing the maximum runlength.

Q.max

Positive integer containing the maximum number of iterations.

epsilon

Positive scalar giving the tolerance at which the relative change of log-likelihood is considered close enough to zero to terminate the algorithm.

censoring

Integer. If equal to 1, the last visited state contributes to the likelihood. If equal to 0, the partial likelihood estimator, which ignores the contribution of the last visited state, is used. For details see Guedon (2003).

prt

Logical. If TRUE, the iteration number and the current log-likelihood are printed at each iteration.

detailed

Logical. If TRUE, a list of the parameters at every iteration step is written into the ctrl list.

r.lim

Lower and upper bound for the r parameter of the negative binomial distribution. In the M-step, this parameter is determined by bisection within these bounds.

p.log.lim

Lower and upper bound for the parameter of the logarithmic distribution. In the M-step, this parameter is determined by bisection within these bounds.

nu.lim

Lower and upper bound for the degrees-of-freedom parameter of the t distribution. In the M-step, this parameter is determined by bisection within these bounds.
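
The three *.lim arguments each bound a one-dimensional search. As a rough illustration of bisection on such an interval (not the package's internal routine, whose objective function is not documented here), the sketch below recovers the logarithmic parameter p from a target mean, using the mean of the logarithmic distribution, -p / ((1 - p) log(1 - p)). The helper names log_mean and bisect are hypothetical.

```r
# Mean of the logarithmic (log-series) distribution with parameter p in (0, 1).
log_mean <- function(p) -p / ((1 - p) * log(1 - p))

# Plain bisection for a root of f on [lim[1], lim[2]].
# Assumes f changes sign on the interval.
bisect <- function(f, lim, tol = 1e-10, max.iter = 200) {
  lo <- lim[1]
  hi <- lim[2]
  f.lo <- f(lo)
  if (f.lo * f(hi) > 0) stop("no sign change on the interval")
  for (i in seq_len(max.iter)) {
    mid <- (lo + hi) / 2
    f.mid <- f(mid)
    if (f.mid == 0 || (hi - lo) / 2 < tol) break
    if (f.lo * f.mid < 0) {
      hi <- mid            # root lies in the lower half
    } else {
      lo <- mid            # root lies in the upper half
      f.lo <- f.mid
    }
  }
  mid
}

p.log.lim <- c(0.001, 0.999)   # same bounds as the hsmm default
target    <- 1 / log(2)        # mean of the distribution at p = 0.5
p.hat     <- bisect(function(p) log_mean(p) - target, p.log.lim)
```

Narrowing the bounds restricts the values the search can return, so a parameter that keeps landing on a bound may indicate that the interval is too tight.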

Details

The function hsmm fits a hidden semi-Markov model using the EM algorithm for parameter estimation. The estimation algorithms are based on the right-censored approach initially described in Guedon (2003). This model does not assume that the last observation coincides with an exit from the last visited state. The EM algorithm is an iterative procedure and requires initial values. The results may depend on the initial values selected, because convergence to local maxima is a common phenomenon. Details on the algorithm utilized for the package hsmm are also presented by Bulla (2006).
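
The interplay of Q.max and epsilon can be pictured with a generic loop. This is a minimal sketch that assumes the relative change is measured as |(l_new - l_old) / l_old|; the package may compute it differently. The function em_step is a hypothetical stand-in for one E- and M-step and simply returns a toy sequence of log-likelihoods.

```r
# Toy stand-in for one EM iteration: an increasing log-likelihood that
# levels off at -1000 (purely illustrative, not an HSMM computation).
em_step <- function(q) -1000 - 50 * 0.5^q

run_until_converged <- function(Q.max = 500, epsilon = 1e-08) {
  logl.old <- em_step(0)
  for (q in seq_len(Q.max)) {
    logl.new   <- em_step(q)
    rel.change <- abs((logl.new - logl.old) / logl.old)
    if (rel.change < epsilon) {
      # Stopping criterion fulfilled before Q.max was reached
      return(list(iter = q, logl = logl.new, solution.reached = TRUE))
    }
    logl.old <- logl.new
  }
  # Iteration budget exhausted without meeting the tolerance
  list(iter = Q.max, logl = logl.old, solution.reached = FALSE)
}

res       <- run_until_converged()
res.short <- run_until_converged(Q.max = 5)
```

With the small budget, the loop stops at Q.max and reports solution.reached = FALSE, which mirrors the flag described under ctrl in the Value section.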

Default model
The default model is a two-state hidden semi-Markov model with a non-parametric runlength distribution. Thus, the transition probability matrix does not require any initial values (for models with J > 2 states, the transition probability matrix may be initialized by the value 1/(J - 1) for the off-diagonal elements). The non-parametric runlength distribution is implemented as the default distribution and is initialized by a uniform distribution on the first ten runlengths. Similarly, the initial probabilities pi follow a uniform distribution. There is no default for the conditional distribution of the observations, because it should not be selected without investigating the data. Note that the non-parametric runlength distribution often requires a very large number of observations. Sansom and Thomson (2001), for example, obtained satisfactory results with series of length 20000 and longer.
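
For more than two states, the same uniform starting values can be built by hand. The following is a minimal sketch; init_par is a hypothetical helper, not part of the package, and the choice M = 10 simply mirrors the default.

```r
# Hypothetical helper: uniform starting values for a J-state model with a
# non-parametric runlength distribution on the first M runlengths.
init_par <- function(J, M = 10) {
  stopifnot(J >= 2, M >= 1)
  tpm <- matrix(1 / (J - 1), nrow = J, ncol = J)
  diag(tpm) <- 0                       # zero diagonal, as tpm.par requires
  list(pi.par  = rep(1 / J, J),
       tpm.par = tpm,
       rd.par  = list(np = matrix(1 / M, nrow = M, ncol = J)))
}

start <- init_par(J = 3)
start$tpm.par
```

For J = 2 this reproduces the defaults shown under Usage. The list entries can then be passed to hsmm as pi.par, tpm.par and rd.par, together with an od and od.par chosen after inspecting the data.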

Value

call

The matched call.

iter

Positive integer containing the number of iterations carried out.

logl

Double containing the log-likelihood of the fitted model.

para

List object containing the parameter estimates.

ctrl

List object containing additional control variables. These are solution.reached, error, and details. solution.reached is TRUE if the stopping criterion is fulfilled. error returns an error code: 0 = no error, 1 = internal probability less than or equal to zero, 2 = memory exception, 3 = file error (internal output from C routine, disabled by default). details contains the parameter values of every iteration.
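
Before the estimates are used, it is worth checking ctrl. The sketch below relies only on the fields documented above; describe_fit is a hypothetical helper, and mock.fit is a hand-built stand-in for an object returned by hsmm.

```r
# Hypothetical helper translating the documented error codes into messages.
describe_fit <- function(fit) {
  codes <- c("0" = "no error",
             "1" = "internal probability less than or equal to zero",
             "2" = "memory exception",
             "3" = "file error")
  msg <- codes[[as.character(fit$ctrl$error)]]
  if (fit$ctrl$error != 0) {
    return(paste("fit failed:", msg))
  }
  if (!fit$ctrl$solution.reached) {
    return(paste("stopped after", fit$iter, "iterations without convergence"))
  }
  paste("converged after", fit$iter, "iterations, logl =", fit$logl)
}

# Mock object with the documented structure; a real one comes from hsmm().
mock.fit <- list(iter = 42L, logl = -2875.3,
                 ctrl = list(solution.reached = TRUE, error = 0))
describe_fit(mock.fit)
```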

List Objects rd.par and od.par

The list objects rd.par and od.par contain parameter values for the runlength and conditional observation distributions, respectively. For a model with J states, the length of all parameter vectors is equal to J. For the non-parametric runlength distribution, the corresponding entry is a matrix of dimension M x J. The names of the list entries have to be as follows.
od.par:

"bern" (Bernoulli): "b"
"norm" (Normal): "mean", "var"
"pois" (Poisson): "lambda"
"t" (Student's t): "mean", "var", "df"

rd.par:

"nonp" (Non-parametric): "np"
"geom" (Geometric): "p"
"nbinom" (Negative Binomial): "r", "pi"
"log" (Logarithmic): "p"
"pois" (Poisson): "lambda"

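As an illustration of the naming rules above, the lists below set up a two-state model. All numerical values are arbitrary placeholders chosen for the example, not recommended starting values.

```r
J <- 2  # number of states

# od = "t": one mean, variance and degrees-of-freedom value per state
od.par.t <- list(mean = c(-1, 1), var = c(0.5, 2), df = c(5, 5))

# rd = "nbinom": entries must be named "r" and "pi"
rd.par.nbinom <- list(r = c(2, 4), pi = c(0.3, 0.1))

# rd = "nonp": an M x J matrix named "np"; columns sum to one here,
# as in the default
M <- 10
rd.par.nonp <- list(np = matrix(1 / M, nrow = M, ncol = J))

# Quick consistency checks before calling hsmm()
stopifnot(all(lengths(od.par.t) == J),
          all(lengths(rd.par.nbinom) == J),
          all(dim(rd.par.nonp$np) == c(M, J)))
```
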
References

Bulla, J. (2006), Stylized facts of financial time series and hidden semi-Markov models. Ph.D. thesis, Goettingen.
Guedon, Y. (2003), Estimating Hidden Semi-Markov Chains From Discrete Sequences. JCGS, 12 (3), pp 604-639.
Sansom, J. and Thomson, P. (2001), Fitting hidden semi-Markov models to breakpoint rainfall data. J. Appl. Probab., 38A, pp 142-157.

See Also

hsmm.smooth, hsmm.viterbi, hsmm.sim

Examples

# Simulating observations: 
# (see hsmm.sim for details)
pipar  <- rep(1/3, 3)
tpmpar <- matrix(c(0, 0.5, 0.5,
                   0.7, 0, 0.3,
                   0.8, 0.2, 0), 3, byrow = TRUE)
rdpar  <- list(p = c(0.98, 0.98, 0.99))
odpar  <- list(mean = c(-1.5, 0, 1.5), var = c(0.5, 0.6, 0.8))
sim    <- hsmm.sim(n = 2000, od = "norm", rd = "log", 
                   pi.par = pipar, tpm.par = tpmpar, 
                   rd.par = rdpar, od.par = odpar, seed = 3539)

# Executing the EM algorithm:
fit    <- hsmm(sim$obs, od = "norm", rd = "log", 
               pi.par = pipar, tpm.par = tpmpar, 
               od.par = odpar, rd.par = rdpar)

# The log-likelihood:
fit$logl

# The estimated parameters:
fit$para

# For comparison, the estimated parameters are shown separately below,
# each preceded by the true parameter values.
# Transition probability matrix:
tpmpar
fit$para$tpm
# Observation distribution:
odpar
fit$para$od
# Runlength distribution:
rdpar
fit$para$rd

psobczyk/dhmm documentation built on May 24, 2017, 12:19 p.m.