dthmm: Discrete Time HMM Object (DTHMM)


Description

Creates a discrete time hidden Markov model object with class "dthmm". The observed process is univariate.

Usage

dthmm(x, Pi, delta, distn, pm, pn = NULL, discrete = NULL,
      nonstat = TRUE)

Arguments

x

is a vector of length n containing the univariate observed process. Alternatively, x could be specified as NULL, meaning that the data will be added later (e.g. simulated).

Pi

is the m*m transition probability matrix of the homogeneous hidden Markov chain.

delta

is the marginal probability distribution of the m hidden states at the first time point.

distn

is a character string with the abbreviated distribution name. Distributions provided by the package are Beta ("beta"), Binomial ("binom"), Exponential ("exp"), GammaDist ("gamma"), Lognormal ("lnorm"), Logistic ("logis"), Normal ("norm"), and Poisson ("pois"). See topic Mstep, Section “Modifications and Extensions”, to extend to other distributions.

pm

is a list object containing the (Markov dependent) parameter values associated with the distribution of the observed process (see below).

pn

is a list object containing the observation dependent parameter values associated with the distribution of the observed process (see below).

discrete

is logical, and is TRUE if distn is a discrete distribution. Set automatically for distributions already contained in the package.

nonstat

is logical; TRUE (the default) if the homogeneous Markov chain is assumed to be non-stationary. See the section “Stationarity” below.

Value

A list object with class "dthmm", containing the above arguments as named components.

Notation

  1. MacDonald & Zucchini (1997) use t to denote the time, where t = 1, ..., T. To avoid confusion with other uses of t and T in R, we use i = 1, ..., n.

  2. We denote the observed sequence as {X_i}, i = 1, ..., n; and the hidden Markov chain as {C_i}, i = 1, ..., n.

  3. The history of the observed process up to time i is denoted by X^{(i)}, i.e.

    X^{(i)} = (X_1, ..., X_i)

    where i = 1, ..., n. Similarly for C^{(i)}.

  4. The hidden Markov chain has m states denoted by 1, ..., m.

  5. The Markov chain transition probability matrix is denoted by Pi, where the (j, k)th element is

    pi_{jk} = Pr{ C_{i+1} = k | C_i = j }

    for all i (i.e. all time points), and j,k = 1, ..., m.

  6. The Markov chain is assumed to be homogeneous, i.e. for each j and k, pi_{jk} is constant over time.

  7. The Markov chain is said to be stationary if the marginal distribution is the same over time, i.e. for each j, delta_j = Pr{ C_i = j } is constant for all i. The marginal distribution is denoted by delta = (delta_1, ..., delta_m).
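The distinction between homogeneity (item 6) and stationarity (item 7) can be illustrated with a few lines of base R; the transition matrix below is hypothetical, and none of this is HiddenMarkov package code. With a homogeneous chain, repeated multiplication by Pi gives the marginal distribution at each time point; started away from stationarity, it converges to the stationary distribution:

```r
# hypothetical 2-state transition matrix (not from the package examples)
Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7), byrow = TRUE, nrow = 2)
delta1 <- c(1, 0)                     # chain starts in state 1: not stationary

marg <- delta1
for (i in 2:50) marg <- marg %*% Pi   # delta_i = delta_{i-1} Pi

print(marg)                           # approaches the stationary distribution
print(all.equal(as.vector(marg %*% Pi), as.vector(marg), tolerance = 1e-6))
```

For this Pi the limit is (0.6, 0.4), the unique solution of delta = delta Pi with components summing to one; only when delta1 is chosen equal to that limit is the chain stationary as well as homogeneous.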

List Object pm

The list object pm contains parameter values for the probability distribution of the observed process that are dependent on the hidden Markov state. These parameters generally need to be estimated; see “Modifications” in topic Mstep for the case where some do not require estimation.

Assume that the hidden Markov chain has m states, and that there are l parameters that are dependent on the hidden Markov state. Then the list object pm should contain l named vector components each of length m. The names are determined by the required probability distribution.

For example, if distn == "norm", the argument names must coincide with those used by the functions dnorm or rnorm, which are mean and sd. Each must be specified in either pm or pn. If they both vary according to the hidden Markov state then pm should have the named components mean and sd. These are both vectors of length m containing the means and standard deviations of the observed process when the hidden Markov chain is in each of the m states. If, for example, sd was “time” dependent, then sd would be contained in pn (see below).

If distn == "pois", then pm should have one component named lambda, being the parameter name in the function dpois. Even if there is only one parameter, the vector component should still be within a list and named.

List Object pn

The list object pn contains parameter values of the probability distribution for the observed process that are dependent on the observation number or “time”. These parameters are assumed to be known.

Assume that the observed process is of length n, and that there are l parameters that are dependent on the observation number or time. Then the list object pn should contain l named vector components each of length n. The names, as in pm, are determined by the required probability distribution.

For example, in the observed process we may count the number of successes in a known number of Bernoulli trials, i.e. the number of Bernoulli trials is known at each time point, but the probability of success varies according to a hidden Markov state. The prob parameter of rbinom (or dbinom) would be specified in pm and the size parameter would be specified in pn.

One could also have a situation where the observed process was Gaussian, with the means varying according to the hidden Markov state, but the variances varying non-randomly according to the observation number (or vice versa). Here mean would be specified within pm and sd within pn. Note that a given parameter can only occur within one of pm or pn.

Complete Data Likelihood

The “complete data likelihood”, L_c, is

L_c = Pr{ X_1=x_1, ..., X_n=x_n, C_1=c_1, ..., C_n=c_n } .

This can be shown to be

Pr{ X_1=x_1 | C_1=c_1 } Pr{ C_1=c_1 } prod_{i=2}^n Pr{ X_i=x_i | C_i=c_i } Pr{ C_i=c_i | C_{i-1}=c_{i-1} } ,

and hence, substituting model parameters, we get

L_c = delta_{c_1} pi_{c_1c_2} pi_{c_2c_3} ... pi_{c_{n-1}c_n} prod_{i=1}^n Pr{ X_i=x_i | C_i=c_i } ,

and so

log L_c = log delta_{c_1} + sum_{i=2}^n log pi_{c_{i-1}c_i} + sum_{i=1}^n log Pr{ X_i=x_i | C_i=c_i } .

Hence the “complete data likelihood” is split into three terms: the first relates to parameters of the marginal distribution (Markov chain), the second to the transition probabilities, and the third to the distribution parameters of the observed random variable. When the Markov chain is non-stationary, each term can be maximised separately.
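The three-term split can be checked numerically in base R. The sketch below uses hypothetical parameter values and a hand-picked state path (it is not package code), with a Gaussian observed process:

```r
# hypothetical model: 2 hidden states, Gaussian observations
Pi    <- matrix(c(0.8, 0.2,
                  0.3, 0.7), byrow = TRUE, nrow = 2)
delta <- c(0.6, 0.4)
pm    <- list(mean = c(1, 6), sd = c(0.5, 1))

states <- c(1, 1, 2, 2, 1)             # hand-picked hidden path c_1, ..., c_n
x      <- c(0.9, 1.2, 5.5, 6.3, 0.7)   # hypothetical observations
n      <- length(x)

term1 <- log(delta[states[1]])                         # marginal distribution
term2 <- sum(log(Pi[cbind(states[-n], states[-1])]))   # transition probabilities
term3 <- sum(dnorm(x, pm$mean[states], pm$sd[states],  # observation densities
                   log = TRUE))

logLc <- term1 + term2 + term3
print(logLc)
```

Indexing Pi with the two-column matrix cbind(states[-n], states[-1]) extracts pi_{c_{i-1} c_i} for i = 2, ..., n in one step; the sum of the three terms equals the log of the complete data likelihood for this path.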

Stationarity

When the hidden Markov chain is assumed to be non-stationary, the complete data likelihood has a neat structure, in that delta only occurs in the first term, Pi only occurs in the second term, and the parameters associated with the observed probabilities only occur in the third term. Hence, the likelihood can easily be maximised by maximising each term individually. In this situation, the estimated parameters using BaumWelch will be the “exact” maximum likelihood estimates.

When the hidden Markov chain is assumed to be stationary, delta = Pi' delta (see topic compdelta), and then the first two terms of the complete data likelihood determine the transition probabilities Pi. This raises more complicated numerical problems, as the first term effectively acts as a constraint. In our implementation of the EM algorithm, we deal with this in a slightly ad hoc manner by effectively disregarding the first term, which is assumed to be relatively small. In the M-step, the transition matrix is determined by the second term, and then delta is estimated from Pi using the relation delta = Pi' delta. Hence, the BaumWelch function will only provide approximate maximum likelihood estimates. Exact solutions can be calculated by directly maximising the likelihood function; see the first example in neglogLik.
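In the stationary case, delta can be computed directly from Pi by solving delta = Pi' delta together with the normalisation sum(delta) = 1 (this is what compdelta computes). The base-R sketch below uses a hypothetical Pi, not the package implementation, and the standard trick of adding the all-ones matrix to absorb the normalisation constraint into the otherwise singular linear system:

```r
# hypothetical transition matrix (not from the package examples)
Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7), byrow = TRUE, nrow = 2)
m  <- nrow(Pi)

# delta (I - Pi + U) = 1', where U is the m x m matrix of ones:
# delta (I - Pi) = 0 combined with sum(delta) == 1
delta <- solve(t(diag(m) - Pi + 1), rep(1, m))
print(delta)   # c(0.6, 0.4) for this Pi
```

The result satisfies both delta = delta Pi and sum(delta) = 1, so it is the stationary marginal distribution of the chain.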

References

Cited references are listed on the HiddenMarkov manual page.

Examples

#-----  Test Gaussian Distribution -----

Pi <- matrix(c(1/2, 1/2,   0,
               1/3, 1/3, 1/3,
                 0, 1/2, 1/2),
             byrow=TRUE, nrow=3)

delta <- c(0, 1, 0)

x <- dthmm(NULL, Pi, delta, "norm",
           list(mean=c(1, 6, 3), sd=c(0.5, 1, 0.5)))

x <- simulate(x, nsim=1000)

#    use above parameter values as initial values
y <- BaumWelch(x)

print(summary(y))
print(logLik(y))
hist(residuals(y))

#   check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))


#-----  Test Poisson Distribution  -----

Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7),
             byrow=TRUE, nrow=2)

delta <- c(0, 1)

x <- dthmm(NULL, Pi, delta, "pois", list(lambda=c(4, 0.1)),
           discrete = TRUE)

x <- simulate(x, nsim=1000)

#    use above parameter values as initial values
y <- BaumWelch(x)

print(summary(y))
print(logLik(y))
hist(residuals(y))

#   check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))


#-----  Test Exponential Distribution  -----

Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7),
             byrow=TRUE, nrow=2)

delta <- c(0, 1)

x <- dthmm(NULL, Pi, delta, "exp", list(rate=c(2, 0.1)))

x <- simulate(x, nsim=1000)

#    use above parameter values as initial values
y <- BaumWelch(x)

print(summary(y))
print(logLik(y))
hist(residuals(y))

#   check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))


#-----  Test Beta Distribution  -----

Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7),
             byrow=TRUE, nrow=2)

delta <- c(0, 1)

x <- dthmm(NULL, Pi, delta, "beta", list(shape1=c(2, 6), shape2=c(6, 2)))

x <- simulate(x, nsim=1000)

#    use above parameter values as initial values
y <- BaumWelch(x)

print(summary(y))
print(logLik(y))
hist(residuals(y))

#   check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))


#-----  Test Binomial Distribution  -----

Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7),
             byrow=TRUE, nrow=2)

delta <- c(0, 1)

#   vector of "fixed & known" number of Bernoulli trials
pn <- list(size=rpois(1000, 10)+1)

x <- dthmm(NULL, Pi, delta, "binom", list(prob=c(0.2, 0.8)), pn,
           discrete=TRUE)

x <- simulate(x, nsim=1000)

#    use above parameter values as initial values
y <- BaumWelch(x)

print(summary(y))
print(logLik(y))
hist(residuals(y))

#   check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))


#-----  Test Gamma Distribution  -----

Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7),
             byrow=TRUE, nrow=2)

delta <- c(0, 1)

pm <- list(rate=c(4, 0.5), shape=c(3, 3))

x <- seq(0.01, 10, 0.01)
plot(x, dgamma(x, rate=pm$rate[1], shape=pm$shape[1]),
     type="l", col="blue", ylab="Density")
points(x, dgamma(x, rate=pm$rate[2], shape=pm$shape[2]),
       type="l", col="red")

x <- dthmm(NULL, Pi, delta, "gamma", pm)

x <- simulate(x, nsim=1000)

#    use above parameter values as initial values
y <- BaumWelch(x)

print(summary(y))
print(logLik(y))
hist(residuals(y))

#   check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))

Example output

iter = 1 
LL = -1821.6111177 
diff = Inf 

iter = 2 
LL = -1815.6087525 
diff = 6.002365 

iter = 3 
LL = -1815.5579652 
diff = 0.05078724 

iter = 4 
LL = -1815.5486326 
diff = 0.009332656 

iter = 5 
LL = -1815.5460808 
diff = 0.002551714 

iter = 6 
LL = -1815.5453457 
diff = 0.0007351141 

iter = 7 
LL = -1815.5451317 
diff = 0.0002140232 

iter = 8 
LL = -1815.5450692 
diff = 6.248901e-05 

iter = 9 
LL = -1815.5450510 
diff = 1.82637e-05 

iter = 10 
LL = -1815.5450456 
diff = 5.340349e-06 

$delta
[1] 0 1 0

$Pi
          [,1]      [,2]      [,3]
[1,] 0.4452199 0.5547801 0.0000000
[2,] 0.3478015 0.3238736 0.3283249
[3,] 0.0000000 0.4942723 0.5057277

$nonstat
[1] TRUE

$distn
[1] "norm"

$pm
$pm$mean
[1] 1.04784 5.99757 2.99209

$pm$sd
[1] 0.4543011 0.9872004 0.4738881


$discrete
[1] FALSE

$n
[1] 1000

[1] -1815.545
[1] 1
     [,1]
[1,]    1
[2,]    1
[3,]    1
iter = 1 
LL = -1799.5408113 
diff = Inf 

iter = 2 
LL = -1796.9528049 
diff = 2.588006 

iter = 3 
LL = -1796.7322099 
diff = 0.220595 

iter = 4 
LL = -1796.6627408 
diff = 0.06946912 

iter = 5 
LL = -1796.6366671 
diff = 0.02607367 

iter = 6 
LL = -1796.6263872 
diff = 0.01027986 

iter = 7 
LL = -1796.6222789 
diff = 0.004108301 

iter = 8 
LL = -1796.6206296 
diff = 0.001649302 

iter = 9 
LL = -1796.6199662 
diff = 0.0006634649 

iter = 10 
LL = -1796.6196990 
diff = 0.0002671934 

iter = 11 
LL = -1796.6195913 
diff = 0.0001076796 

iter = 12 
LL = -1796.6195479 
diff = 4.341405e-05 

iter = 13 
LL = -1796.6195304 
diff = 1.750845e-05 

iter = 14 
LL = -1796.6195233 
diff = 7.062227e-06 

$delta
[1] 0 1

$Pi
          [,1]      [,2]
[1,] 0.7708650 0.2291350
[2,] 0.3179975 0.6820025

$nonstat
[1] TRUE

$distn
[1] "pois"

$pm
$pm$lambda
[1] 3.86343067 0.07722189


$discrete
[1] TRUE

$n
[1] 1000

[1] -1796.62
[1] 1
     [,1]
[1,]    1
[2,]    1
iter = 1 
LL = -1884.2825971 
diff = Inf 

iter = 2 
LL = -1883.8707360 
diff = 0.4118611 

iter = 3 
LL = -1883.8427061 
diff = 0.0280299 

iter = 4 
LL = -1883.8374551 
diff = 0.005250979 

iter = 5 
LL = -1883.8363936 
diff = 0.001061511 

iter = 6 
LL = -1883.8361777 
diff = 0.0002158541 

iter = 7 
LL = -1883.8361337 
diff = 4.401801e-05 

iter = 8 
LL = -1883.8361247 
diff = 9.00003e-06 

$delta
[1] 0 1

$Pi
          [,1]      [,2]
[1,] 0.8108036 0.1891964
[2,] 0.2902134 0.7097866

$nonstat
[1] TRUE

$distn
[1] "exp"

$pm
$pm$rate
[1] 1.99770569 0.09631485


$discrete
[1] FALSE

$n
[1] 1000

[1] -1883.836
[1] 1
     [,1]
[1,]    1
[2,]    1
iter = 1 
LL = 124.0666033 
diff = Inf 

iter = 2 
LL = 127.7385009 
diff = 3.671898 

iter = 3 
LL = 127.9669001 
diff = 0.2283992 

iter = 4 
LL = 128.0058536 
diff = 0.0389535 

iter = 5 
LL = 128.0224494 
diff = 0.01659578 

iter = 6 
LL = 128.0328172 
diff = 0.01036785 

iter = 7 
LL = 128.0398857 
diff = 0.007068441 

iter = 8 
LL = 128.0448026 
diff = 0.00491689 

iter = 9 
LL = 128.0482405 
diff = 0.003437987 

iter = 10 
LL = 128.0506472 
diff = 0.002406658 

iter = 11 
LL = 128.0523318 
diff = 0.00168462 

iter = 12 
LL = 128.0535106 
diff = 0.001178761 

iter = 13 
LL = 128.0543350 
diff = 0.0008244503 

iter = 14 
LL = 128.0549114 
diff = 0.0005764132 

iter = 15 
LL = 128.0553143 
diff = 0.0004028632 

iter = 16 
LL = 128.0555958 
diff = 0.0002814873 

iter = 17 
LL = 128.0557924 
diff = 0.0001966335 

iter = 18 
LL = 128.0559298 
diff = 0.0001373318 

iter = 19 
LL = 128.0560257 
diff = 9.589886e-05 

iter = 20 
LL = 128.0560926 
diff = 6.695711e-05 

iter = 21 
LL = 128.0561394 
diff = 4.674451e-05 

iter = 22 
LL = 128.0561720 
diff = 3.263047e-05 

iter = 23 
LL = 128.0561948 
diff = 2.277623e-05 

iter = 24 
LL = 128.0562107 
diff = 1.589687e-05 

iter = 25 
LL = 128.0562218 
diff = 1.109475e-05 

iter = 26 
LL = 128.0562295 
diff = 7.742895e-06 

$delta
[1] 0 1

$Pi
          [,1]      [,2]
[1,] 0.7923990 0.2076010
[2,] 0.3538283 0.6461717

$nonstat
[1] TRUE

$distn
[1] "beta"

$pm
$pm$shape1
[1] 1.935835 5.622934

$pm$shape2
[1] 5.431913 1.865539


$discrete
[1] FALSE

$n
[1] 1000

[1] 128.0562
[1] 1
     [,1]
[1,]    1
[2,]    1
iter = 1 
LL = -2183.0152652 
diff = Inf 

iter = 2 
LL = -2182.6373913 
diff = 0.3778739 

iter = 3 
LL = -2182.6204306 
diff = 0.01696068 

iter = 4 
LL = -2182.6192916 
diff = 0.001138971 

iter = 5 
LL = -2182.6192143 
diff = 7.726688e-05 

iter = 6 
LL = -2182.6192091 
diff = 5.257185e-06 

$delta
[1] 0 1

$Pi
          [,1]      [,2]
[1,] 0.8011736 0.1988264
[2,] 0.3082061 0.6917939

$nonstat
[1] TRUE

$distn
[1] "binom"

$pm
$pm$prob
[1] 0.2042297 0.8020582


$discrete
[1] TRUE

$n
[1] 1000

[1] -2182.619
[1] 1
     [,1]
[1,]    1
[2,]    1
iter = 1 
LL = -1748.8951570 
diff = Inf 

iter = 2 
LL = -1746.7517787 
diff = 2.143378 

iter = 3 
LL = -1746.5427895 
diff = 0.2089893 

iter = 4 
LL = -1746.4698313 
diff = 0.07295816 

iter = 5 
LL = -1746.4380198 
diff = 0.03181151 

iter = 6 
LL = -1746.4234629 
diff = 0.01455694 

iter = 7 
LL = -1746.4167045 
diff = 0.006758342 

iter = 8 
LL = -1746.4135458 
diff = 0.003158781 

iter = 9 
LL = -1746.4120634 
diff = 0.001482365 

iter = 10 
LL = -1746.4113659 
diff = 0.0006975215 

iter = 11 
LL = -1746.4110370 
diff = 0.0003288179 

iter = 12 
LL = -1746.4108818 
diff = 0.0001552026 

iter = 13 
LL = -1746.4108085 
diff = 7.331911e-05 

iter = 14 
LL = -1746.4107739 
diff = 3.465717e-05 

iter = 15 
LL = -1746.4107575 
diff = 1.638875e-05 

iter = 16 
LL = -1746.4107497 
diff = 7.752119e-06 

$delta
[1] 0 1

$Pi
          [,1]      [,2]
[1,] 0.8185245 0.1814755
[2,] 0.2629361 0.7370639

$nonstat
[1] TRUE

$distn
[1] "gamma"

$pm
$pm$rate
[1] 4.1556500 0.4454936

$pm$shape
[1] 3.147937 2.656021


$discrete
[1] FALSE

$n
[1] 1000

[1] -1746.411
[1] 1
     [,1]
[1,]    1
[2,]    1

HiddenMarkov documentation built on April 27, 2021, 5:06 p.m.