Creates a discrete time hidden Markov model object with class "dthmm". The observed process is univariate.
x: is a vector of length n containing the univariate observed process. Alternatively, x may be NULL, in which case the observed process can be simulated afterwards (as in the Examples below).
Pi: is the m*m transition probability matrix of the homogeneous hidden Markov chain.
delta: is the marginal probability distribution of the m hidden states at the first time point.
distn: is a character string with the abbreviated distribution name; the Examples below use "norm", "pois", "exp", "beta", "binom" and "gamma".
pm: is a list object containing the (Markov dependent) parameter values associated with the distribution of the observed process (see below).
pn: is a list object containing the observation dependent parameter values associated with the distribution of the observed process (see below).
discrete: is logical; TRUE if the observed process is discrete (used with "pois" and "binom" in the Examples below).
nonstat: is logical; TRUE if the hidden Markov chain is assumed to be non-stationary (see the discussion of stationarity below).
A list object with class "dthmm", containing the above arguments as named components.
MacDonald & Zucchini (1997) use t to denote the time, where t = 1, ..., T. To avoid confusion with other uses of t and T in R, we use i = 1, ..., n.
We denote the observed sequence as {X_i}, i = 1, ..., n; and the hidden Markov chain as {C_i}, i = 1, ..., n.
The history of the observed process up to time i is denoted by X^{(i)}, i.e.
X^{(i)} = (X_1, ..., X_i)
where i = 1, ..., n. Similarly for C^{(i)}.
The hidden Markov chain has m states denoted by 1, ..., m.
The Markov chain transition probability matrix is denoted by Pi, where the (j, k)th element is
pi_{jk} = Pr{ C_{i+1} = k | C_i = j }
for all i (i.e. all time points), and j,k = 1, ..., m.
The Markov chain is assumed to be homogeneous, i.e. for each j and k, pi_{jk} is constant over time.
The Markov chain is said to be stationary if the marginal distribution is the same over time, i.e. for each j, delta_j = Pr{ C_i = j } is constant for all i. The marginal distribution is denoted by delta = (delta_1, ..., delta_m).
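The stationary distribution can be computed directly from Pi. A minimal base-R sketch (illustrative only; the package function compdelta performs this computation), solving delta = Pi' delta subject to sum(delta) = 1:

```r
# Solve delta = t(Pi) %*% delta together with sum(delta) == 1.
# Stack the stationarity equations with the normalising constraint and
# solve via the normal equations (least squares).
Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7), byrow = TRUE, nrow = 2)
m <- nrow(Pi)
A <- rbind(t(Pi) - diag(m), rep(1, m))       # (t(Pi) - I) delta = 0, 1' delta = 1
b <- c(rep(0, m), 1)
delta <- solve(crossprod(A), crossprod(A, b))
print(drop(delta))                           # [1] 0.6 0.4
```

One can check the result: 0.6 * 0.8 + 0.4 * 0.3 = 0.6, so delta is indeed invariant under the chain.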
The list object pm contains parameter values for the probability distribution of the observed process that are dependent on the hidden Markov state. These parameters are generally required to be estimated. See “Modifications” in topic Mstep when some do not require estimation.
Assume that the hidden Markov chain has m states, and that there are l parameters that are dependent on the hidden Markov state. Then the list object pm should contain l named vector components each of length m. The names are determined by the required probability distribution.
For example, if distn == "norm", the argument names must coincide with those used by the functions dnorm or rnorm, which are mean and sd. Each must be specified in either pm or pn. If they both vary according to the hidden Markov state then pm should have the named components mean and sd. These are both vectors of length m containing the means and standard deviations of the observed process when the hidden Markov chain is in each of the m states. If, for example, sd was “time” dependent, then sd would be contained in pn (see below).
If distn == "pois", then pm should have one component named lambda, being the parameter name in the function dpois. Even if there is only one parameter, the vector component should still be within a list and named.
The list object pn contains parameter values of the probability distribution for the observed process that are dependent on the observation number or “time”. These parameters are assumed to be known.
Assume that the observed process is of length n, and that there are l parameters that are dependent on the observation number or time. Then the list object pn should contain l named vector components each of length n. The names, as in pm, are determined by the required probability distribution.
For example, in the observed process we may count the number of successes in a known number of Bernoulli trials, i.e. the number of Bernoulli trials is known at each time point, but the probability of success varies according to a hidden Markov state. The prob parameter of rbinom (or dbinom) would be specified in pm and the size parameter would be specified in pn.
One could also have a situation where the observed process was Gaussian, with the means varying according to the hidden Markov state, but the variances varying non-randomly according to the observation number (or vice versa). Here mean would be specified within pm and sd within pn. Note that a given parameter can only occur within one of pm or pn.
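As a concrete sketch of this split, with hypothetical parameter values (the dthmm call is shown commented, mirroring the calls in the Examples below):

```r
# Gaussian observed process: state-dependent means go in pm,
# a known observation-dependent standard deviation goes in pn.
# (Parameter values here are illustrative, not from the package docs.)
n  <- 500
pm <- list(mean = c(0, 5))            # one value per hidden state (m = 2)
pn <- list(sd = 0.5 + (1:n) / n)      # one value per observation (length n)
# With the package loaded, the model would then be built as in the Examples:
# x <- dthmm(NULL, Pi, delta, "norm", pm, pn)
```

Note that each component of pm has length m, while each component of pn has length n, and sd appears in only one of the two lists.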
The “complete data likelihood”, L_c, is
L_c = Pr{ X_1=x_1, ..., X_n=x_n, C_1=c_1, ..., C_n=c_n } .
This can be shown to be
Pr{ X_1=x_1 | C_1=c_1 } Pr{ C_1=c_1 } prod_{i=2}^n Pr{ X_i=x_i | C_i=c_i } Pr{ C_i=c_i | C_{i-1}=c_{i-1} } ,
and hence, substituting model parameters, we get
L_c = delta_{c_1} pi_{c_1c_2} pi_{c_2c_3} ... pi_{c_{n-1}c_n} prod_{i=1}^n Pr{ X_i=x_i | C_i=c_i } ,
and so
log L_c = log delta_{c_1} + sum_{i=2}^n log pi_{c_{i-1}c_i} + sum_{i=1}^n log Pr{ X_i=x_i | C_i=c_i } .
Hence the “complete data likelihood” is split into three terms: the first relates to parameters of the marginal distribution (Markov chain), the second to the transition probabilities, and the third to the distribution parameters of the observed random variable. When the Markov chain is non-stationary, each term can be maximised separately.
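The decomposition above can be evaluated directly when the hidden path is known. A base-R sketch for the Gaussian case; logLc and its arguments are hypothetical names, not package functions:

```r
# Complete-data log-likelihood:
#   log delta_{c_1} + sum_{i=2}^n log pi_{c_{i-1} c_i}
#                   + sum_{i=1}^n log Pr{X_i = x_i | C_i = c_i}
# 'states' is the (known) hidden path c_1, ..., c_n.
logLc <- function(x, states, delta, Pi, mean, sd) {
  n <- length(x)
  log(delta[states[1]]) +                                # initial distribution
    sum(log(Pi[cbind(states[-n], states[-1])])) +        # transition term
    sum(dnorm(x, mean[states], sd[states], log = TRUE))  # observation term
}
```

The matrix index `cbind(states[-n], states[-1])` picks out pi_{c_{i-1} c_i} for i = 2, ..., n in one vectorised step.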
When the hidden Markov chain is assumed to be non-stationary, the complete data likelihood has a neat structure, in that delta only occurs in the first term, Pi only occurs in the second term, and the parameters associated with the observed probabilities only occur in the third term. Hence, the likelihood can easily be maximised by maximising each term individually. In this situation, the estimated parameters using BaumWelch will be the “exact” maximum likelihood estimates.
When the hidden Markov chain is assumed to be stationary, delta = Pi' delta (see topic compdelta), and then the first two terms of the complete data likelihood determine the transition probabilities Pi. This raises more complicated numerical problems, as the first term is effectively a constraint. In our implementation of the EM algorithm, we deal with this in a slightly ad hoc manner by effectively disregarding the first term, which is assumed to be relatively small. In the M-step, the transition matrix is determined by the second term, then delta is estimated using the relation delta = Pi' delta. Hence, using the BaumWelch function will only provide approximate maximum likelihood estimates. Exact solutions can be calculated by directly maximising the likelihood function; see the first example in neglogLik.
Cited references are listed on the HiddenMarkov manual page.
#----- Test Gaussian Distribution -----
Pi <- matrix(c(1/2, 1/2, 0,
1/3, 1/3, 1/3,
0, 1/2, 1/2),
byrow=TRUE, nrow=3)
delta <- c(0, 1, 0)
x <- dthmm(NULL, Pi, delta, "norm",
list(mean=c(1, 6, 3), sd=c(0.5, 1, 0.5)))
x <- simulate(x, nsim=1000)
# use above parameter values as initial values
y <- BaumWelch(x)
print(summary(y))
print(logLik(y))
hist(residuals(y))
# check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))
#----- Test Poisson Distribution -----
Pi <- matrix(c(0.8, 0.2,
0.3, 0.7),
byrow=TRUE, nrow=2)
delta <- c(0, 1)
x <- dthmm(NULL, Pi, delta, "pois", list(lambda=c(4, 0.1)),
discrete = TRUE)
x <- simulate(x, nsim=1000)
# use above parameter values as initial values
y <- BaumWelch(x)
print(summary(y))
print(logLik(y))
hist(residuals(y))
# check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))
#----- Test Exponential Distribution -----
Pi <- matrix(c(0.8, 0.2,
0.3, 0.7),
byrow=TRUE, nrow=2)
delta <- c(0, 1)
x <- dthmm(NULL, Pi, delta, "exp", list(rate=c(2, 0.1)))
x <- simulate(x, nsim=1000)
# use above parameter values as initial values
y <- BaumWelch(x)
print(summary(y))
print(logLik(y))
hist(residuals(y))
# check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))
#----- Test Beta Distribution -----
Pi <- matrix(c(0.8, 0.2,
0.3, 0.7),
byrow=TRUE, nrow=2)
delta <- c(0, 1)
x <- dthmm(NULL, Pi, delta, "beta", list(shape1=c(2, 6), shape2=c(6, 2)))
x <- simulate(x, nsim=1000)
# use above parameter values as initial values
y <- BaumWelch(x)
print(summary(y))
print(logLik(y))
hist(residuals(y))
# check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))
#----- Test Binomial Distribution -----
Pi <- matrix(c(0.8, 0.2,
0.3, 0.7),
byrow=TRUE, nrow=2)
delta <- c(0, 1)
# vector of "fixed & known" number of Bernoulli trials
pn <- list(size=rpois(1000, 10)+1)
x <- dthmm(NULL, Pi, delta, "binom", list(prob=c(0.2, 0.8)), pn,
discrete=TRUE)
x <- simulate(x, nsim=1000)
# use above parameter values as initial values
y <- BaumWelch(x)
print(summary(y))
print(logLik(y))
hist(residuals(y))
# check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))
#----- Test Gamma Distribution -----
Pi <- matrix(c(0.8, 0.2,
0.3, 0.7),
byrow=TRUE, nrow=2)
delta <- c(0, 1)
pm <- list(rate=c(4, 0.5), shape=c(3, 3))
x <- seq(0.01, 10, 0.01)
plot(x, dgamma(x, rate=pm$rate[1], shape=pm$shape[1]),
type="l", col="blue", ylab="Density")
points(x, dgamma(x, rate=pm$rate[2], shape=pm$shape[2]),
type="l", col="red")
x <- dthmm(NULL, Pi, delta, "gamma", pm)
x <- simulate(x, nsim=1000)
# use above parameter values as initial values
y <- BaumWelch(x)
print(summary(y))
print(logLik(y))
hist(residuals(y))
# check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))
Example output:
iter = 1
LL = -1821.6111177
diff = Inf
iter = 2
LL = -1815.6087525
diff = 6.002365
iter = 3
LL = -1815.5579652
diff = 0.05078724
iter = 4
LL = -1815.5486326
diff = 0.009332656
iter = 5
LL = -1815.5460808
diff = 0.002551714
iter = 6
LL = -1815.5453457
diff = 0.0007351141
iter = 7
LL = -1815.5451317
diff = 0.0002140232
iter = 8
LL = -1815.5450692
diff = 6.248901e-05
iter = 9
LL = -1815.5450510
diff = 1.82637e-05
iter = 10
LL = -1815.5450456
diff = 5.340349e-06
$delta
[1] 0 1 0
$Pi
[,1] [,2] [,3]
[1,] 0.4452199 0.5547801 0.0000000
[2,] 0.3478015 0.3238736 0.3283249
[3,] 0.0000000 0.4942723 0.5057277
$nonstat
[1] TRUE
$distn
[1] "norm"
$pm
$pm$mean
[1] 1.04784 5.99757 2.99209
$pm$sd
[1] 0.4543011 0.9872004 0.4738881
$discrete
[1] FALSE
$n
[1] 1000
[1] -1815.545
[1] 1
[,1]
[1,] 1
[2,] 1
[3,] 1
iter = 1
LL = -1799.5408113
diff = Inf
iter = 2
LL = -1796.9528049
diff = 2.588006
iter = 3
LL = -1796.7322099
diff = 0.220595
iter = 4
LL = -1796.6627408
diff = 0.06946912
iter = 5
LL = -1796.6366671
diff = 0.02607367
iter = 6
LL = -1796.6263872
diff = 0.01027986
iter = 7
LL = -1796.6222789
diff = 0.004108301
iter = 8
LL = -1796.6206296
diff = 0.001649302
iter = 9
LL = -1796.6199662
diff = 0.0006634649
iter = 10
LL = -1796.6196990
diff = 0.0002671934
iter = 11
LL = -1796.6195913
diff = 0.0001076796
iter = 12
LL = -1796.6195479
diff = 4.341405e-05
iter = 13
LL = -1796.6195304
diff = 1.750845e-05
iter = 14
LL = -1796.6195233
diff = 7.062227e-06
$delta
[1] 0 1
$Pi
[,1] [,2]
[1,] 0.7708650 0.2291350
[2,] 0.3179975 0.6820025
$nonstat
[1] TRUE
$distn
[1] "pois"
$pm
$pm$lambda
[1] 3.86343067 0.07722189
$discrete
[1] TRUE
$n
[1] 1000
[1] -1796.62
[1] 1
[,1]
[1,] 1
[2,] 1
iter = 1
LL = -1884.2825971
diff = Inf
iter = 2
LL = -1883.8707360
diff = 0.4118611
iter = 3
LL = -1883.8427061
diff = 0.0280299
iter = 4
LL = -1883.8374551
diff = 0.005250979
iter = 5
LL = -1883.8363936
diff = 0.001061511
iter = 6
LL = -1883.8361777
diff = 0.0002158541
iter = 7
LL = -1883.8361337
diff = 4.401801e-05
iter = 8
LL = -1883.8361247
diff = 9.00003e-06
$delta
[1] 0 1
$Pi
[,1] [,2]
[1,] 0.8108036 0.1891964
[2,] 0.2902134 0.7097866
$nonstat
[1] TRUE
$distn
[1] "exp"
$pm
$pm$rate
[1] 1.99770569 0.09631485
$discrete
[1] FALSE
$n
[1] 1000
[1] -1883.836
[1] 1
[,1]
[1,] 1
[2,] 1
iter = 1
LL = 124.0666033
diff = Inf
iter = 2
LL = 127.7385009
diff = 3.671898
iter = 3
LL = 127.9669001
diff = 0.2283992
iter = 4
LL = 128.0058536
diff = 0.0389535
iter = 5
LL = 128.0224494
diff = 0.01659578
iter = 6
LL = 128.0328172
diff = 0.01036785
iter = 7
LL = 128.0398857
diff = 0.007068441
iter = 8
LL = 128.0448026
diff = 0.00491689
iter = 9
LL = 128.0482405
diff = 0.003437987
iter = 10
LL = 128.0506472
diff = 0.002406658
iter = 11
LL = 128.0523318
diff = 0.00168462
iter = 12
LL = 128.0535106
diff = 0.001178761
iter = 13
LL = 128.0543350
diff = 0.0008244503
iter = 14
LL = 128.0549114
diff = 0.0005764132
iter = 15
LL = 128.0553143
diff = 0.0004028632
iter = 16
LL = 128.0555958
diff = 0.0002814873
iter = 17
LL = 128.0557924
diff = 0.0001966335
iter = 18
LL = 128.0559298
diff = 0.0001373318
iter = 19
LL = 128.0560257
diff = 9.589886e-05
iter = 20
LL = 128.0560926
diff = 6.695711e-05
iter = 21
LL = 128.0561394
diff = 4.674451e-05
iter = 22
LL = 128.0561720
diff = 3.263047e-05
iter = 23
LL = 128.0561948
diff = 2.277623e-05
iter = 24
LL = 128.0562107
diff = 1.589687e-05
iter = 25
LL = 128.0562218
diff = 1.109475e-05
iter = 26
LL = 128.0562295
diff = 7.742895e-06
$delta
[1] 0 1
$Pi
[,1] [,2]
[1,] 0.7923990 0.2076010
[2,] 0.3538283 0.6461717
$nonstat
[1] TRUE
$distn
[1] "beta"
$pm
$pm$shape1
[1] 1.935835 5.622934
$pm$shape2
[1] 5.431913 1.865539
$discrete
[1] FALSE
$n
[1] 1000
[1] 128.0562
[1] 1
[,1]
[1,] 1
[2,] 1
iter = 1
LL = -2183.0152652
diff = Inf
iter = 2
LL = -2182.6373913
diff = 0.3778739
iter = 3
LL = -2182.6204306
diff = 0.01696068
iter = 4
LL = -2182.6192916
diff = 0.001138971
iter = 5
LL = -2182.6192143
diff = 7.726688e-05
iter = 6
LL = -2182.6192091
diff = 5.257185e-06
$delta
[1] 0 1
$Pi
[,1] [,2]
[1,] 0.8011736 0.1988264
[2,] 0.3082061 0.6917939
$nonstat
[1] TRUE
$distn
[1] "binom"
$pm
$pm$prob
[1] 0.2042297 0.8020582
$discrete
[1] TRUE
$n
[1] 1000
[1] -2182.619
[1] 1
[,1]
[1,] 1
[2,] 1
iter = 1
LL = -1748.8951570
diff = Inf
iter = 2
LL = -1746.7517787
diff = 2.143378
iter = 3
LL = -1746.5427895
diff = 0.2089893
iter = 4
LL = -1746.4698313
diff = 0.07295816
iter = 5
LL = -1746.4380198
diff = 0.03181151
iter = 6
LL = -1746.4234629
diff = 0.01455694
iter = 7
LL = -1746.4167045
diff = 0.006758342
iter = 8
LL = -1746.4135458
diff = 0.003158781
iter = 9
LL = -1746.4120634
diff = 0.001482365
iter = 10
LL = -1746.4113659
diff = 0.0006975215
iter = 11
LL = -1746.4110370
diff = 0.0003288179
iter = 12
LL = -1746.4108818
diff = 0.0001552026
iter = 13
LL = -1746.4108085
diff = 7.331911e-05
iter = 14
LL = -1746.4107739
diff = 3.465717e-05
iter = 15
LL = -1746.4107575
diff = 1.638875e-05
iter = 16
LL = -1746.4107497
diff = 7.752119e-06
$delta
[1] 0 1
$Pi
[,1] [,2]
[1,] 0.8185245 0.1814755
[2,] 0.2629361 0.7370639
$nonstat
[1] TRUE
$distn
[1] "gamma"
$pm
$pm$rate
[1] 4.1556500 0.4454936
$pm$shape
[1] 3.147937 2.656021
$discrete
[1] FALSE
$n
[1] 1000
[1] -1746.411
[1] 1
[,1]
[1,] 1
[2,] 1