hmm: Fitting Hidden Markov Models

Description Usage Arguments Details Value See Also Examples

View source: R/hmm.R

Description

hmm is used to fit (or simply define) Hidden Markov Models. Most commonly, it will be used to simply fit a vector of observed emissions which come from a (known) common distribution family to a hidden Markov model, where the parameters of the underlying Markov chain as well as the emission distribution family is then fitted using the EM-algorithm.

Alternatively, one can use custom distribution families by providing various functions (see details).

Finally, one can also disable optimization altogether to simply obtain an 'hmm'-object which can then be assessed through various standard methods (see below).

Usage

1
hmm(x, Gamma, delta, dist = "custom", ..., estimate = !is.null(x))

Arguments

x

Numeric vector of observed emissions. Can be NULL if no estimation is desired.

Gamma

Initial value of transition probability matrix for underlying Markov chain.

delta

Numeric vector of probabilities of the initial distribution of the Markov chain.

dist

Distribution family of emissions. Can be one of 'poisson', 'normal', 'binom', 'exponential' or 'custom'. If 'custom', user must provide own functions for densities, MLEs and random generation (see details)

...

Parameters of emission distribution as well as parameters for EM-algorithm. See details.

estimate

Logical variable whether or not parameters of model (Gamma, delta and emission parameters) should be estimated by EM-algorithm. Defaults to TRUE when data is available.

Details

Theory

This function is used to fit or define a hidden Markov model, i.e. a model where the distributions of X_1, ..., X_n (the emissions) depend on a hidden sequence Y_1, ..., Y_n (the hidden states). In particular, in this model, Y_1, ..., Y_n constitutes a Markov chain on the finite state space 1, ..., m, and the distribution of X_i depends only on the value of Y_i, i.e. X_i | Y_i=k ~ X_j | Y_j=k for all i, j.

As such, to estimate parameters in this model, one must estimate not only the parameters of the m emission distributions, but also the parameters of the Markov chain, i.e. the initial distribution delta (that is, Y_1 ~ delta) and the transition matrix Gamma (that is, P(Y_k=j | Y_(k-1)=i)=Gamma_i,j)

Argument details

The dist and ... arguments determine the distribution of the emissions in the different states. If dist is one of 'poisson', 'normal', 'binom' or 'exponential', distribution parameters must be passed to the ... argument. In particular they must have the following format:

'poisson' lambda A vector of Poisson parameters, one per hidden state.
'normal' mean, sd Vectors of means and standard deviations, one per hidden state.
'binom' size, prob size is a single integer, denoting the common size for all hidden states, while prob is a vector of success probabilities, one per hidden state.
'exponential' rate Vector of rates for the exponential distributions, one per hidden state.

If dist is 'custom', user must provide the following:

lls Function or list of functions, where each function is on the form f(x, param), and returns the conditional density of X_i given the parameter param.
params_lls List of parameters, one for each state. Must fit with lls in the sense that lls(x, param[[i]]) (or lls[[i]](x, param[[i]]) if lls is a list) denotes the density in point x of X_j given Y_j=i
lls_mle Function or list of functions, which return the maximum likelihood estimates given the provided data x and a vector (of equal length) of scalars u where each scalar is between 0 and 1. That is, the function(s) must be on the form h(x, u) and must return the value of param maximizing sum(u * log(f(x, param))).
rdist Function or list of functions that generate random emissions from the different hidden states. The function(s) must be on the form r(n, param) and must return a vector of n random realizations given the parameter param.

For each of lls, lls_mle and rdist, if the distribution family itself (and not just the parameters) depends on the hidden state, a list of functions must be provided, one for each state.

Finally, ... also takes arguments passed to the EM-algorithm, namely: epsilon, .

epsilon The value at which estimation halts, if the difference in log-likelihood between iterations falls below
max_iter Maximum number of iterations before halting.

Value

An object of class 'hmm'. The 'hmm'-class is equipped with a variety of default methods, see 'See also' section for details. An object of class 'hmm' is a list containing at least the following components:

m Number of hidden states.
dist Distribution family of emissions.
Gamma The transition probability matrix of the underlying Markov chain.
delta The initial distribution of the underlying Markov chain.
parameters List of parameters for the emission distribution family.
rdists List of functions used for simulating. Only for internal use.

If x is not NULL, it will also include:

x Vector of observations.
n Number of observations.
logLik Log-likelihood of parameters given the observed data.
AIC AIC of the model.
BIC BIC of the model.
viterbi_s States of global decoding of the model.
posterior_s States of local decoding of the model.
viterbi_s Probs. of global decoding of the model.
posterior_s Probs. of local decoding of the model.

Finally, if estimation is performed, it will also include the following:

EMlogLik Vector of log-likelihoods at each iteration in the EM-algorithm.
n_iter Number of iterations performed in the EM-algorithm.

See Also

The hmm object has methods for the following generic functions: AIC, BIC, fitted.values, logLik, plot, print, residuals, simulate and summary. Some (most) of these are only available, when data is provided, i.e. when x is not NULL.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
# Annual counts of earthquakes magnitude 7 or greater, 1900-2006.
# Source:
# Earthquake Data Base System of the U.S. Geological Survey, National
# Earthquake Information Center, Golden CO

quakes <- read.table("http://www.hmms-for-time-series.de/second/data/earthquakes.txt")$V2
Gamma <- rbind(c(0.9, 0.1), c(0.1, 0.9))
delta <- c(1, 1)/2
lambda <- c(10, 30)
hmm.EQ <- hmm(quakes, Gamma, delta, dist='poisson', lambda=lambda)
hmm.EQ

# If one does not want estimation by EM algorithm (e.g. for comparison of summary statistics), it can be disabled
hmm.EQ_no_opt <- hmm(quakes, Gamma, delta, dist='poisson', lambda=lambda, estimate=FALSE)
hmm.EQ_no_opt

# Creating 'empty' hmm object for sake of simulation (see simulate for further details)
# Here where all emission distributions are normal
Gamma <- rbind(c(0.5, 0.25, 0.25),
               c(0.1, 0.8 , 0.1),
               c(  0, 0.2 , 0.8))
delta <- c(1, 0, 0)
mean <- c(0, 5, 10)
sd <- rep(1, 3)

hmm.normal <- hmm(NULL, Gamma=Gamma, delta=delta, dist='normal', mean=mean, sd=sd)
hmm.normal

# Here, the emission distributions are custom (Uniform[0, theta])
Gamma <- rbind(c(0.5, 0.25, 0.25),
               c(0.1, 0.8 , 0.1),
               c(  0, 0.2 , 0.8))
delta <- c(1, 0, 0)
theta <- list(1, 5, 10)
lls <- function(x, param){dunif(x, 0, param)}
lls_mle <- function(x, u){max(x)}
rdist <- function(n, param){runif(n, 0, param)}
hmm.unif <- hmm(NULL, Gamma=Gamma, delta=delta, lls=lls, param_lls=theta, lls_mle=lls_mle, rdist=rdist)
hmm.unif

# Here, the emission distributions is either normal(0, 1) or exponential(1)
Gamma <- rbind(c(0.2, 0.8),
               c(0.8, 0.2))
delta <- c(1, 1)/2
param <- list(c(0, 1), 1)

lls <- list(function(x, param){dnorm(x, param[1], param[2])},
            function(x, param){dexp(x, param)})

lls_mle <- list(function(x, u){mean_hat <- sum(u*x) / sum(u); c(mean_hat, sqrt(sum(u*(x-mean_hat)^2) / sum(u)))},
                function(x, u){sum(u)/sum(u*x)})

rdist <- list(function(n, param){do.call(rnorm, args=as.list(c(n, param)))},
              function(n, param){rexp(n, param)})

hmm.mixture <- hmm(NULL, Gamma=Gamma, delta=delta, lls=lls, param_lls=param, lls_mle=lls_mle, rdist=rdist)
hmm.mixture

AdvancedR-2021/hmm documentation built on Dec. 17, 2021, 7:41 a.m.