In TheoMichelot/hmmTMB: Fit Hidden Markov Models using Template Model Builder

This vignette describes functionalities of the package hmmTMB for Bayesian inference, which are based on Stan (@stan2019; @rstan2022). The package tmbstan conveniently integrates TMB with Stan, such that a TMB model object (such as the one created inside the HMM class in hmmTMB) can directly be used to run MCMC in Stan (@monnahan2018). For a more general introduction to hmmTMB, consult the vignette "Analysing time series data with hidden Markov models in hmmTMB".

knitr::opts_chunk$set(
  message = FALSE, error = FALSE, warning = FALSE,
  comment = NA
)

# Load packages and color palette
library(ggplot2)
theme_set(theme_bw())
library(hmmTMB)
pal <- hmmTMB:::hmmTMB_cols
set.seed(34564529)

Data

As an example, we will use a set of elephant seal movement tracks included in the R package aniMotum (@jonsen2020), for which installation instructions can be found at \url{https://ianjonsen.github.io/aniMotum/index.html}. Our aim is not to provide a thorough ecological analysis, but only to illustrate features of the hmmTMB package.

The first chunk of code below predicts elephant seal positions at regular intervals using aniMotum, which is required for the hidden Markov model (HMM) analysis, and computes relevant movement variables with moveHMM.

library(aniMotum)
library(lubridate)
library(moveHMM)

# Predict positions at regular time intervals
fit <- fit_ssm(sese, model = "crw", time.step = 8)
track_pred <- grab(fit, what = "predicted")

# Format for HMM analysis and get relevant variables
data <- data.frame(ID = track_pred$id,
                   x = track_pred$x,
                   y = track_pred$y,
                   time = track_pred$date,
                   day = yday(track_pred$date)/365)
data <- prepData(data, type = "UTM")
head(data)

The three important columns in this data frame are the time series identifier ID (one value for each movement track), the step length step (which we will use as response variable), and the ordinal day day (which we will use as a covariate).

Model specification

The step length is the distance travelled by the animal between two observed locations and, as a positive variable, it can for example be modelled with a gamma distribution. To allow for time-varying dynamics, we will consider a 2-state HMM for the step length $L_t$ such that $$ L_t \mid { S_t = j } \sim \text{gamma}(\mu_j, \sigma_j) $$ where $\mu_j > 0$ and $\sigma_j > 0$ are state-dependent mean and standard deviation parameters, respectively. That is, the step length is assumed to follow a different gamma distribution in each state. This is a common model in movement ecology to capture different behavioural phases based on an animal's speed (as measured by the step length). We can expect that one state will capture short steps (slow movement), and one state will capture long steps (fast movement).

We also want to understand how the elephant seals' behaviour changes through the year. We will include the effect of the ordinal day on the transition probabilities. In this simple 2-state model, there are two of them: $\gamma_{12} = \Pr(S_{t+1} = 2 \mid S_t = 1)$ and $\gamma_{21} = \Pr(S_{t+1} = 1 \mid S_t = 2)$. We model them as time-varying, with a quadratic effect of ordinal day $d_t$, i.e., $$ \text{logit}(\gamma_{ij}^{(t)}) = \beta_{ij0} + \beta_{ij1} d_t + \beta_{ij2} d_t^2 $$ where the logit function ensures that the probability is between 0 and 1, and the parameters $\beta_{ij0}, \beta_{ij1}, \beta_{ij2}$ are to be estimated.

Model structure

Model specification is identical whether maximum likelihood estimation or MCMC sampling is used for model fitting. As such, both methods can easily both be applied and contrasted for a given model object.

We create a MarkovChain object for the hidden state model, specifying the number of states and the formula for the transition probabilities (quadratic effect of ordinal day). We use the option initial_state = 2 to fix the state process to 2 for all individuals. (We will later interpret state 2 to be the "fast movement" state.) We do this here because the initial distribution parameters are often not well identified, which can lead to convergence issues in the MCMC sampling. We could specify initial parameter values through the tpm0 argument, which would be used as starting points for the MCMC algorithm, but we use the default values here (a transition probability matrix with 0.9 on the diagonal, and covariate effects equal to 0).

hid <- MarkovChain$new(data = data, 
                       n_states = 2, 
                       formula = ~day + I(day^2), 
                       initial_state = 2)

For the Observation model, we use the gamma2 distribution, which is parameterised in terms of mean and standard deviation rather than shape and rate. We provide initial parameters with the argument par, corresponding to plausible parameter values. We choose a larger mean in state 2, with the expectation that state 2 will capture longer step lengths (i.e., fast movement).

dists <- list(step = "gamma2")
par0 <- list(step = list(mean = c(10, 50), sd = c(10, 50)))
obs <- Observation$new(data = data, 
                       dists = dists, 
                       par = par0)

We finally combine the two model components into an HMM object.

hmm <- HMM$new(obs = obs, hid = hid)

Priors

By default, the priors of an HMM object are set to NA, which correspond to an improper flat prior on all model parameters. The function set_priors can be used to specify priors for the observation parameters and/or the transition probabilities.

In practice, hmmTMB transforms parameters to a "working" scale, i.e., into parameters defined over the whole real line (e.g., a positive parameter is log-transformed into a real working parameter). This is to avoid having to deal with constraints during the model fitting. The priors should be defined for those working parameters, rather than for the "natural" parameters that we are interested in.

We can see a list of the priors, and of the parameters on the working scale, using the functions priors() and coeff_fe(), respectively.

hmm$priors()
hmm$coeff_fe()

The observation model has four working parameters: the log mean and the log standard deviation in each state (both log-transformed because the mean and standard deviation are positive). In hmmTMB, only normal priors can be defined, and they should be specified in a matrix with one row for each working parameter, and two columns for the hyperparameters of the prior distribution (mean and standard deviation of normal prior). In this example, we choose the following priors: \begin{align} \log(\mu_1) & \sim N(\log(10), 5^2)\ \log(\mu_2) & \sim N(\log(30), 5^2)\ \log(\sigma_1) & \sim N(\log(10), 5^2)\ \log(\sigma_2) & \sim N(\log(30), 5^2) \end{align} where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of the step length distribution for state $j \in {1, 2}$.

# Parameter of normal priors for observation parameters
prior_obs <- matrix(c(log(10), 5, 
                      log(30), 5,
                      log(10), 5,
                      log(30), 5), 
                    ncol = 2, byrow = TRUE)

In a 2-state model, the working parameters for the hidden state process are the coefficients $(\beta_{120}, \beta_{121}, \beta_{122}, \beta_{210}, \beta_{211}, \beta_{212})$ defined above. For $i \neq j \in { 1, 2}$, $\beta_{ij0}$ is an intercept parameter, $\beta_{ij1}$ is the linear effect of ordinal day, and $\beta_{ij2}$ is the quadratic effect of ordinal day.

We use the following priors, \begin{align} \beta_{ij0} \sim N(-2, 1) \ \beta_{ij1} \sim N(0, 100) \ \beta_{ij2} \sim N(0, 100) \ \end{align}

The mean for the intercept is chosen as $-2$ because $\text{logit}(0.1) \approx -2$, i.e., the prior suggests that the off-diagonal elements of the transition probability matrix should be small (as is often the case in practice due to autocorrelation in the hidden process). The coefficient for day and day squared have priors centered on zero with large variance; one could use prior predictive checks to ensure that these cover a reasonable range of possible relationships between ordinal day and transition probabilities. Note that the definition of the working parameters is a little more complicated in models with more than 2 states.

# Parameter of normal priors for transition probabilities
prior_hid <- matrix(c(-2, 1,
                      0, 100,
                      0, 100,
                      -2, 1,
                      0, 100,
                      0, 100),
                    ncol = 2, byrow = TRUE)

Finally, we update the priors stored in the model object using set_priors(), and we check that they have been correctly set.

# Update priors
hmm$set_priors(new_priors = list(coeff_fe_obs = prior_obs, 
                                 coeff_fe_hid = prior_hid))

hmm$priors()

Fitting the model

The main function to fit a model using Stan in hmmTMB is HMM$fit_stan(). It takes the same arguments as tmbstan() from the tmbstan package, and documentation for that function should be consulted for more details. Here, we pass two arguments: the number of chains (chains) and the number of MCMC iterations in each chain (iter). In practice, these arguments should be chosen carefully to ensure convergence of the sampler to the stationary distribution (see Stan documentation for more information); the values below were merely chosen for speed. In this example, running the sampler for 2000 iterations takes about one minute on a laptop.

hmm$fit_stan(chains = 1, iter = 2000)

Inspecting the results

Working parameters

After running fit_stan(), the output of Stan is accessible with the out_stan() function. This is an object of class stanfit, and it can directly be used with functions from the rstan package, e.g., to create trace plots or density plots of the posterior samples. Note that these plots show the working parameters.

rstan::traceplot(hmm$out_stan())
rstan::stan_dens(hmm$out_stan())

Natural parameters

To inspect the posterior distributions of the natural parameters, which is often more interesting, we can extract posterior samples using iters(). This returns a matrix with one column for each parameter and one row for each MCMC iteration. It can directly be used to make traceplots, histograms, etc., or to produce credible intervals.

Note that, if a parameter depends on a covariate, the first row of the data set is used to predict the parameter value in iters(), so the output should be interpreted with care. In our analysis, the transition probabilities depend on the ordinal day, and they are computed for March 2nd in the plot below. We visualise the effect of ordinal day on the transition probabilities in a later section.

iters <- hmm$iters()
head(iters)

iters_df <- as.data.frame.table(iters)
ggplot(iters_df, aes(x = Freq)) + 
    geom_histogram(bins = 20, fill = "lightgrey", col = "white") + 
    facet_wrap("Var2", nrow = 2, scales = "free_x") +
    labs(x = "parameter value", y = "posterior density")

For credible intervals, we can use quantiles of the posterior samples. As the transition probabilities depend on a covariate, we only compute credible intervals for the parameters of the step length distributions.

# 95% equal-tail credible intervals for observation parameters
apply(iters[,1:4], 2, quantile, probs = c(0.025, 0.975))

Posterior samples of observation distributions

Based on the posterior samples that we have for all model parameters, we can get visualise the posterior distribution of various model components, such as the observation distributions. That is, we compute the density function of the step length distribution in each state, for each posterior sample, and plot them to visualise the uncertainty in the distributions. For clearer visualisation, we only select 100 random posterior samples here.

# Select 100 random posterior samples from the
# 1000 post-warmup samples
ind_post <- sort(sample(1:1000, size = 100))

# Grid of step lengths
step_grid <- seq(0, max(data$step, na.rm = TRUE), length = 100)
# Loop over 100 random posterior samples
obsdist_list <- lapply(ind_post, function(i_post) {
    # Get gamma mean and SD
    par_post1 <- matrix(hmm$iters()[i_post,1:4], nrow = 2)
    # Get gamma scale and rate for R dgamma function
    par_post2 <- cbind(par_post1[,1] ^ 2 / par_post1[,2] ^ 2,
                       par_post1[,1] / par_post1[,2] ^ 2)

    # Compute gamma density in each state
    dens_state1 <- dgamma(step_grid,
                          shape = par_post2[1,1], 
                          rate = par_post2[1,2])
    dens_state2 <- dgamma(step_grid, 
                          shape = par_post2[2,1], 
                          rate = par_post2[2,2])

    dens <- data.frame(step = step_grid,
                       value = c(dens_state1, dens_state2),
                       state = rep(paste0("state ", 1:2), each = 100))
    dens$group <- paste0("iter ", i_post, " - ", dens$state)
    return(dens)
})
# Combine data frames for plot
obsdist_df <- do.call(what = rbind, args = obsdist_list)
ggplot(obsdist_df, aes(step, value, group = group, col = state)) +
    geom_line(linewidth = 0.1, alpha = 0.5) +
    scale_color_manual(values = hmmTMB:::hmmTMB_cols, name = NULL) +
    labs(x = "step length", y = "density")

Posterior samples of transition probabilities

Likewise, we can visualise the uncertainty in the relationship between the transition probabilities and the covariate (ordinal day). The code below show how we can use posterior samples of the regression coefficients to get posterior samples of the relationship.

# Grid of ordinal day
newdata <- data.frame(day = seq(min(data$day), max(data$day), length = 100))

# Loop over randomly selected posterior samples
probs_list <- lapply(ind_post, function(i_post) {
    # Set parameters to a given posterior sample
    hmm$update_par(iter = i_post)

    # Relationship between transition probabilities and 
    # covariate for these parameters
    tpm_post <- hmm$predict(what = "tpm", newdata = newdata)
    probs <- data.frame(prob = c(tpm_post[1,2,], tpm_post[2,1,]))

    probs$transition <- rep(paste0("Pr(", 1:2, " -> ", 2:1, ")"), each = 100)
    probs$group <- paste0("iter ", i_post, " - ", probs$transition)
    return(probs)
})

# Combine data frames for plot
probs_df <- do.call(what = rbind, args = probs_list)
probs_df$day <- newdata$day
ggplot(probs_df, aes(day, prob, group = group)) +
    geom_line(linewidth = 0.1, alpha = 0.5) +
    labs(x = "ordinal day", y = "transition probabilities") +
    facet_wrap("transition")