dDHMM: Dynamic Hidden Markov Model distribution for use in 'nimble'...
In nimble-dev/nimbleEcology: Distributions for Ecological Models in 'nimble'

dDHMM

R Documentation

Dynamic Hidden Markov Model distribution for use in `nimble` models

Description

dDHMM and dDHMMo provide Dynamic hidden Markov model distributions for nimble models.

Usage

dDHMM(x, init, probObs, probTrans, len, checkRowSums = 1, log = 0)

dDHMMo(x, init, probObs, probTrans, len, checkRowSums = 1, log = 0)

rDHMM(n, init, probObs, probTrans, len, checkRowSums = 1)

rDHMMo(n, init, probObs, probTrans, len, checkRowSums = 1)

Arguments

`x`	vector of observations, each one a positive integer corresponding to an observation state (one value of which can correspond to "not observed", and another value of which can correspond to "dead" or "removed from system").
`init`	vector of initial state probabilities. Must sum to 1.
`probObs`	time-independent matrix (`dDHMM` and `rDHMM`) or time-dependent 3D array (`dDHMMo` and `rDHMMo`) of observation probabilities. First two dimensions of `probObs` are of size (number of possible system states) x (number of possible observation classes). `dDHMMo` and `rDHMMo` expect an additional third dimension of size (number of observation times). probObs[i, j (,t)] is the probability that an individual in the ith latent state is recorded as being in the jth detection state (at time t). See Details for more information.
`probTrans`	time-dependent array of system state transition probabilities. Dimension of `probTrans` is (number of possible system states) x (number of possible system states) x (number of observation times minus 1); a longer third dimension is allowed, but the extra time slices are ignored. probTrans[i,j,t] is the probability that an individual truly in state i at time t will be in state j at time t+1. See Details for more information.
`len`	length of observations (needed for rDHMM)
`checkRowSums`	should validity of `probObs` and `probTrans` be checked? Both of these are required to have each set of probabilities sum to 1 (over each row, or second dimension). If `checkRowSums` is non-zero (or `TRUE`), these conditions will be checked within a tolerance of 1e-6. If it is 0 (or `FALSE`), they will not be checked. Not checking should result in faster execution, but whether that is appreciable will be case-specific.
`log`	`TRUE` or 1 to return log probability. `FALSE` or 0 to return probability
`n`	number of random draws, each returning a vector of length `len`. Currently only `n = 1` is supported, but the argument exists for standardization of "`r`" functions
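
The row-sum condition described for `checkRowSums` can be sketched in base R as follows. This is only an illustration of the condition, not the package's internal check; the helper name `rows_sum_to_one` is hypothetical, and the tolerance simply mirrors the 1e-6 stated above.

```r
# Hypothetical helper: TRUE if every row of `p` sums to 1 within `tol`.
# Accepts an S x O matrix, or an S x O x T / S x S x T array.
rows_sum_to_one <- function(p, tol = 1e-6) {
  if (length(dim(p)) == 2) {
    sums <- rowSums(p)
  } else {
    # Sum over the second dimension: one value per (row, time) pair
    sums <- apply(p, c(1, 3), sum)
  }
  all(abs(sums - 1) < tol)
}

probObs <- matrix(c(1,   0,
                    0,   1,
                    0.8, 0.2), nrow = 3, byrow = TRUE)
probTrans <- array(1/3, dim = c(3, 3, 3))

ok_obs   <- rows_sum_to_one(probObs)          # valid observation matrix
ok_trans <- rows_sum_to_one(probTrans)        # valid transition array
bad      <- rows_sum_to_one(probTrans * 0.9)  # rows now sum to 0.9
```

Running a check like this once, before building the model, is one way to decide whether setting `checkRowSums = 0` is safe for a given set of fixed inputs.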

Details

These nimbleFunctions provide distributions that can be used directly in R or in nimble hierarchical models (via nimbleCode and nimbleModel).
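
As a hedged illustration of direct use in R, the sketch below evaluates the log probability of a short detection history using the signature shown under Usage. The input values are illustrative only, and the call is guarded so that the snippet is skipped when nimbleEcology is not installed.

```r
# Illustrative inputs: 3 latent states, 2 observation classes, 4 occasions
dat <- c(1, 2, 1, 1)
init <- c(0.4, 0.2, 0.4)
probObs <- matrix(c(1,   0,
                    0,   1,
                    0.8, 0.2), nrow = 3, byrow = TRUE)
probTrans <- array(1/3, dim = c(3, 3, 3))  # S x S x (T - 1)

ll <- NA_real_  # stays NA if the package is unavailable
if (requireNamespace("nimbleEcology", quietly = TRUE)) {
  suppressPackageStartupMessages(library(nimbleEcology))
  # Uncompiled call; log = 1 requests the log probability
  ll <- dDHMM(x = dat, init = init, probObs = probObs,
              probTrans = probTrans, len = 4, checkRowSums = 1, log = 1)
}
ll
```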

The probability (or likelihood) of observation x[t] depends on the previous true latent state, the time-dependent probability of transitioning to a new state, probTrans, and the probability of observation states given the true latent state, probObs.

The distribution has two forms, dDHMM and dDHMMo. dDHMM takes a time-independent observation probability matrix with dimension S x O, while dDHMMo expects a three-dimensional array of time-dependent observation probabilities with dimension S x O x T, where O is the number of possible observation classes, S is the number of true latent states, and T is the number of observation times.

probTrans has dimension S x S x (T - 1). probTrans[i, j, t] is the probability that an individual in state i at time t takes on state j at time t+1. The length of the third dimension may be greater than (T - 1) but all values indexed greater than T - 1 will be ignored.

init has length S. init[i] is the probability of being in state i at the first observation time. That means that the first observations arise from the initial state probabilities.

For more explanation, see package vignette (vignette("Introduction_to_nimbleEcology")).

Compared to writing nimble models with a discrete true latent state and a separate scalar datum for each observation, use of these distributions allows one to directly sum (marginalize) over the discrete latent state and calculate the probability of all observations from one site jointly.
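
The marginalization described above is commonly done with the forward algorithm. The base-R sketch below is an illustration under that assumption and is not taken from the package source; the helper name `hmm_forward_loglik` is hypothetical. It accepts either an S x O matrix or an S x O x T array for `probObs`.

```r
# Hypothetical helper (not the package's code): log-likelihood of x with the
# discrete latent state summed out by the forward algorithm.
hmm_forward_loglik <- function(x, init, probObs, probTrans) {
  time_dep <- length(dim(probObs)) == 3  # TRUE for an S x O x T array
  len <- length(x)
  state_p <- init   # P(latent state at time t | observations before t)
  loglik <- 0
  for (t in seq_len(len)) {
    # Probability of the recorded class x[t] under each latent state
    obs_p <- if (time_dep) probObs[, x[t], t] else probObs[, x[t]]
    joint <- state_p * obs_p
    total <- sum(joint)            # P(x[t] | earlier observations)
    loglik <- loglik + log(total)
    if (t < len) {
      # Condition on x[t], then propagate through the time-t transitions
      state_p <- as.vector((joint / total) %*% probTrans[, , t])
    }
  }
  loglik
}

init <- c(0.4, 0.2, 0.4)
probObs <- matrix(c(1,   0,
                    0,   1,
                    0.8, 0.2), nrow = 3, byrow = TRUE)
probTrans <- array(1/3, dim = c(3, 3, 3))

ll_sketch <- hmm_forward_loglik(c(1, 2, 1, 1), init, probObs, probTrans)
exp(ll_sketch)  # 0.10368, i.e. 0.72 * 0.4 * 0.6 * 0.6
```

Each factor in the product is the probability of one observation given the earlier ones, so the latent states never need to be sampled.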

These are nimbleFunctions written in the format of user-defined distributions for NIMBLE's extension of the BUGS model language. More information can be found in the NIMBLE User Manual at https://r-nimble.org.

When using these distributions in a nimble model, the left-hand side will be used as x, and the user should not provide the log argument.

For example, in a NIMBLE model,

observedStates[1:T] ~ dDHMM(initStates[1:S], observationProbs[1:S, 1:O], transitionProbs[1:S, 1:S, 1:(T-1)], len = T, checkRowSums = 1)

declares that the observedStates[1:T] vector follows a dynamic hidden Markov model distribution with parameters as indicated, assuming all the parameters have been declared elsewhere in the model. In this case, S is the number of system states, O is the number of observation classes, and T is the number of observation occasions. This will invoke (something like) the following call to dDHMM when nimble uses the model, such as for MCMC:

dDHMM(observedStates[1:T], initStates[1:S], observationProbs[1:S, 1:O], transitionProbs[1:S, 1:S, 1:(T-1)], len = T, checkRowSums = 1, log = TRUE)

If an algorithm using a nimble model with this declaration needs to generate a random draw for observedStates[1:T], it will make a similar invocation of rDHMM, with n = 1.
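
To make the idea of a random draw concrete, the following base-R sketch simulates one detection history from the generative process described in Details. It is an assumption-laden illustration, not the code of rDHMM; the helper name `simulate_dhmm` is hypothetical.

```r
# Hypothetical helper: simulate one detection history of length `len`.
# probObs may be an S x O matrix or an S x O x T array.
simulate_dhmm <- function(init, probObs, probTrans, len) {
  S <- length(init)
  O <- dim(probObs)[2]
  time_dep <- length(dim(probObs)) == 3
  x <- integer(len)
  # The first latent state is drawn from the initial state probabilities
  state <- sample.int(S, 1, prob = init)
  for (t in seq_len(len)) {
    obs_p <- if (time_dep) probObs[state, , t] else probObs[state, ]
    x[t] <- sample.int(O, 1, prob = obs_p)   # observe given the latent state
    if (t < len) {
      # Move to the next latent state using the time-t transition row
      state <- sample.int(S, 1, prob = probTrans[state, , t])
    }
  }
  x
}

set.seed(1)
init <- c(0.4, 0.2, 0.4)
probObs <- matrix(c(1,   0,
                    0,   1,
                    0.8, 0.2), nrow = 3, byrow = TRUE)
probTrans <- array(1/3, dim = c(3, 3, 3))
sim_x <- simulate_dhmm(init, probObs, probTrans, len = 4)
sim_x
```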

If the observation probabilities are time-dependent, one would use:

observedStates[1:T] ~ dDHMMo(initStates[1:S], observationProbs[1:S, 1:O, 1:T], transitionProbs[1:S, 1:S, 1:(T-1)], len = T, checkRowSums = 1)
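
A time-dependent observation array of the S x O x T shape expected by dDHMMo can be assembled in base R as sketched below. The values and the variable names (`base_obs`, `obs_array`, `n_times`) are illustrative assumptions, not part of the package.

```r
S <- 3; O <- 2; n_times <- 4

# Start from a time-independent S x O matrix of observation probabilities
base_obs <- matrix(c(1,   0,
                     0,   1,
                     0.8, 0.2), nrow = S, byrow = TRUE)

# Stack one copy per observation time along the third dimension
obs_array <- array(rep(base_obs, n_times), dim = c(S, O, n_times))

# Let detection differ at one occasion: state 3 at time 2
obs_array[3, , 2] <- c(0.5, 0.5)

dim(obs_array)                  # S x O x T
apply(obs_array, c(1, 3), sum)  # every (state, time) row should sum to 1
```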

The dDHMM[o] distributions should work for models and algorithms that use nimble's automatic differentiation (AD) system. In that system, some kinds of values are "baked in" (cannot be changed) to the AD calculations from the first call, unless and until the AD calculations are reset. For the dDHMM[o] distributions, the sizes of the inputs and the data (x) values themselves are baked in. These can be different for different iterations through a for loop (or nimble model declarations with different indices, for example), but the sizes and data values for each specific iteration will be "baked in" after the first call. In other words, it is assumed that x are data and are not going to change.

Value

For dDHMM and dDHMMo: the probability (or likelihood) or log probability of observation vector x. For rDHMM and rDHMMo: a simulated detection history, x.

Author(s)

Perry de Valpine, Daniel Turek, and Ben Goldstein

References

D. Turek, P. de Valpine and C. J. Paciorek. 2016. Efficient Markov chain Monte Carlo sampling for hierarchical hidden Markov models. Environmental and Ecological Statistics 23:549–564. DOI 10.1007/s10651-016-0353-z

Examples

# Load the package (nimble is attached as a dependency)
library(nimbleEcology)

# Set up constants and initial values for defining the model
dat <- c(1,2,1,1) # A vector of observations
init <- c(0.4, 0.2, 0.4) # A vector of initial state probabilities
probObs <- t(array( # A matrix of observation probabilities
       c(1, 0,
         0, 1,
         0.8, 0.2), c(2, 3)))

probTrans <- array(rep(1/3, 27), # An array of time-indexed transition probabilities
            c(3,3,3))

# Define code for a nimbleModel
 nc <- nimbleCode({
   x[1:4] ~ dDHMM(init[1:3], probObs = probObs[1:3, 1:2],
                  probTrans = probTrans[1:3, 1:3, 1:3], len = 4, checkRowSums = 1)

   for (i in 1:3) {
     init[i] ~ dunif(0,1)

     for (j in 1:3) {
       for (t in 1:3) {
         probTrans[i,j,t] ~ dunif(0,1)
       }
     }

     probObs[i, 1] ~ dunif(0,1)
     probObs[i, 2] <- 1 - probObs[i,1]
   }
 })

# Build the model, providing data and initial values
DHMM_model <- nimbleModel(nc,
                          data = list(x = dat),
                          inits = list(init = init,
                                       probObs = probObs,
                                       probTrans = probTrans))
# Calculate log probability of x from the model
DHMM_model$calculate()
# Use the model for a variety of other purposes...

nimble-dev/nimbleEcology documentation built on June 10, 2025, 1:01 p.m.