Non-linear models with bssm

Sys.setenv("OMP_NUM_THREADS" = 2) # For CRAN
if (!requireNamespace("rmarkdown") ||
    !rmarkdown::pandoc_available("1.12.3")) {
  warning(call. = FALSE, "These vignettes assume rmarkdown and pandoc version 1.12.3. These were not found. Older versions will not work.")
  knitr::knit_exit()
}
knitr::opts_chunk$set(echo = TRUE)

Introduction

This vignette shows how to model general non-linear state space models with bssm. The general non-linear Gaussian model in bssm has the following form:

$$ y_t = Z(t, \alpha_t, \theta) + H(t, \alpha_t, \theta)\epsilon_t,\ \alpha_{t+1} = T(t, \alpha_t, \theta) + R(t, \alpha_t, \theta)\eta_t,\ \alpha_1 \sim N(a_1(\theta), P_1(\theta)), $$ with $t=1,\ldots, n$, $\epsilon_t \sim N(0,\textrm{I}_p)$, and $\eta_t \sim N(0,\textrm{I}_k)$. Here the vector $\theta$ contains the unknown model parameters.

As some of the model matrices may depend on the current state $\alpha_t$, constructing, for example, $T(t,\alpha_t,\theta)$ by calling a user-defined R function is not feasible: this would have to be done repeatedly within the particle filter, which would negate the benefits of the C++ implementation of the particle filter and Markov chain Monte Carlo. Therefore the functions $Z(\cdot)$, $H(\cdot)$, $T(\cdot)$, $R(\cdot)$, $a_1(\cdot)$, $P_1(\cdot)$, as well as the functions defining the Jacobians of $Z(\cdot)$ and $T(\cdot)$ and the prior distribution for $\theta$, must be defined by the user as external pointers to C++ functions. For the log-density of $\theta$, we can call R's own C-level density functions, for example `R::dnorm` (see the template for an example).
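To illustrate, here is a condensed sketch of what such a prior function could look like, assuming half-normal priors on the standard deviations; this is only an illustration, the actual priors used in this vignette are in the template file in the appendix. Because the sampling is done on the log scale, the Jacobian terms `theta(i)` of the exp-transformation are added:

```{Rcpp, eval = FALSE}
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Sketch of a prior log-density for theta: half-N(0, 2^2) priors on the
// standard deviations exp(theta(i)), plus the log-scale Jacobian theta(i).
// [[Rcpp::export]]
double log_prior_pdf(const arma::vec& theta) {
  double log_pdf = 0.0;
  for (unsigned int i = 0; i < theta.n_elem; i++) {
    // R's C-level density function; the last argument 1 means log = TRUE
    log_pdf += R::dnorm(exp(theta(i)), 0, 2, 1) + theta(i);
  }
  return log_pdf;
}
```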

As an example, consider a logistic growth model of the form $$ y_t = p_t + \epsilon_t,\quad p_{t+1} = K p_t \frac{\exp(r_t dt)}{K + p_t (\exp(r_t dt) - 1)} + \xi_t,\quad r_t = \frac{\exp(r'_t)}{1 + \exp(r'_t)},\quad r'_{t+1} = r'_t + \eta_t, $$ with constant carrying capacity $K = 500$, initial population size $p_1 = 50$, initial growth rate on the logit scale $r'_1 = -1.5$, $dt = 0.1$, $\xi_t \sim N(0,1)$, $\eta_t \sim N(0,0.05^2)$, and $\epsilon_t \sim N(0, 1)$.

Let's first simulate some data, with $\sigma_r = 0.05$ and $\sigma_p = 1$:

set.seed(1)

p1 <- 50 # population size at t = 1
K <- 500 # carrying capacity
H <- 1 # standard deviation of obs noise
R_1 <- 0.05 # standard deviation of the noise on logit-growth
R_2 <- 1 # standard deviation of the noise in population level
# sampling interval
dT <- 0.1

# observation times
t <- seq(0.1, 30, dT)
n <- length(t)
r <- plogis(cumsum(c(-1.5, rnorm(n - 1, sd = R_1))))
p <- numeric(n)
p[1] <- p1
for (i in 2:n) {
  p[i] <- rnorm(1, K * p[i - 1] * exp(r[i - 1] * dT) /
    (K + p[i - 1] * (exp(r[i - 1] * dT) - 1)), R_2)
}
# observations
y <- p + rnorm(n, 0, H)
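
Before building the model, we can take a quick look at the simulated data; this base R plot is not part of the modelling itself, just a sanity check:

```r
# simulated observations y together with the true population p
ts.plot(cbind(y, p), col = 1:2, lty = c(1, 2), ylab = "population")
legend("topleft", c("observations", "true population"),
  col = 1:2, lty = c(1, 2))
```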

Model in bssm

The functions determining the model need to be written in C++. Some example models that can be used as templates are given by the function cpp_example_model, which returns pointers usable as input to ssm_nlg. For this growth model, we could call cpp_example_model("nlg_growth"). In general, you need to define the functions matching the model components, the log-density of the prior, and a few other functions. For example, in the case of our model, the function T_fn defines the state transition function $T(\cdot)$:

```{Rcpp, eval = FALSE}
// [[Rcpp::export]]
arma::vec T_fn(const unsigned int t, const arma::vec& alpha,
  const arma::vec& theta, const arma::vec& known_params,
  const arma::mat& known_tv_params) {

  double dT = known_params(0);
  double K = known_params(1);

  arma::vec alpha_new(2);
  alpha_new(0) = alpha(0);
  double r = exp(alpha(0)) / (1.0 + exp(alpha(0)));
  alpha_new(1) = K * alpha(1) * exp(r * dT) /
    (K + alpha(1) * (exp(r * dT) - 1));
  return alpha_new;
}
```

The name of this function does not matter, but it should always return an Armadillo vector (`arma::vec`), and its signature (i.e. the order and types of the function's parameters) should always be as above, even if some of the parameters are not used in the body of the function. Note that all of these functions can also depend on some known parameters, given as the `known_params` (vector) and `known_tv_params` (matrix) arguments to the `ssm_nlg` function (which are then passed to the individual `C++` snippets). For details on using Armadillo, see the [Armadillo documentation](https://arma.sourceforge.net/docs.html). After defining the appropriate model functions, the `cpp` file should also contain a function for creating external pointers to the aforementioned functions, as sketched below. Why this is needed is a more technical issue, but fortunately you can just copy the function from the example file without any modifications.
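For reference, here is a condensed sketch of such a pointer-creating function, shown for `T_fn` alone; the full template in the appendix creates pointers for all of the model functions in the same way:

```{Rcpp, eval = FALSE}
// [[Rcpp::export]]
Rcpp::List create_xptrs() {
  // function pointer type matching the signature of T_fn (and Z_fn)
  typedef arma::vec (*nvec_fnPtr)(const unsigned int t, const arma::vec& alpha,
    const arma::vec& theta, const arma::vec& known_params,
    const arma::mat& known_tv_params);
  // wrap the address of T_fn into an external pointer that R can store
  return Rcpp::List::create(
    Rcpp::Named("T_fn") = Rcpp::XPtr<nvec_fnPtr>(new nvec_fnPtr(&T_fn)));
}
```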

After creating the file for the `C++` functions, you need to compile the file using `Rcpp`^[As repeated calls to compile the same `cpp` file can sometimes lead to memory issues, it is good practice to define a unique cache directory using the `cacheDir` argument ([see the issue on GitHub](https://github.com/helske/crashtest/issues/1)). But CRAN does not like this approach, so we do not use it here.]:

```r
Rcpp::sourceCpp("ssm_nlg_template.cpp")
pntrs <- create_xptrs()
```

This takes a few seconds. Let's define our initial guess for $\theta$, the logarithms of the standard deviations of the observational and process level noise, and define the prior distribution for $\alpha_1$ (we sample on the log scale for efficiency reasons, but define the priors for the standard deviations; see the template file in the appendix):

initial_theta <- c(log_H = 0, log_R1 = log(0.05), log_R2 = 0)

# dT, K, a1 and the prior variances of the 1st and 2nd states (logit r and p)
known_params <- c(dT = dT, K = K, a11 = -1, a12 = 50, P11 = 1, P12 = 100)

If you have used the line `// [[Rcpp::export]]` before the model functions, you can now test that they work as intended:

T_fn(0, c(100, 200), initial_theta, known_params, matrix(1))

Now we can define the actual model object using ssm_nlg:

library("bssm")
model <- ssm_nlg(y = y, a1 = pntrs$a1, P1 = pntrs$P1, 
  Z = pntrs$Z_fn, H = pntrs$H_fn, T = pntrs$T_fn, R = pntrs$R_fn, 
  Z_gn = pntrs$Z_gn, T_gn = pntrs$T_gn,
  theta = initial_theta, log_prior_pdf = pntrs$log_prior_pdf,
  known_params = known_params, known_tv_params = matrix(1),
  n_states = 2, n_etas = 2, state_names = c("logit_r", "p"))

Let's first run the extended Kalman filter and smoother using our initial guess for $\theta$:

out_filter <- ekf(model)
out_smoother <- ekf_smoother(model)
ts.plot(cbind(y, out_filter$att[, 2], 
  out_smoother$alphahat[, 2]), col = 1:3)
ts.plot(plogis(cbind(out_filter$att[, 1], 
  out_smoother$alphahat[, 1])), col = 1:2)
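
bssm also provides an unscented Kalman filter for `ssm_nlg` models via the `ukf` function; as a quick sanity check, here is a sketch comparing its filtered population estimates with those of the EKF, assuming the default tuning parameters of `ukf` are adequate for this model:

```r
out_ukf <- ukf(model)
# filtered estimates of the population size from EKF and UKF
ts.plot(cbind(out_filter$att[, 2], out_ukf$att[, 2]), col = 1:2)
```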

Markov chain Monte Carlo

For parameter inference, we can perform full Bayesian inference with bssm. There are multiple choices for the MCMC algorithm in the package, and here we will use the default choice, which is an approximate MCMC with $\psi$-APF based importance sampling correction [@vihola-helske-franks]. Let us compare this approach with EKF-based approximate MCMC (the former is unbiased, whereas the latter is not). Due to the package check requirements of CRAN, we use only a small number of iterations:

# Cholesky of initial proposal matrix (taken from previous runs)
# used here to speed up convergence due to the small number of iterations
S <- matrix(c(0.13, 0.13, -0.11, 0, 0.82, -0.04, 0, 0, 0.16), 3, 3) 
mcmc_is <- run_mcmc(model, iter = 6000, burnin = 1000, particles = 10, 
  mcmc_type = "is2", sampling_method = "psi", S = S)
mcmc_ekf <- run_mcmc(model, iter = 6000, burnin = 1000, 
  mcmc_type = "ekf", S = S)
summary(mcmc_is, return_se = TRUE)
summary(mcmc_ekf, return_se = TRUE)
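
We can also extract the posterior samples of $\theta$ with the as.data.frame method (variable = "theta") and transform them back to the standard deviation scale. A quick sketch, using the weighted mean from the diagis package to account for the IS weights of the is2 run:

```r
d_theta <- as.data.frame(mcmc_is, variable = "theta")
# posterior means of the standard deviations (theta is on the log scale)
sapply(split(d_theta, d_theta$variable),
  function(x) diagis::weighted_mean(exp(x$value), x$weight))
```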

Using the as.data.frame method we can convert the state samples to a data frame for further processing with the dplyr package [@dplyr] (we could also do this automatically with the summary method):

library("dplyr")
library("diagis")
d1 <- as.data.frame(mcmc_is, variable = "states")
d2 <- as.data.frame(mcmc_ekf, variable = "states")
d1$method <- "is2-psi"
d2$method <- "approx ekf"

r_summary <- rbind(d1, d2) |> 
  filter(variable == "logit_r") |>
  group_by(time, method) |>
  summarise(
    mean = weighted_mean(plogis(value), weight), 
    lwr = weighted_quantile(plogis(value), weight, 0.025), 
    upr = weighted_quantile(plogis(value), weight, 0.975))

p_summary <- rbind(d1, d2) |> 
  filter(variable == "p") |>
  group_by(time, method) |>
  summarise(  
    mean = weighted_mean(value, weight), 
    lwr = weighted_quantile(value, weight, 0.025), 
    upr = weighted_quantile(value, weight, 0.975))

Above we used the weighted versions of the mean and quantile functions provided by the diagis package [@diagis], as our IS-MCMC algorithm produces weighted samples of the posterior. Alternatively, we could have used the argument output_type = "summary", in which case run_mcmc returns posterior means and covariances of the states instead of samples (these are computed using the full output of the particle filter, so these estimates are more accurate).
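
A minimal sketch of the latter approach, with the same sampler but storing only the summary statistics of the states:

```r
mcmc_is_summary <- run_mcmc(model, iter = 6000, burnin = 1000,
  particles = 10, mcmc_type = "is2", sampling_method = "psi", S = S,
  output_type = "summary")
# posterior means of the states are now directly available:
ts.plot(mcmc_is_summary$alphahat, col = 1:2)
```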

Using ggplot2 [@ggplot2] we can compare our two estimation methods:

library("ggplot2")
ggplot(r_summary, aes(x = time, y = mean)) + 
  geom_ribbon(aes(ymin = lwr, ymax = upr, fill = method), 
    colour = NA, alpha = 0.25) +
  geom_line(aes(colour = method)) +
  geom_line(data = data.frame(mean = r, time = seq_along(r))) +
  theme_bw()

p_summary$cut <- cut(p_summary$time, c(0, 100, 200, 301))
ggplot(p_summary, aes(x = time, y = mean)) + 
  geom_point(data = data.frame(
    mean = y, time = seq_along(y),
    cut = cut(seq_along(y), c(0, 100, 200, 301))), alpha = 0.1) +
  geom_ribbon(aes(ymin = lwr, ymax = upr, fill = method), 
    colour = NA, alpha = 0.25) +
  geom_line(aes(colour = method)) +
  theme_bw() + facet_wrap(~ cut, scales = "free")

In this example, the EKF approximation performs well compared to the exact method, while being considerably faster:

mcmc_is$time
mcmc_ekf$time

Appendix

This is the full ssm_nlg_template.cpp file (identical to the nlg_growth model accessible with cpp_example_model):

```{Rcpp ssm_nlg_template, code = readLines("ssm_nlg_template.cpp"), eval = FALSE, echo = TRUE}
```


