knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  cache = TRUE
)
#' @srrstats {G1.2} Contains project status badge.
#' @srrstats {G1.4,G1.4a} Package uses roxygen2 for documentation.
#' @srrstats {G2.0, G2.0a, G2.1, G2.1a, G2.2, G2.4, G2.4a, G2.4b, G2.4c, G2.6} 
#' Input types and shapes are tested and checked with autotest and converted 
#' explicitly when necessary.
#' 
#' @srrstats {G2.3, G2.3a, G2.3b} match.arg and tolower are used where 
#' applicable.
#' @srrstats {G1.0, G1.3, G1.4, G1.4a, G1.5, G1.6} General 
#' documentation, addressed by the vignettes and the corresponding R 
#' Journal paper.
#' @srrstats {G1.1} This is the first software to implement the IS-MCMC by 
#' Vihola, Helske, and Franks (2020) and first R package to implement delayed 
#' acceptance pseudo-marginal MCMC for state space models. The IS-MCMC method 
#' is also available in [walker](github.com/helske/walker) package for a 
#' limited class of time-varying GLMss (a small subset of the models 
#' supported by this package). Some of the functionality for exponential family 
#' state space models is also available in [KFAS](github.com/helske/KFAS), and 
#' those models can be converted easily to bssm format for Bayesian analysis.
#' @srrstats {G2.4, G2.4a, G2.4b, G2.4c, G2.6} Explicit conversions are used 
#' where necessary.
#' 
#' @srrstats {G2.14, G2.14a, G2.14b, G2.14c, G2.15, G2.16} Missing observations 
#' (y) are handled automatically as per SSM theory, whereas missing values are 
#' not allowed elsewhere. Inputing or ignoring them does not make sense in time 
#' series context.
#'
#' @srrstats {G3.0} No floating point equality comparisons are made.
#'
#' @srrstats {G5.4, G5.4a, G5.4b, G5.4c, G5.5, G5.6, G5.6a, G5.6b, G5.7} and
#' @srrstats {BS4.0, BS4.1} The algorithms work as defined per Vihola, Helske, 
#' Franks (2020) (all simulations were implemented with the bssm package) and 
#' Helske and Vihola (2021). Full replication of the results would take 
#' days/weeks (but see also bsm_ng, negbin_series and several testthat tests).
#'
#' @srrstats {G5.8, G5.8a, G5.8b, G5.8c, G5.8d} Tested with autotest and the 
#' testthat tests.
#' @srrstats {G5.9, G5.9a, G5.9b} Tested with autotest and the testthat tests.
#'
#' @srrstats {BS1.0, BS1.1, BS1.2, BS1.2a, BS1.2b, BS1.3b} Addressed in the 
#' models.R, run_mcmc.R, in vignettes and in the R Journal paper.
#'
#' @srrstats {BS2.1, BS2.1a, BS2.6} Tested and demonstrated by autotest and 
#' package examples/tests.
#' @srrstats {BS7.4, BS7.4a} The scales do not matter (in terms of runtime) 
#' in random walk Metropolis nor in particle filters, as long as numerical 
#' issues are not encountered

bssm

Project Status: Active - The project has reached a stable, usable state and is being actively developed Status at rOpenSci Software Peer Review R-CMD-check Codecov test coverage CRAN version downloads

The bssm R package provides efficient methods for Bayesian inference of state space models via particle Markov chain Monte Carlo and importance sampling type weighted MCMC. Currently Gaussian, Poisson, binomial, negative binomial, and Gamma observation densities with linear-Gaussian state dynamics, as well as general non-linear Gaussian models and discretely observed latent diffusion processes are supported.

For details, see

There are also couple posters and a talk related to IS-correction methodology and bssm package:

The bssm package was originally developed with the support of Academy of Finland grants 284513, 312605, 311877, and 331817. Current development is focused on increased usability. For recent changes, see NEWS file.

Citing the package

If you use the bssm package in publications, please cite the corresponding R Journal paper:

Jouni Helske and Matti Vihola (2021). "bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R." The R Journal (2021) 13:2, pages 578-589. https://journal.r-project.org/archive/2021/RJ-2021-103/index.html

Installation

You can install the released version of bssm from CRAN with:

install.packages("bssm")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("helske/bssm")

Or from R-universe with

install.packages("bssm", repos = "https://helske.r-universe.dev")

Example

Consider the daily air quality measurements in New Your from May to September 1973, available in the datasets package. Let's try to predict the missing ozone levels by simple linear-Gaussian local linear trend model with temperature and wind as explanatory variables (missing response variables are handled naturally in the state space modelling framework, however no missing values in covariates are normally allowed);

library("bssm")
library("dplyr")
library("ggplot2")
set.seed(1)

data("airquality", package = "datasets")

# Covariates as matrix. For complex cases, check out as_bssm function
xreg <- airquality |> select(Wind, Temp) |> as.matrix()

model <- bsm_lg(airquality$Ozone,
  xreg = xreg,  
  # Define priors for hyperparameters (i.e. not the states), see ?bssm_prior
  # Initial value followed by parameters of the prior distribution
  beta = normal_prior(rep(0, ncol(xreg)), 0, 1),
  sd_y = gamma_prior(1, 2, 0.01),
  sd_level = gamma_prior(1, 2, 0.01), 
  sd_slope = gamma_prior(1, 2, 0.01))

fit <- run_mcmc(model, iter = 20000, burnin = 5000)
fit

obs <- data.frame(Time = 1:nrow(airquality),
  Ozone = airquality$Ozone) |> filter(!is.na(Ozone))

pred <- fitted(fit, model)
pred |>
  ggplot(aes(x = Time, y = Mean)) + 
  geom_ribbon(aes(ymin = `2.5%`, ymax = `97.5%`), 
    alpha = 0.5, fill = "steelblue") + 
  geom_line() + 
  geom_point(data = obs, 
    aes(x = Time, y = Ozone), colour = "Tomato") +
  theme_bw()

Same model but now assuming observations are from Gamma distribution:

model2 <- bsm_ng(airquality$Ozone,
  xreg = xreg,  
  beta = normal(rep(0, ncol(xreg)), 0, 1),
  distribution = "gamma",
  phi = gamma_prior(1, 2, 0.01),
  sd_level = gamma_prior(1, 2, 0.1), 
  sd_slope = gamma_prior(1, 2, 0.1))

fit2 <- run_mcmc(model2, iter = 20000, burnin = 5000, particles = 10)
fit2

Comparison:

pred2 <- fitted(fit2, model2)

bind_rows(list(Gaussian = pred, Gamma = pred2), .id = "Model") |>
  ggplot(aes(x = Time, y = Mean)) + 
  geom_ribbon(aes(ymin = `2.5%`, ymax = `97.5%`, fill = Model), 
    alpha = 0.25) + 
  geom_line(aes(colour = Model)) + 
  geom_point(data = obs, 
    aes(x = Time, y = Ozone)) +
  theme_bw()

Now let's assume that we also want to use the solar radiation variable as predictor for ozone. As it contains few missing values, we cannot use it directly. As the number of missing time points is very small, simple imputation would likely be acceptable, but let's consider more another approach. For simplicity, the slope terms of the previous models are now omitted, and we focus on the Gaussian case. Let $\mu_t$ be the true solar radiation at time $t$. Now for ozone $O_t$ we assume following model:

$O_t = D_t + \alpha_t + \beta_S \mu_t + \sigma_\epsilon \epsilon_t$\ $\alpha_{t+1} = \alpha_t + \sigma_\eta\eta_t$\ $\alpha_1 \sim N(0, 100^2\textrm{I})$,\ wheere $D_t = \beta X_t$ contains regression terms related to wind and temperature, $\alpha_t$ is the time varying intercept term, and $\beta_S$ is the effect of solar radiation $\mu_t$.

Now for the observed solar radiation $S_t$ we assume

$S_t = \mu_t$\ $\mu_{t+1} = \mu_t + \sigma_\xi\xi_t,$\ $\mu_1 \sim N(0, 100^2)$,\ i.e. we assume as simple random walk for the $\mu$ which we observe without error or not at all (there is no error term in the observation equation $S_t=\mu_t$).

We combine these two models as a bivariate Gaussian model with ssm_mlg:

# predictors (not including solar radiation) for ozone
xreg <- airquality |> select(Wind, Temp) |> as.matrix()

# Function which outputs new model components given the parameter vector theta
update_fn <- function(theta) {
  D <- rbind(t(xreg %*% theta[1:2]), 1)
  Z <- matrix(c(1, 0, theta[3], 1), 2, 2)
  R <- diag(exp(theta[4:5]))
  H <- diag(c(exp(theta[6]), 0))
  # add third dimension so we have p x n x 1, p x m x 1, p x p x 1 arrays
  dim(Z)[3] <- dim(R)[3] <- dim(H)[3] <- 1
  list(D = D, Z = Z, R = R, H = H)
}

# Function for log-prior density
prior_fn <- function(theta) {
  sum(dnorm(theta[1:3], 0, 10, log = TRUE)) + 
    sum(dgamma(exp(theta[4:6]), 2, 0.01, log = TRUE)) + 
    sum(theta[4:6]) # log-jacobian
}

init_theta <- c(0, 0, 0, log(50), log(5), log(20))
comps <- update_fn(init_theta)

model <- ssm_mlg(y = cbind(Ozone = airquality$Ozone, Solar = airquality$Solar.R),
  Z = comps$Z, D = comps$D, H = comps$H, T = diag(2), R = comps$R, 
  a1 = rep(0, 2), P1 = diag(100, 2), init_theta = init_theta, 
  state_names = c("alpha", "mu"), update_fn = update_fn,
  prior_fn = prior_fn)

fit <- run_mcmc(model, iter = 60000, burnin = 10000)
fit

Draw predictions:

pred <- fitted(fit, model)

obs <- data.frame(Time = 1:nrow(airquality),
  Solar = airquality$Solar.R) |> filter(!is.na(Solar))

pred |> filter(Variable == "Solar") |>
  ggplot(aes(x = Time, y = Mean)) + 
  geom_ribbon(aes(ymin = `2.5%`, ymax = `97.5%`), 
    alpha = 0.5, fill = "steelblue") + 
  geom_line() + 
  geom_point(data = obs, 
    aes(x = Time, y = Solar), colour = "Tomato") +
  theme_bw()


obs <- data.frame(Time = 1:nrow(airquality),
  Ozone = airquality$Ozone) |> filter(!is.na(Ozone))

pred |> filter(Variable == "Ozone") |>
  ggplot(aes(x = Time, y = Mean)) + 
  geom_ribbon(aes(ymin = `2.5%`, ymax = `97.5%`), 
    alpha = 0.5, fill = "steelblue") + 
  geom_line() +  
  geom_point(data = obs, 
    aes(x = Time, y = Ozone), colour = "Tomato") +
  theme_bw()

See more examples in the paper, vignettes, and in the docs.



helske/bssm documentation built on Oct. 29, 2023, 6:04 a.m.