knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(yahtsee)
library(ggplot2)
library(dplyr)

The data

For this model we are going to fit a very simple model using some summarised data from the malariaAtlas R package, malaria_africa_ts:

malaria_africa_ts

This data has the following features we are interested in:

(Note that we are still getting some more covariates for this data.)

tsibble data

This data is a tsibble (the "ts" is pronounced like the end of "bats").

This is a speical time series aware table, that knows what elements identify the individual time components, in this case, country, and what the time index is, in this case, date.

We use a tsibble because it stores this time (referred to as an "index") and group (referred to as a "key") information, which we can use inside our modelling software.

library(tsibble)
key(malaria_africa_ts)
index(malaria_africa_ts)

Modelling

Let's say we want to model the pr over time. Here is a plot of the pr over time, where each line is a country, and facets represent the different who subregions:

ggplot(malaria_africa_ts,
       aes(x = date,
           y = pr,
           group = country,
           colour = who_subregion)) + 
  geom_line() +
  facet_wrap(~who_subregion) + 
  theme(legend.position = "none")

Let's create a simple model that has fixed effect of lower age. We add a AR1 process for each of these subregions using the hts() component in the formula. Here, the inputs arethe levels of hierarchy, in order of decreasing size:

pr ~ avg_lower_age + hts(who_region, who_subregion, country)

We then provide the data, likelihood family (in this case "gaussian", but all INLA likelihoods are available).

We specify the time component at the moment using the special_index argument, but this will be removed later once we resolve a couple of bugs to do with the data.

m <- fit_hts(
    # inputs are  the levels of hierarchy, in order of decreasing size
  formula = pr ~ avg_lower_age + hts(who_region, who_subregion, country),
  .data = malaria_africa_ts,
  family = "gaussian",
  special_index = month_num
)

The equivalent model fitted with inlabru would look like the following:

inlabru::bru(
formula = pr ~ avg_lower_age + Intercept + 
  who_region(month_num, 
             model = "ar1", 
             group = .who_region_id,
             constr = FALSE) + 
  who_subregion(month_num, 
                model = "ar1", 
                group = .who_subregion_id, 
                constr = FALSE) + 
  country(month_num, 
          model = "ar1", 
          group = .country_id, 
          constr = FALSE),
    family = "gaussian",
    data = malaria_africa_ts,
    options = list(
      control.compute = list(config = TRUE),
      control.predictor = list(compute = TRUE, link = 1)
      )
      )

Here are some of the extra considerations that need to be made:

  1. What to name the random effects (who_region, who_subregion, country)
  2. Specifying the time component of the ar1 process
  3. repeating the "ar1" component for each random effect
  4. The group argument requires a special index variable of a group to be made (.who_subregion_id)
  5. Additional options passed to inlabru in options to help get the appropriate data back.

Proposed workflow

The proposed workflow of this type of model is as follows:

  1. Specify data as a tsibble object:
library(tsibble)
malaria_africa_ts <- as_tsibble(x = malaria_africa,
                                key = country,
                                index = date)
  1. Fit the model, parsing the hierarchy structure
model_hts <- fit_hts(
    # inputs are  the levels of hierarchy, in order of decreasing size
  formula = pr ~ avg_lower_age + hts(who_region, who_subregion, country),
  .data = malaria_africa_ts,
  family = "gaussian"
)
  1. Perform diagnostics
diagnostics(model_hts)
autoplot(model_hts)
  1. Make prediction data
date_range <- clock::date_build(2019, 2, 1:5)
date_range
countries <- c("Ethiopia", "Tanzania")
countries
df_pred <- prediction_data(
  model_data = malaria_africa_ts,
  key = countries,
  index = date_range
)

df_pred
  1. Post hoc prediction from the model to add uncertainty information
# post-hoc prediction function - with options for presenting the uncertainty
pred <- predict(m, df_pred, se.fit = TRUE)


njtierney/yahtsee documentation built on Feb. 5, 2022, 8:25 p.m.