In njtierney/yahtsee: Yet Another Hierachical Time Series Extension and Expansion

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(yahtsee)
library(ggplot2)
library(dplyr)

The data

For this model we are going to fit a very simple model using some summarised data from the malariaAtlas R package, malaria_africa_ts:

malaria_africa_ts

This data has the following features we are interested in:

hierarchical structure (countries within who subregion, within region)
Positive rate measurements recorded over time for these areas (positive rate given by the number of positive cases divided by the number of cases examined).

(Note that we are still getting some more covariates for this data.)

`tsibble` data

This data is a tsibble (the "ts" is pronounced like the end of "bats").

This is a speical time series aware table, that knows what elements identify the individual time components, in this case, country, and what the time index is, in this case, date.

We use a tsibble because it stores this time (referred to as an "index") and group (referred to as a "key") information, which we can use inside our modelling software.

library(tsibble)
key(malaria_africa_ts)
index(malaria_africa_ts)

Modelling

Let's say we want to model the pr over time. Here is a plot of the pr over time, where each line is a country, and facets represent the different who subregions:

ggplot(malaria_africa_ts,
       aes(x = date,
           y = pr,
           group = country,
           colour = who_subregion)) + 
  geom_line() +
  facet_wrap(~who_subregion) + 
  theme(legend.position = "none")

Let's create a simple model that has fixed effect of lower age. We add a AR1 process for each of these subregions using the hts() component in the formula. Here, the inputs arethe levels of hierarchy, in order of decreasing size:

pr ~ avg_lower_age + hts(who_region, who_subregion, country)

We then provide the data, likelihood family (in this case "gaussian", but all INLA likelihoods are available).

We specify the time component at the moment using the special_index argument, but this will be removed later once we resolve a couple of bugs to do with the data.

m <- fit_hts(
    # inputs are  the levels of hierarchy, in order of decreasing size
  formula = pr ~ avg_lower_age + hts(who_region, who_subregion, country),
  .data = malaria_africa_ts,
  family = "gaussian",
  special_index = month_num
)

The equivalent model fitted with inlabru would look like the following:

inlabru::bru(
formula = pr ~ avg_lower_age + Intercept + 
  who_region(month_num, 
             model = "ar1", 
             group = .who_region_id,
             constr = FALSE) + 
  who_subregion(month_num, 
                model = "ar1", 
                group = .who_subregion_id, 
                constr = FALSE) + 
  country(month_num, 
          model = "ar1", 
          group = .country_id, 
          constr = FALSE),
    family = "gaussian",
    data = malaria_africa_ts,
    options = list(
      control.compute = list(config = TRUE),
      control.predictor = list(compute = TRUE, link = 1)
      )
      )

Here are some of the extra considerations that need to be made:

What to name the random effects (who_region, who_subregion, country)
Specifying the time component of the ar1 process
repeating the "ar1" component for each random effect
The group argument requires a special index variable of a group to be made (.who_subregion_id)
Additional options passed to inlabru in options to help get the appropriate data back.

Proposed workflow

The proposed workflow of this type of model is as follows:

Specify data as a tsibble object:

library(tsibble)
malaria_africa_ts <- as_tsibble(x = malaria_africa,
                                key = country,
                                index = date)

Fit the model, parsing the hierarchy structure

model_hts <- fit_hts(
    # inputs are  the levels of hierarchy, in order of decreasing size
  formula = pr ~ avg_lower_age + hts(who_region, who_subregion, country),
  .data = malaria_africa_ts,
  family = "gaussian"
)

Perform diagnostics

diagnostics(model_hts)
autoplot(model_hts)

Make prediction data

date_range <- clock::date_build(2019, 2, 1:5)
date_range
countries <- c("Ethiopia", "Tanzania")
countries
df_pred <- prediction_data(
  model_data = malaria_africa_ts,
  key = countries,
  index = date_range
)

df_pred

Post hoc prediction from the model to add uncertainty information

# post-hoc prediction function - with options for presenting the uncertainty
pred <- predict(m, df_pred, se.fit = TRUE)

njtierney/yahtsee documentation built on Feb. 5, 2022, 8:25 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

njtierney/yahtsee
Yet Another Hierachical Time Series Extension and Expansion

In njtierney/yahtsee: Yet Another Hierachical Time Series Extension and Expansion

The data

`tsibble` data

Modelling

Proposed workflow

R Package Documentation

Browse R Packages

We want your feedback!

njtierney/yahtsee Yet Another Hierachical Time Series Extension and Expansion

In njtierney/yahtsee: Yet Another Hierachical Time Series Extension and Expansion

The data

tsibble data

Modelling

Proposed workflow

R Package Documentation

Browse R Packages

We want your feedback!

njtierney/yahtsee
Yet Another Hierachical Time Series Extension and Expansion

`tsibble` data