knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Covariates can be used to extend standard bage models in two ways:
This vignette gives a brief description of covariates in bage and then illustrates their use with a case study of births in South Korea.
A model in bage typically has a structure that is something like: \begin{align} y_{ast} & \sim \text{Poisson}(\gamma_{ast} w_{ast}) \ \log \gamma_{ast} & \sim \text{Gamma}(\xi^{-1}, (\mu_{ast} \xi)^{-1}) \ \log \mu_{ast} & = \beta^{(0)} + \beta_{as}^{\text{age:sex}} + \beta_t^{\text{time}} (#eq:priormod) \end{align} where
In a model with covariates, \@ref(eq:priormod) changes to \begin{equation} \log \mu_{ast} = \beta^{(0)} + \beta_{as}^{\text{age:sex}} + \beta_t^{\text{time}} + (\pmb{Z} \pmb{\zeta})_{ast} \end{equation}
where
Change \@ref(eq:prior-mod-no-cov) to \begin{equation} \mu_i = \sum_{m=0}^M \beta_{j_i^m}^{(m)} + (\pmb{Z} \pmb{\eta})_i (#eq:prior-mod-no-cov) \end{equation}
where - $\pmb{Z}$ is an $I \times P$ matrix of covariates; and - $\pmb{\eta}$ is a vector of coefficients.
The covariate matrix $\pmb{Z}$ is derived from the raw covariate data by scaling any numeric variables to have mean 0 and standard deviation 1, and by converting any categorical variables to sets of indicator variables. The conversion to indicator variables follows the rules that R uses for "treatment" contrasts. If the categorical has $C$ categories, then $C-1$ indicator variabls are constructed, with the first category being omitted.
Each element of $\pmb{\eta}$ has prior \begin{equation} \eta_p \sim \text{N}(0, 1) \end{equation}
To illustrate the use of covariates, we will analyse data on births in South Korea.
Besides bage itself, we use dplyr and vctrs for data manipulation, and ggplot2 for plotting results.
suppressPackageStartupMessages({ library(bage) library(dplyr) library(ggplot2) })
Our data is the data frame kor_births
, which is part of bage. The data frame gives numbers of births disaggregated by age of mother, region, and year. It also contains a numeric variable called gdp_pc_2023
that gives GDP per capita (in thousands of US dollars) in 2023, and a categorical variable (with levels "Low"
, "Medium"
, and "High"
) describing population density in 2020.
kor_births
The variables gdp_pc_2023
and dens_2020
are both examples of covariates that contribute extra information to the model, beyond what is contained in the outcome, exposure, and classifying variables.
We use function set_covariates()
to instruct mod_pois()
to treat these variables as covariates.
mod_gdp_dens <- mod_pois(births ~ (age + region + time)^2, data = kor_births, exposure = popn) |> set_covariates(~ gdp_pc_2023 + dens_2020) |> fit() mod_gdp_dens
To obtain estimates of the coefficients (ie estimates of the $\zeta_p$) we call function components()
and filter out rows for the "covariates"
term.
mod_gdp_dens |> components() |> filter(term == "covariates")
In East Asia, years with the Dragon zodiac sign sometimes have larger-than-usual numbers of births. To allow for this possibility, we create a dragon-year covariate, and incorporate it into a new model.
births <- kor_births |> mutate(is_dragon_year = time == 2012) mod_dragon <- mod_pois(births ~ (age + region + time)^2, data = births, exposure = popn) |> set_covariates(~ is_dragon_year) |> fit() mod_dragon |> components() |> filter(term == "covariates")
There is some evidence for extra births, though there is substantial uncertainty about the size of the effect.
Next we expand the model to allow the dragon-year effect to differ across age groups. We create a variable that takes the values "baseline"
in all years, except in dragon years, when it takes the name of the age group. We turn this variable into a factor with "baseline"
as its first level.
births <- births |> mutate(is_dragon_year_age = if_else(time == 2012, age, "baseline"), is_dragon_year_age = factor(is_dragon_year_age, levels = c("baseline", unique(age)))) births |> filter(time %in% 2011:2013)
We create a new model with the age-sepcific dragon-year indicator.
mod_dragon_age <- mod_pois(births ~ (age + region + time)^2, data = births, exposure = popn) |> set_covariates(~ is_dragon_year_age) |> fit() mod_dragon_age
Rather than a single dragon-year coefficient, we have a coefficient for each age group. We extract them and tidy up the labels.
mod_dragon_age |> components() |> filter(term == "covariates") |> mutate(age = sub("is_dragon_year_age", "", level)) |> select(age, .fitted)
If all the covariates in a model are fixed, then the model can be forecasted as normal.
mod_gdp_dens |> forecast(labels = 2024:2025)
If, however, a covariate varies over time, forecasting only works if values for future periods are provided. The following code will result in an error:
mod_dragon |> forecast(labels = 2024:2025)
Instead we need to create a newdata
data frame...
newdata <- expand.grid(age = unique(kor_births$age), region = unique(kor_births$region), time = 2024:2025) |> mutate(is_dragon_year = FALSE) head(newdata)
...and supply it to forecast()
.
mod_dragon |> forecast(newdata = newdata)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.