set_datamod_exposure: Specify Exposure Data Model
In bage: Bayesian Estimation and Forecasting of Age-Specific Rates

set_datamod_exposure

R Documentation

Specify Exposure Data Model

Description

Specify a data model for the exposure variable in a Poisson model. The data model assumes that, within each cell, observed exposure is drawn from an Inverse-Gamma distribution. In this model,

E[ expected exposure | true exposure ] = true exposure

and

sd[ expected exposure | true exposure ] = cv \times true exposure

where cv is a coefficient of variation parameter.

Usage

set_datamod_exposure(mod, cv)

Arguments

`mod`	An object of class `"bage_mod_pois"`, created with `mod_pois()`.
`cv`	Coefficient of variation for measurement errors in exposure. A single number, or a data frame with a variable called `"cv"` and one or more 'by' variables.

Details

In the exposure data model, cv, the coefficient of variation, does not depend on true exposure. This implies that errors do not fall, in relative terms, as population rises. Unlike sampling errors, measurement errors do not get averaged away in large populations.

The exposure data model assumes that the exposure variable is unbiased. If there is in fact evidence of biases, then this evidence should be used to create a de-biased version of the variable (eg one where estimated biases have been subtracted) to supply to mod_pois().

set_datamod_exposure() can only be used with a Poisson model for rates in which the dispersion in the rates has been set to zero. The dispersion in the rates can be set explicitly to zero using set_disp(), though set_datamod_exposure() will also do so.

Value

A revised version of mod.

The `cv` argument

cv can be a single number, in which case the same value is used for all cells. cv can also be a data frame with a with a variable called "cv" and one or more columns with 'by' variables. For instance, a cv of

data.frame(sex = c("Female", "Male"),
           cv = c(0.01, 0.012))

implies that the coefficient of variation is 0.01 for females and 0.012 for males.

See below for an example where the coefficient of variation is based on aggregated age groups.

Mathematical details

The model for observed exposure is

w_i^{\text{obs}} \sim \text{InvGamma}(2 + d_{g \lbrack i \rbrack }^{-1}, (1 + d_{g \lbrack i\rbrack }^{-1}) w_i^{\text{true}})

where

w_i^{\text{obs}} is observed exposure for cell i (the exposure argument to mod_pois());
w_i^{\text{true}} is true exposure for cell i; and
d_{g\lbrack i\rbrack } is the value for dispersion that is applied to cell i.

cv is \sqrt{d_g}.

Examples

## specify model
mod <- mod_pois(injuries ~ age * sex + year,
                data = nzl_injuries,
                exposure = popn) |>
  set_disp(mean = 0) |>
  set_datamod_exposure(cv = 0.025)

## fit the model
mod <- mod |>
  fit()
mod

## examine results - note the new variable
## '.popn' with estimates of the true
## population
aug <- mod |>
  augment()

## allow different cv's for each sex
cv_sex <- data.frame(sex = c("Female", "Male"),
                     cv = c(0.03, 0.02))
mod <- mod |>
  set_datamod_exposure(cv = cv_sex)
mod

## our outcome variable is confidentialized,
## so we recognize that in the model too
mod <- mod |>
  set_confidential_rr3()
mod

## now a model where everyone aged 0-49
## receives one value for cv, and
## everyone aged 50+ receives another
library(poputils) ## for 'age_upper()'
library(dplyr, warn.conflicts = FALSE)
nzl_injuries_age <- nzl_injuries |>
  mutate(age_group = if_else(age_upper(age) < 50,
                             "0-49",
                             "50+"))
cv_age <- data.frame(age_group = c("0-49", "50+"),
                     cv = c(0.05, 0.01))
mod <- mod_pois(injuries ~ age * sex + year,
                data = nzl_injuries_age,
                exposure = popn) |>
  set_disp(mean = 0) |>
  set_datamod_exposure(cv = cv_age)

bage documentation built on Nov. 19, 2025, 9:07 a.m.