set_datamod_noise: Specify Noise Data Model

View source: R/bage_mod-functions.R

set_datamod_noiseR Documentation

Specify Noise Data Model

Description

Specify a data model in which

⁠observed outcome = true outcome + error⁠,

where the error has a symmetric distribution with mean 0.

If the true outcome has a normal distribution, then the error has a normal distribution. If the true outcome has a Poisson distribution, then the error has a symmetric Skellam distribution.

Usage

set_datamod_noise(mod, sd)

Arguments

mod

An object of class "bage_mod", created with mod_norm() or mod_pois().

sd

Standard deviation of measurement errors. A single number, or a data frame with 'by' variables.

Details

The model assumes that the outcome variable is unbiased. If there is in fact evidence of biases, then this evidence should be used to create a de-biased version of the outcome variable in data, and this de-biased version should be used by mod_norm() or mod_pois().

If set_datamod_noise() is used with a Poisson model, then the dispersion term for the Poisson rates must be set to zero. This can be done using set_disp(), though set_datamod_noise() will also do so.

Value

A revised version of mod.

The Skellam distribution

The Skellam distribution is confined to integers, but can take positive and negative values.

If

X_1 \sim \text{Poisson}(\mu_1)

X_2 \sim \text{Poisson}(\mu_2)

then

Y = X_1 - X_2

has a \text{Skellam}(\mu_1, \mu_2) distribution. If \mu_1 = \mu_2, then the distribution is symmetric.

The sd argument

sd can be a single number, in which case the same standard deviation is used for all cells. sd can also be a data frame with a with a variable called "sd" and one or more columns with 'by' variables. For instance, a sd of

data.frame(sex = c("Female", "Male"),
           sd = c(330, 240))

implies that measurement errors have standard deviation 330 for females and 240 for males.

Mathematical details

The model for the observed outcome is

y_i^{\text{obs}} = y_i^{\text{true}} + \epsilon_i

with

\epsilon_i \sim \text{N}(0, s_{g[i]}^2)

if y_i^{\text{true}} has a normal distribution, and

\epsilon_i \sim \text{Skellam}(0.5 s_{g[i]}^2, 0.5 s_{g[i]}^2)

if y_i^{\text{true}} has a Poisson distribution, where

  • y_i^{\text{obs}} is the observed outcome for cell i;

  • y_i^{\text{true}} is the true outcome for cell i;

  • \epsilon_i is the measurement error for cell i; and

  • s_{g\lbrack i\rbrack } is the standard deviation of the measurement error for cell i.

See Also

  • mod_norm() Specify a normal model

  • mod_pois() Specify a Poisson model

  • augment() Original data plus estimated values, including estimates of true value for outcome

  • datamods Data models implemented in bage

  • Mathematical Details vignette

Examples

## Normal model ------------------------------

## prepare outcome variable
library(dplyr, warn.conflicts = FALSE)
spend <- nld_expenditure |>
  mutate(log_spend = log(value + 1))

## specify model
mod <- mod_norm(log_spend ~ age * diag + year,
                data = spend,
                weights = 1) |>
  set_datamod_noise(sd = 0.1)

## fit model
mod <- mod |>
  fit()
mod

## create new aggregated diagnositic
## group variable
library(dplyr, warn.conflicts = FALSE)
spend <- spend |>
  mutate(diag_ag = case_when(
    diag == "Neoplasms" ~ diag,
    diag == "Not allocated" ~ diag,
    TRUE ~ "Other"
  ))

## assume size of measurement errors
## varies across these aggregated groups
sd_diag <- data.frame(diag_ag = c("Neoplasms",
                                  "Not allocated",
                                  "Other"),
                      sd = c(0.05, 0.2, 0.1))

## fit model that uses diagnostic-specific
## standard deviations
mod <- mod_norm(log_spend ~ age * diag + year,
                data = spend,
                weights = 1) |>
  set_datamod_noise(sd = sd_diag)


## Poisson model -----------------------------

mod <- mod_pois(deaths ~ month,
                data = usa_deaths,
                exposure = 1) |>
  set_datamod_noise(sd = 200)

bage documentation built on Nov. 5, 2025, 5:33 p.m.