replicate_data: Create Replicate Data
In bage: Bayesian Estimation and Forecasting of Age-Specific Rates

Description

Use a fitted model to create replicate datasets, typically as a way of checking a model.

Usage

replicate_data(x, condition_on = NULL, n = 19)

Arguments

`x`	A fitted model, typically created by calling `mod_pois()`, `mod_binom()`, or `mod_norm()`, and then `fit()`.
`condition_on`	Parameters to condition on. Either `"expected"` or `"fitted"`. See the section on the `condition_on` argument.
`n`	Number of replicate datasets to create. Default is 19.

Details

Use n draws from the posterior distribution for model parameters to generate n simulated datasets. If the model is working well, these simulated datasets should look similar to the actual dataset.

Value

A tibble with the following structure:

`.replicate`	data
`"Original"`	Original data supplied to `mod_pois()`, `mod_binom()`, or `mod_norm()`.
`"Replicate 1"`	Simulated data.
`"Replicate 2"`	Simulated data.
...	...
`"Replicate <n>"`	Simulated data.
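
To illustrate how this structure can be used for model checking, the sketch below builds a small stand-in data frame with the same `.replicate` layout and compares a summary statistic across datasets. The column `y` and all values are simulated in base R for illustration; they are assumptions, not output from bage.

```r
## Stand-in for the output of replicate_data(): a '.replicate' column
## plus a data column. Values are simulated here, not produced by bage.
set.seed(1)
n <- 19        # number of replicates (the default)
n_obs <- 10    # rows per dataset (hypothetical)
labels <- c("Original", paste("Replicate", seq_len(n)))
rep_data <- data.frame(
  .replicate = factor(rep(labels, each = n_obs), levels = labels),
  y = rpois(n_obs * (n + 1), lambda = 20)
)

## One summary statistic per dataset (here, the total of 'y')
totals <- sapply(split(rep_data$y, rep_data$.replicate), sum)

## Position of the original total among all n + 1 totals
rank_original <- sum(totals <= totals[["Original"]])
```

If the model is working well, `rank_original` should not be consistently extreme (close to 1 or to `n + 1`) across the statistics examined.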

The `condition_on` argument

With Poisson and binomial models that include dispersion terms (which is the default), there are two options for constructing replicate data.

When `condition_on` is `"fitted"`, the replicate data is created by (i) drawing values from the posterior distribution for rates or probabilities (the $\gamma_i$ defined in `mod_pois()` and `mod_binom()`), and (ii) conditional on these rates or probabilities, drawing values for the outcome variable.
When `condition_on` is `"expected"`, the replicate data is created by (i) drawing values from hyper-parameters governing the rates or probabilities (the $\mu_i$ and $\xi$ defined in `mod_pois()` and `mod_binom()`), then (ii) conditional on these hyper-parameters, drawing values for the rates or probabilities, and finally (iii) conditional on these rates or probabilities, drawing values for the outcome variable.
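
The two procedures can be sketched in base R for a Poisson model and a single posterior draw. This is an illustration under stated assumptions, not the package's internal code: the posterior draws and exposures are made-up numbers, and the gamma distribution for the rates (shape 1/xi, rate 1/(xi * mu_i), giving mean mu_i) is assumed to match the parameterisation described in `mod_pois()`.

```r
set.seed(42)
n_obs <- 5
exposure <- c(100, 200, 150, 300, 250)          # w_i (hypothetical)

## Hypothetical values standing in for one posterior draw;
## in practice these would come from the fitted model
gamma_draw <- c(0.10, 0.12, 0.08, 0.15, 0.11)   # rates gamma_i
mu_draw    <- c(0.11, 0.11, 0.09, 0.14, 0.12)   # expected rates mu_i
xi_draw    <- 0.05                              # dispersion xi

## "fitted": draw outcomes conditional on the drawn rates
y_fitted <- rpois(n_obs, lambda = gamma_draw * exposure)

## "expected": first draw fresh rates conditional on mu_i and xi
## (assumed gamma distribution with mean mu_i) ...
gamma_new <- rgamma(n_obs,
                    shape = 1 / xi_draw,
                    rate = 1 / (xi_draw * mu_draw))
## ... then draw outcomes conditional on the fresh rates
y_expected <- rpois(n_obs, lambda = gamma_new * exposure)
```

The `"expected"` path never uses `gamma_draw`, so the replicate outcomes depend on the original data only through the hyper-parameters.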

The default for `condition_on` is `"expected"`. The `"expected"` option provides a more severe test for a model than the `"fitted"` option, since "fitted" values are weighted averages of the "expected" values and the original data.
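
The weighted-average statement can be made concrete for the Poisson case. The derivation below is a sketch that assumes the gamma-Poisson structure $y_i \sim \text{Poisson}(\gamma_i w_i)$, $\gamma_i \sim \text{Gamma}(\xi^{-1}, (\xi \mu_i)^{-1})$, with $w_i$ denoting exposure, and that treats $\mu_i$ and $\xi$ as known; the weight $\lambda_i$ is notation introduced here for illustration.

```latex
\gamma_i \mid y_i, \mu_i, \xi \sim
  \text{Gamma}\!\left(\xi^{-1} + y_i,\; (\xi \mu_i)^{-1} + w_i\right)

E[\gamma_i \mid y_i, \mu_i, \xi]
  = \frac{\xi^{-1} + y_i}{(\xi \mu_i)^{-1} + w_i}
  = \lambda_i \, \mu_i + (1 - \lambda_i) \, \frac{y_i}{w_i},
\qquad
\lambda_i = \frac{1}{1 + \xi \mu_i w_i}
```

Under these assumptions the fitted rate is pulled towards the direct estimate $y_i / w_i$, so replicate data based on fitted values partly reproduce the original data, which is why the `"fitted"` check is less demanding.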

As described in `mod_norm()`, normal models have a different structure from Poisson and binomial models, and the distinction between `"fitted"` and `"expected"` does not apply.

Data models for outcomes

If a data model has been provided for the outcome variable, then creation of replicate data will include a step where errors are added to outcomes. For instance, if an rr3 data model is used, then `replicate_data()` rounds the outcomes to base 3.
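
As an illustration of the rounding step, the sketch below applies random rounding to base 3 to simulated counts. The helper `round_rr3()` is a hypothetical name written for this example, not a bage function, and it assumes the usual rr3 rule: multiples of 3 are unchanged, and other values move to the nearer multiple of 3 with probability 2/3 and to the farther one with probability 1/3.

```r
## Randomly round non-negative integers to base 3.
## 'round_rr3' is an illustrative helper, not part of bage.
round_rr3 <- function(x) {
  remainder <- x %% 3
  u <- runif(length(x))
  ## remainder 1: down by 1 with prob 2/3, up by 2 with prob 1/3
  ## remainder 2: up by 1 with prob 2/3, down by 2 with prob 1/3
  shift <- ifelse(remainder == 0, 0,
           ifelse(remainder == 1,
                  ifelse(u < 2 / 3, -1, 2),
                  ifelse(u < 2 / 3, 1, -2)))
  x + shift
}

set.seed(0)
y <- rpois(10, lambda = 12)   # simulated "true" outcomes
y_rounded <- round_rr3(y)     # outcomes after the rounding step
```

Every element of `y_rounded` is a multiple of 3 and differs from the corresponding element of `y` by at most 2.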

Examples

library(bage)

## fit a Poisson model to the injuries data
mod <- mod_pois(injuries ~ age:sex + ethnicity + year,
                data = nzl_injuries,
                exposure = 1) |>
  fit()

rep_data <- mod |>
  replicate_data()

## total injuries in the original data and in each replicate
library(dplyr)
rep_data |>
  group_by(.replicate) |>
  count(wt = injuries)

## when the overall model includes an rr3 data model,
## replicate data are rounded to base 3
mod_pois(injuries ~ age:sex + ethnicity + year,
         data = nzl_injuries,
         exposure = popn) |>
  set_datamod_outcome_rr3() |>
  fit() |>
  replicate_data()

bage documentation built on April 3, 2025, 8:53 p.m.