replicate_data: Create Replicate Data

View source: R/bage_mod-methods.R

replicate_dataR Documentation

Create Replicate Data

Description

Use a fitted model to create replicate datasets, typically as a way of checking a model.

Usage

replicate_data(x, condition_on = NULL, n = 19)

Arguments

x

A fitted model, typically created by calling mod_pois(), mod_binom(), or mod_norm(), and then fit().

condition_on

Parameters to condition on. Either "expected" or "fitted". See details.

n

Number of replicate datasets to create. Default is 19.

Details

Use n draws from the posterior distribution for model parameters to generate n simulated datasets. If the model is working well, these simulated datasets should look similar to the actual dataset.

Value

A tibble with the following structure:

.replicate data
"Original" Original data supplied to mod_pois(), mod_binom(), mod_norm()
"Replicate 1" Simulated data.
"Replicate 2" Simulated data.
... ...
"Replicate <n>" Simulated data.

The condition_on argument

With Poisson and binomial models that include dispersion terms (which is the default), there are two options for constructing replicate data.

  • When condition_on is "fitted", the replicate data is created by (i) drawing values from the posterior distribution for rates or probabilities (the \gamma_i defined in mod_pois() and mod_binom()), and (ii) conditional on these rates or probabilities, drawing values for the outcome variable.

  • When condition_on is "expected", the replicate data is created by (i) drawing values from hyper-parameters governing the rates or probabilities (the \mu_i and \xi defined in mod_pois() and mod_binom()), then (ii) conditional on these hyper-parameters, drawing values for the rates or probabilities, and finally (iii) conditional on these rates or probabilities, drawing values for the outcome variable. The '"expected" option is only possible in Poisson and binomial models, and only when dispersion is non-zero.

The default for condition_on is "expected", in cases where it is feasible. The "expected" option provides a more severe test for a model than the "fitted" option, since "fitted" values are weighted averages of the "expected" values and the original data.

Data models for outcomes

If a data model has been provided for the outcome variable, then creation of replicate data will include a step where errors are added to outcomes. For instance, the a rr3 data model is used, then replicate_data() rounds the outcomes to base 3.

See Also

  • mod_pois() Specify a Poisson model

  • mod_binom() Specify a binomial model

  • mod_norm() Specify a normal model

  • fit() Fit model.

  • augment() Extract values for rates, probabilities, or means, together with original data

  • components() Extract values for hyper-parameters

  • forecast() Forecast, based on a model

  • report_sim() Simulation study of model.

  • Mathematical Details vignette

Examples

mod <- mod_pois(injuries ~ age:sex + ethnicity + year,
                data = nzl_injuries,
                exposure = 1) |>
  fit()

rep_data <- mod |>
  replicate_data()

library(dplyr)
rep_data |>
  group_by(.replicate) |>
  count(wt = injuries)

## when the overall model includes an rr3 data model,
## replicate data are rounded to base 3
mod_pois(injuries ~ age:sex + ethnicity + year,
         data = nzl_injuries,
         exposure = popn) |>
  set_datamod_outcome_rr3() |>
  fit() |>
  replicate_data()

bage documentation built on Nov. 5, 2025, 5:33 p.m.