toy_data | R Documentation |
A toy dataset generated to illustrate modeling of age, period, and cohort effects, including interactions with education and sex. The data simulate count outcomes (e.g., disease incidence or event counts) as a function of demographic variables, with each count drawn from a Poisson distribution.
data(toy_data)
A data frame with 10000 rows and 7 variables:
Age of individuals, sampled uniformly from 20 to 59.
Calendar year of observation, sampled uniformly from 1990 to 2019.
Factor for education level, with levels 1, 2 and 3.
Factor indicating biological sex, with levels: "male", "female".
Simulated event count, generated from a Poisson distribution.
The true Poisson rate used to generate count, computed from the log-linear model.
Derived variable indicating year of birth (period - age).
The underlying event rate is modeled on the log scale as a linear combination of age, period, sex, education, and an age-education interaction. The count outcome is drawn from a Poisson distribution with this rate. This dataset is intended for testing age-period-cohort (APC) models.
The true log-rate is computed (for observation n) as:
\begin{aligned}
\log(\lambda_n) ={}& \beta_0
+ \beta_{\text{period}}\,\bigl(2020 - \text{period}_n\bigr)
+ \beta_{\text{sex}}\,I(\text{sex}_n = \text{female}) \\[6pt]
&+ \beta_{\text{edu}}\,(\text{edu level}_n)
+ \beta_{\text{edu-age}}\,(\text{age}_n - 20)\,(\text{edu level}_n - 1)\,I(\text{age}_n \le 40) \\[6pt]
&+ \beta_{\text{edu-age}}\,(60 - \text{age}_n)\,(\text{edu level}_n - 1)\,I(\text{age}_n > 40)
\end{aligned}
where I(.) is the indicator function. Under this model the rate decreases with calendar period and, for education levels above 1, increases with age up to age 40 and decreases thereafter. The coefficients used are:
intercept = 1.0
b_period = 0.02
b_sex = 0.5 (female effect)
b_education_base = 0.5
b_education_age_interaction = 0.015
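As a rough illustration, the log-rate formula and the coefficients above can be written as a small R function. This is only a sketch, not the code that generated toy_data: the function name toy_log_rate and its argument names are hypothetical, and the education level is assumed to enter the formula as the integer 1, 2 or 3.

```r
# Hypothetical helper (not part of any package): evaluates the documented
# log-rate formula. Argument names are illustrative; the defaults are the
# coefficient values listed in this help page.
toy_log_rate <- function(age, period, sex, edu,
                         b0 = 1.0, b_period = 0.02, b_sex = 0.5,
                         b_edu = 0.5, b_edu_age = 0.015) {
  # Piecewise-linear age term: (age - 20) up to age 40, (60 - age) afterwards
  age_term <- ifelse(age <= 40, age - 20, 60 - age)
  b0 +
    b_period * (2020 - period) +
    b_sex * (sex == "female") +
    b_edu * edu +
    b_edu_age * age_term * (edu - 1)
}

# Worked example: a 30-year-old male with education level 2, observed in 2000
# 1.0 + 0.02 * 20 + 0 + 0.5 * 2 + 0.015 * 10 * 1 = 2.55
log_lambda <- toy_log_rate(age = 30, period = 2000, sex = "male", edu = 2)
lambda <- exp(log_lambda)  # expected count, roughly 12.8
```

Because the age term is multiplied by (edu level - 1), the function returns the same value at every age when edu is 1.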
Simulated data, created using base R and tibble.
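The following sketch shows one way comparable data could be simulated and then checked with a Poisson GLM. It is not the original generating script: it uses base R only, the column names (age, period, education, sex, count, rate, cohort) are assumptions that are not confirmed by this help page, and ages and periods are assumed to be whole numbers.

```r
# Sketch: simulate a data frame with the documented structure.
# All object and column names here are assumptions made for illustration.
set.seed(1)
n <- 10000

age    <- sample(20:59, n, replace = TRUE)      # uniform over integer ages
period <- sample(1990:2019, n, replace = TRUE)  # uniform over calendar years
edu    <- sample(1:3, n, replace = TRUE)        # education level as integer
sex    <- sample(c("male", "female"), n, replace = TRUE)

# Log-rate from the documented formula and coefficients
age_term <- ifelse(age <= 40, age - 20, 60 - age)
log_rate <- 1.0 +
  0.02 * (2020 - period) +
  0.5 * (sex == "female") +
  0.5 * edu +
  0.015 * age_term * (edu - 1)

sim <- data.frame(
  age       = age,
  period    = period,
  education = factor(edu, levels = 1:3),
  sex       = factor(sex, levels = c("male", "female")),
  count     = rpois(n, lambda = exp(log_rate)),
  rate      = exp(log_rate),
  cohort    = period - age
)

# Fit a Poisson GLM containing the same terms as the generating model;
# the estimates should lie close to 1.0, 0.02, 0.5, 0.5 and 0.015.
fit_data <- cbind(sim, edu_num = edu, edu_age = age_term * (edu - 1))
fit <- glm(count ~ I(2020 - period) + sex + edu_num + edu_age,
           family = poisson(link = "log"), data = fit_data)
est <- coef(fit)
```

Setting "male" as the first factor level makes it the reference category, so the sexfemale coefficient corresponds to b_sex.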