Simulate Individual Data"

knitr::opts_chunk$set(echo = TRUE)
library(ReSurv)
library(ggplot2)

Introduction

In this vignette we show how to simulate the individual data we included in the simulation study of @hiabu23. The simulations are based on the SynthETIC package and they can be used to replicate our results. In the manuscript, we named the $5$ scenarios Alpha, Beta, Gamma, Delta, Epsilon. The $5$ scenarios have the same data features described in the following table. Conversely, they have specific characteristics that we will describe in the coming sections.

| Covariates | Description | |--------------------------------------------------|--------------------| | claim_number | Policy identifier. | | claim_type $\in \left{0, 1 \right}$ | Type of claim. | | AP | Accident month. | | RP | Reporting month. |

For each scenario we will show if they satisfy the chain ladder assumptions (CL), the proportionality assumption in @cox72 (PROP) and if interactions are present (INT). Details on the simulation mechanism and the simulation parameters can be found in the manuscript.

Scenario Alpha

This scenario is a mix of claim_type 0 and claim_type 1 with same number of claims at each accident month (i.e. the claims volume).

# Input data

input_data_0 <- data_generator(
  random_seed = 1964,
  scenario = "alpha",
  time_unit = 1 / 360,
  years = 4,
  period_exposure = 200
)
input_data_0 %>%
  as.data.frame() %>%
  mutate(claim_type = as.factor(claim_type)) %>%
  ggplot(aes(x = RT - AT, color = claim_type)) +
  stat_ecdf(size = 1) +
  labs(title = "Empirical distribution of simulated notification delays", x =
         "Notification delay (in days)", y = "Cumulative Density") +
  xlim(0, 1500) +
  scale_color_manual(
    values = c("royalblue", "#a71429"),
    labels = c("Claim type 0", "Claim type 1")
  ) +
  scale_linetype_manual(values = c(1, 3),
                        labels = c("Claim type 0", "Claim type 1")) +
  guides(
    color = guide_legend(title = "Claim type", override.aes = list(
      color = c("royalblue", "#a71429"), size = 2
    )),
    linetype = guide_legend(
      title = "Claim type",
      override.aes = list(linetype = c(1, 3), size = 0.7)
    )
  ) +
  theme_bw()

Scenario Beta

This scenario is similar to simulation Alpha but the volume of claim_type 1 is decreasing in the most recent accident dates. When the longer tailed bodily injuries have a decreasing claim volume, aggregated chain ladder methods will overestimate reserves, see @ajne94.

input_data_1 <- data_generator(
  random_seed = 1964,
  scenario = 1,
  time_unit = 1 / 360,
  years = 4,
  period_exposure  = 200
)
input_data_1 %>%
  as.data.frame() %>%
  mutate(claim_type = as.factor(claim_type)) %>%
  ggplot(aes(x = RT - AT, color = claim_type)) +
  stat_ecdf(size = 1) +
  labs(title = "Empirical distribution of simulated notification delays", x =
         "Notification delay (in days)", y = "Cumulative Density") +
  xlim(0, 1500) +
  scale_color_manual(
    values = c("royalblue", "#a71429"),
    labels = c("Claim type 0", "Claim type 1")
  ) +
  scale_linetype_manual(values = c(1, 3),
                        labels = c("Claim type 0", "Claim type 1")) +
  guides(
    color = guide_legend(title = "Claim type", override.aes = list(
      color = c("royalblue", "#a71429"), size = 2
    )),
    linetype = guide_legend(
      title = "Claim type",
      override.aes = list(linetype = c(1, 3), size = 0.7)
    )
  ) +
  theme_bw()

Scenario Gamma

An interaction between claim_type 1 and accident period affects the claims occurrence. One could imagine a scenario, where a change in consumer behavior or company policies resulted in different reporting patterns over time. For the last simulated accident month, the two reporting delay distributions will be identical.

# Input data

input_data_2 <- data_generator(
  random_seed = 1964,
  scenario = 2,
  time_unit = 1 / 360,
  years = 4,
  period_exposure = 200
)
input_data_2 %>%
  as.data.frame() %>%
  mutate(claim_type = as.factor(claim_type)) %>%
  ggplot(aes(x = RT - AT, color = claim_type)) +
  stat_ecdf(size = 1) +
  labs(title = "Empirical distribution of simulated notification delays", x =
         "Notification delay (in days)", y = "Cumulative Density") +
  xlim(0, 1500) +
  scale_color_manual(
    values = c("royalblue", "#a71429"),
    labels = c("Claim type 0", "Claim type 1")
  ) +
  scale_linetype_manual(values = c(1, 3),
                        labels = c("Claim type 0", "Claim type 1")) +
  guides(
    color = guide_legend(title = "Claim type", override.aes = list(
      color = c("royalblue", "#a71429"), size = 2
    )),
    linetype = guide_legend(
      title = "Claim type",
      override.aes = list(linetype = c(1, 3), size = 0.7)
    )
  ) +
  theme_bw()

Scenario Delta

A seasonality effect dependent on the accident months for claim_type 0 and claim_type 1 is present. This could occur in a real world setting with increased work load during winter for certain claim types, or a decreased workforce during the summer holidays.

input_data_3 <- data_generator(
  random_seed = 1964,
  scenario = 3,
  time_unit = 1 / 360,
  years = 4,
  period_exposure = 200
)
input_data_3 %>%
  as.data.frame() %>%
  mutate(claim_type = as.factor(claim_type)) %>%
  ggplot(aes(x = RT - AT, color = claim_type)) +
  stat_ecdf(size = 1) +
  labs(title = "Empirical distribution of simulated notification delays", x =
         "Notification delay (in days)", y = "Cumulative Density") +
  xlim(0, 1500) +
  scale_color_manual(
    values = c("royalblue", "#a71429"),
    labels = c("Claim type 0", "Claim type 1")
  ) +
  scale_linetype_manual(values = c(1, 3),
                        labels = c("Claim type 0", "Claim type 1")) +
  guides(
    color = guide_legend(title = "Claim type", override.aes = list(
      color = c("royalblue", "#a71429"), size = 2
    )),
    linetype = guide_legend(
      title = "Claim type",
      override.aes = list(linetype = c(1, 3), size = 0.7)
    )
  ) +
  theme_bw()

Scenario Epsilon

The data generating process violates the proportional likelihood in @cox72. We generate the data assuming that a) there is an effect of the covariates on the baseline and b) the proportionality assumption is not valid.

# Input data

input_data_4 <- data_generator(
  random_seed = 1964,
  scenario = 4,
  time_unit = 1 / 360,
  years = 4,
  period_exposure = 200
)
input_data_4 %>%
  as.data.frame() %>%
  mutate(claim_type = as.factor(claim_type)) %>%
  ggplot(aes(x = RT - AT, color = claim_type)) +
  stat_ecdf(size = 1) +
  labs(title = "Empirical distribution of simulated notification delays", x =
         "Notification delay (in days)", y = "Cumulative Density") +
  xlim(0, 1500) +
  scale_color_manual(
    values = c("royalblue", "#a71429"),
    labels = c("Claim type 0", "Claim type 1")
  ) +
  scale_linetype_manual(values = c(1, 3),
                        labels = c("Claim type 0", "Claim type 1")) +
  guides(
    color = guide_legend(title = "Claim type", override.aes = list(
      color = c("royalblue", "#a71429"), size = 2
    )),
    linetype = guide_legend(
      title = "Claim type",
      override.aes = list(linetype = c(1, 3), size = 0.7)
    )
  ) +
  theme_bw()

Bibliography



Try the ReSurv package in your browser

Any scripts or data that you put into this service are public.

ReSurv documentation built on April 4, 2025, 1:39 a.m.