generate_data: Generate Data of Varying Complexity

View source: R/generate_data.R

generate_data    R Documentation

Generate Data of Varying Complexity

Description

[Experimental]

Generates datasets under 5 scenarios of different levels of complexity (here "complexity" means the level of difficulty of analysis).

Usage

generate_data(
  n_claims_per_period,
  n_periods = 40,
  complexity = c(1:5),
  data_type = c("claims", "payments", "incurred"),
  random_seed = NULL,
  verbose = TRUE
)

Arguments

n_claims_per_period

expected number of claims per period (equals the total expected number of claims divided by n_periods).

n_periods

number of accident periods considered (equals number of claims development periods considered); default 40.

complexity

integer from 1 (simplest) to 5 (most complex); see Details.

data_type

a character vector specifying output data types. By default the function will output all 3 datasets (claims, payments, incurred), but the user may choose to output only a subset.

random_seed

optional seed for random number generation for reproducibility.

verbose

logical; if TRUE, prints a message about the data generated.

Details

generate_data() produces datasets of varying levels of complexity, where 1 represents the simplest, and 5 represents the most complex:

  • 1 – simple, homogeneous claims experience, with zero inflation.

  • 2 – slightly more complex than 1, with dependence of notification delay and settlement delay on claim size, and 2% p.a. base inflation.

  • 3 – steady increase in claim processing speed over occurrence periods (i.e. steady decline in settlement delays).

  • 4 – inflation shock at time 30 (from 0% to 10% p.a.).

  • 5 – default distributional models, with complex dependence structures (e.g. dependence of settlement delay on claim occurrence period).

These five scenarios by no means define the limits of the complexity that can be generated with SPLICE. The function is provided for the convenience of users who wish to generate one or more datasets under some representative scenarios. Users who need more complex features are free to modify the distributional assumptions themselves, which requires more thought and coding.
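To make scenario 4 concrete, the following base R sketch computes a claims inflation index with no inflation up to time 30 and 10% p.a. afterwards. It is an illustration only and not SPLICE's internal code: the helper name inflation_index is hypothetical, and the time unit of one quarter (time_unit = 1/4) is an assumption made for the example.

```r
# Hypothetical helper, not part of SPLICE: inflation index for a shock
# scenario (0% p.a. up to shock_time, then rate_after p.a. thereafter).
# time_unit = 1/4 (one period = one quarter) is an assumption.
inflation_index <- function(t, shock_time = 30, rate_after = 0.10,
                            time_unit = 1 / 4) {
  # Years elapsed since the shock; zero for periods at or before the shock
  years_after_shock <- pmax(t - shock_time, 0) * time_unit
  (1 + rate_after)^years_after_shock
}

# Index at the start, at the shock, and one and two years after the shock
index <- inflation_index(c(0, 30, 34, 38))
print(index)  # 1.00 1.00 1.10 1.21
```

Under these assumptions a payment made four periods after the shock is inflated by 10%, while payments made at or before time 30 are unaffected.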

Value

A named list of dataframes:

claim_dataset

A dataset of claim records that takes the same structure as test_claim_dataset, with each row representing a unique claim.

payment_dataset

A dataset of partial payment records that takes the same structure as test_transaction_dataset, with each row representing a unique payment.

incurred_dataset

A dataset of transaction records that tracks how the case estimates change over time. Takes the same structure as test_incurred_dataset, with each row representing a transaction (any of claim notification, settlement, a payment, or a case estimate revision).

See Also

generate_claim_dataset, generate_transaction_dataset, generate_incurred_dataset

Examples

# Generate datasets of full complexity
result <- generate_data(
  n_claims_per_period = 50, data_type = c('claims', 'payments'),
  complexity = 5, random_seed = 42)

# Save individual datasets
claims <- result$claim_dataset
payments <- result$payment_dataset

# Generate chain-ladder compatible dataset
CL_simple <- generate_data(
  n_claims_per_period = 50, data_type = 'claims', complexity = 1, random_seed = 42)

# To mute message output (same complexity and seed as CL_simple)
CL_simple_2 <- generate_data(
  n_claims_per_period = 50, data_type = 'claims', complexity = 1,
  verbose = FALSE, random_seed = 42)

# Output is reproducible with the same random_seed value
all.equal(CL_simple$claim_dataset, CL_simple_2$claim_dataset)
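
The calls above can be extended to build one claims dataset per complexity level, which may be handy when comparing methods across scenarios. This is a sketch that uses only the arguments documented on this page; the object names complexity_levels and claims_by_level are made up for illustration, and the code is skipped if SPLICE is not installed.

```r
# Sketch: one claims dataset per complexity level (1 to 5).
# Assumes the SPLICE package is installed; skipped otherwise.
if (requireNamespace("SPLICE", quietly = TRUE)) {
  complexity_levels <- 1:5
  claims_by_level <- lapply(complexity_levels, function(k) {
    SPLICE::generate_data(
      n_claims_per_period = 50, data_type = "claims",
      complexity = k, random_seed = 42, verbose = FALSE
    )$claim_dataset
  })
  names(claims_by_level) <- paste0("complexity_", complexity_levels)

  # Number of claim records simulated under each scenario
  print(sapply(claims_by_level, nrow))
}
```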


SPLICE documentation built on April 16, 2023, 9:19 a.m.