View source: R/generate_data.R
generate_data | R Documentation |
Generates datasets under 5 scenarios of different levels of complexity (here
"complexity" means the level of difficulty of analysis).
generate_data(
n_claims_per_period,
n_periods = 40,
complexity = c(1:5),
data_type = c("claims", "payments", "incurred"),
random_seed = NULL,
verbose = TRUE
)
n_claims_per_period |
expected number of claims per period (equals the total expected number of claims divided by n_periods). |
n_periods |
number of accident periods considered (equals the number of claim development periods considered); default 40. |
complexity |
integer from 1 (simplest) to 5 (most complex); see Details. |
data_type |
a character vector specifying output data types. By default the function will output all 3 datasets (claims, payments, incurred), but the user may choose to output only a subset. |
random_seed |
optional seed for random number generation for reproducibility. |
verbose |
logical; if TRUE (the default), the function prints messages while generating data; set to FALSE to mute them. |
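The relationship between n_claims_per_period and n_periods, and the restriction of complexity to the levels 1 to 5, can be sketched in base R as below. This is not SPLICE code: summarise_request is a hypothetical helper, and reducing a vector of complexity levels (such as the default c(1:5)) to its first element is an assumption made for illustration.

```r
# Hypothetical helper (not part of SPLICE): summarise a generate_data()-style request.
summarise_request <- function(n_claims_per_period, n_periods = 40,
                              complexity = 1:5) {
  # ASSUMPTION: a vector of complexity levels is reduced to its first element
  complexity <- as.integer(complexity[1])
  if (is.na(complexity) || !complexity %in% 1:5) {
    stop("complexity must be an integer from 1 to 5")
  }
  list(
    complexity = complexity,
    # n_claims_per_period is the total expected count divided by n_periods,
    # so the total expected count is their product
    expected_total_claims = n_claims_per_period * n_periods
  )
}

summarise_request(50)                                   # 50 claims x 40 periods
summarise_request(50, n_periods = 20, complexity = 3)   # 50 claims x 20 periods
```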
generate_data() produces datasets of varying levels of complexity, where 1 represents the simplest and 5 the most complex:
1 – simple, homogeneous claims experience, with zero inflation.
2 – slightly more complex than 1, with dependence of notification delay and settlement delay on claim size, and 2% p.a. base inflation.
3 – steady increase in claim processing speed over occurrence periods (i.e. steady decline in settlement delays).
4 – inflation shock at time 30 (from 0% to 10% p.a.).
5 – default distributional models, with complex dependence structures (e.g. dependence of settlement delay on claim occurrence period).
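To make the inflation settings of scenarios 1, 2 and 4 concrete, the base R sketch below computes a cumulative inflation index over time. It is not SPLICE code: inflation_index is a hypothetical function, time is assumed to be measured in periods with four periods per year (quarterly periods), and the 10% p.a. rate in scenario 4 is assumed to persist after the shock at time 30.

```r
# Hypothetical illustration (not SPLICE code): cumulative inflation index under
# an annual rate that may switch to a new rate at a shock time.
# ASSUMPTION: time is in periods, with periods_per_year = 4 (quarterly periods).
inflation_index <- function(times, rate_before, rate_after = rate_before,
                            shock_time = Inf, periods_per_year = 4) {
  # Split elapsed time into the parts before and after the shock
  t_before <- pmin(times, shock_time)
  t_after  <- pmax(times - shock_time, 0)
  (1 + rate_before)^(t_before / periods_per_year) *
    (1 + rate_after)^(t_after / periods_per_year)
}

times <- c(0, 20, 30, 34, 40)
inflation_index(times, rate_before = 0)                     # scenario 1: no inflation
inflation_index(times, rate_before = 0.02)                  # scenario 2: 2% p.a.
inflation_index(times, rate_before = 0, rate_after = 0.10,  # scenario 4: 0% -> 10% p.a.
                shock_time = 30)
```

Under these assumptions the scenario 4 index stays at 1 up to time 30 and then grows by 10% for every four periods that follow.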
We remark that this by no means defines the limits of the complexity that can
be generated with SPLICE. This function is provided for the convenience of
users who wish to generate (a collection of) datasets under some
representative scenarios. If more complex features are required, the user is
free to modify the distributional assumptions (which, of course, requires
more thought and coding) to suit their purposes.
A named list of data frames:
claim_dataset | A dataset of claim records with the same structure as
test_claim_dataset, with each row representing a unique claim. |
payment_dataset | A dataset of partial payment records with the same
structure as test_transaction_dataset, with each row representing a
unique payment. |
incurred_dataset | A dataset of transaction records that tracks how the
case estimates change over time. It has the same structure as
test_incurred_dataset, with each row representing a transaction
(any of claim notification, settlement, a payment, or a case estimate
revision). |
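The list components above correspond to the values accepted by data_type; the first example, for instance, requests 'claims' and 'payments' and then reads result$claim_dataset and result$payment_dataset. The base R sketch below records that correspondence. It is not SPLICE code: component_names is a hypothetical helper, and using match.arg(several.ok = TRUE) to accept a subset of types is an illustrative choice.

```r
# Hypothetical helper (not part of SPLICE): map data_type values to the
# names of the list components they correspond to.
component_names <- function(data_type = c("claims", "payments", "incurred")) {
  lookup <- c(claims   = "claim_dataset",
              payments = "payment_dataset",
              incurred = "incurred_dataset")
  # several.ok = TRUE lets the caller request any subset of the three types
  unname(lookup[match.arg(data_type, several.ok = TRUE)])
}

component_names()                          # all three components
component_names(c("claims", "payments"))   # the types requested in the first example
```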
See also: generate_claim_dataset, generate_transaction_dataset, generate_incurred_dataset
# Generate datasets of full complexity
result <- generate_data(
n_claims_per_period = 50, data_type = c('claims', 'payments'),
complexity = 5, random_seed = 42)
# Extract individual datasets from the returned list
claims <- result$claim_dataset
payments <- result$payment_dataset
# Generate a chain-ladder-compatible claims dataset (simplest scenario)
CL_simple <- generate_data(
n_claims_per_period = 50, data_type = 'claims', complexity = 1, random_seed = 42)
# Same call with message output muted
CL_simple_2 <- generate_data(
n_claims_per_period = 50, data_type = 'claims', complexity = 1,
verbose = FALSE, random_seed = 42)
# Output is reproducible with the same random_seed value
all.equal(CL_simple$claim_dataset, CL_simple_2$claim_dataset)
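The example above describes the complexity 1 scenario as chain-ladder compatible. A chain-ladder analysis usually starts from a cumulative run-off triangle, so the base R sketch below shows one way to build such a triangle from payment-level records and compute volume-weighted development factors. The toy data and the column names occurrence_period, payment_period and payment_size are hypothetical; check names() of the generated datasets before adapting it.

```r
# Toy payment records with HYPOTHETICAL column names (not SPLICE's actual columns)
payments_toy <- data.frame(
  occurrence_period = c(1, 1, 1, 2, 2, 3),
  payment_period    = c(1, 2, 3, 2, 3, 3),
  payment_size      = c(100, 50, 25, 120, 60, 90)
)

# Development period: 1 = paid in the period of occurrence
payments_toy$dev_period <- payments_toy$payment_period -
  payments_toy$occurrence_period + 1

n_periods <- 3
# Incremental triangle: rows = occurrence periods, columns = development periods
incremental <- tapply(
  payments_toy$payment_size,
  list(factor(payments_toy$occurrence_period, levels = 1:n_periods),
       factor(payments_toy$dev_period, levels = 1:n_periods)),
  sum
)
incremental[is.na(incremental)] <- 0  # empty cells mean no payments

# Cumulative triangle; cells beyond the valuation date are unobserved (NA)
cumulative <- t(apply(incremental, 1, cumsum))
cumulative[row(cumulative) + col(cumulative) > n_periods + 1] <- NA

# Volume-weighted chain-ladder factors: f_j = sum(C[, j + 1]) / sum(C[, j])
# over the occurrence periods observed at development period j + 1
dev_factors <- sapply(1:(n_periods - 1), function(j) {
  rows <- seq_len(n_periods - j)
  sum(cumulative[rows, j + 1]) / sum(cumulative[rows, j])
})
dev_factors
```

With these toy figures the factors are 330/220 = 1.5 and 175/150 (about 1.167).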